sampling

the process of reducing the size of a dataset by selecting a subset of its records (the sample); the goals are:

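a sample is useful only if it is representative, i.e. it has approximately the same properties of interest (for example the class distribution, or the mean of an attribute) as the original dataset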
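a minimal sketch of simple random sampling, with and without replacement, using python's standard random module. the dataset and the helper names simple_random_sample and class_proportions are hypothetical, and comparing class proportions is only one assumed way of checking whether a sample is representative.

```python
import random
from collections import Counter

def simple_random_sample(data, n, replace=False, seed=None):
    """Draw n records uniformly at random (hypothetical helper)."""
    rng = random.Random(seed)
    if replace:
        # with replacement: the same record can be drawn more than once
        return rng.choices(data, k=n)
    # without replacement: each record appears at most once, so n <= len(data)
    return rng.sample(data, n)

def class_proportions(labels):
    """Fraction of records that belong to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {c: counts[c] / total for c in counts}

# hypothetical dataset of (record id, class label): 90% "a", 10% "b"
dataset = [(i, "a" if i % 10 else "b") for i in range(1000)]

sample = simple_random_sample(dataset, 100, seed=0)
print(class_proportions([label for _, label in dataset]))
print(class_proportions([label for _, label in sample]))
```

if the proportions in the sample drift far from those of the full dataset, the sample is not representative for that property.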

types

sample size

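selecting the sample size is a tradeoff between data reduction and precision: a smaller sample reduces the data more, but the estimates computed on it are less precise and it is less likely to be representative. there are techniques to find the optimal sample size, i.e. the smallest one that still yields a representative sample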
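as an illustration of one such technique, the sketch below uses cochran's formula for estimating a proportion (chosen here as an assumption; the notes do not name a specific method). it computes n0 = z^2 p (1 - p) / e^2, the smallest sample that estimates a proportion p within a margin of error e at the confidence level given by the normal quantile z, with an optional finite-population correction n = n0 / (1 + (n0 - 1) / N). the function name required_sample_size is hypothetical.

```python
import math

def required_sample_size(margin, z=1.96, p=0.5, population=None):
    """Cochran's sample size for estimating a proportion (hypothetical helper).

    margin: acceptable error e on the estimated proportion (the precision)
    z: standard normal quantile for the confidence level (1.96 ~ 95%)
    p: expected proportion; 0.5 is the most conservative choice
    population: dataset size N, for the finite-population correction
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # a finite dataset needs fewer records for the same precision
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# tighter precision (smaller margin) means less data reduction
for e in (0.10, 0.05, 0.01):
    print(e, required_sample_size(e), required_sample_size(e, population=10_000))
```

the loop makes the tradeoff visible: each halving of the margin roughly quadruples the required sample.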

missing classes

when sampling with replacement, the probability that the sample contains at least one element of each class does not depend on the size of the dataset: it depends only on the sample size and on the class proportions
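this can be checked with a short sketch. with replacement, every draw is independent and falls in class i with probability p_i, so by inclusion-exclusion the probability that n draws hit every class is the sum, over every set S of classes, of (-1)^|S| (1 - sum of p_i for i in S)^n. only n and the proportions appear, never the dataset size. the simulation draws from two datasets of very different sizes that share the same proportions; the helper names prob_all_classes and simulate are hypothetical.

```python
import random
from itertools import combinations

def prob_all_classes(proportions, n):
    """Exact P(n draws with replacement contain every class)."""
    k = len(proportions)
    total = 0.0
    # inclusion-exclusion over the sets of classes that are missed
    for size in range(k + 1):
        for missed in combinations(proportions, size):
            total += (-1) ** size * (1 - sum(missed)) ** n
    return total

def simulate(dataset_size, k, n, trials=5000, seed=0):
    """Monte Carlo estimate on a dataset of k equally frequent classes."""
    rng = random.Random(seed)
    labels = [i % k for i in range(dataset_size)]
    hits = 0
    for _ in range(trials):
        sample = rng.choices(labels, k=n)  # sampling with replacement
        if len(set(sample)) == k:
            hits += 1
    return hits / trials

k, n = 10, 30
print(prob_all_classes([1 / k] * k, n))
print(simulate(100, k, n), simulate(100_000, k, n))
```

both simulated values agree with the exact probability up to sampling noise, even though one dataset is a thousand times larger.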

this matters when a small dataset is split for cross-validation or into train/test sets, because each partition is itself a small sample and may contain too few elements of some class, or none at all
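a common remedy, not named in these notes and assumed here, is stratified partitioning: the records of each class are shuffled and dealt round-robin into the k folds, so every fold gets a share of every class that has at least k elements. a minimal pure-python sketch follows; the helper name stratified_folds is hypothetical (scikit-learn offers a StratifiedKFold class for the same purpose).

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split record indices into k folds, approximately preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    start = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # deal this class round-robin, continuing from where the previous
        # class stopped so that fold sizes stay balanced
        for j, idx in enumerate(indices):
            folds[(start + j) % k].append(idx)
        start = (start + len(indices)) % k
    return folds

# hypothetical small dataset: 45 records of class "a", 5 of the rare class "b"
labels = ["a"] * 45 + ["b"] * 5
for fold in stratified_folds(labels, k=5):
    print(sum(labels[i] == "b" for i in fold), len(fold))  # 1 10 for every fold
```

with an unstratified random split of the same 50 records, nothing prevents the five "b" records from landing in only a few folds, leaving the others without the rare class.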