data types

| Data types | | Description | Examples | Descriptive statistics allowed | Domain |
|---|---|---|---|---|---|
| Categorical | nominal | Set of labels; the available information allows one label to be distinguished from another. Operators: =, != | zip code, fiscal code | mode, entropy, contingency, correlation | Discrete |
| Categorical | ordinal | Operators: <, <=, >, >= | non-numerical quality evaluations | median, percentiles, rank correlations | Discrete |
| Numerical | interval | Values can be added or subtracted. Operators: +, - | calendar dates, temperatures in °C and °F | average, standard deviation | Continuous |
| Numerical | ratio | Has an unambiguous definition of 0, allowing all mathematical operations | temperatures in K | geometric mean, harmonic mean, percentage variation | Continuous |

interval data

interval vs ratio

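Interval data does not preserve ratios between values when the scale changes: 20 °C is numerically twice 10 °C, but the same two temperatures are 68 °F and 50 °F, a ratio of 1.36.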
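
The effect can be checked numerically with a short sketch. The helper names `celsius_to_fahrenheit` and `kelvin_to_rankine` are introduced here only for illustration. Celsius and Fahrenheit are interval scales, so the ratio between two temperatures depends on the unit; kelvin and Rankine both start at absolute zero, so the ratio survives the change of unit.

```python
def celsius_to_fahrenheit(c):
    # Affine change of scale: the zero point moves, so ratios are not preserved.
    return c * 9 / 5 + 32

def kelvin_to_rankine(k):
    # Pure rescaling: both scales share absolute zero, so ratios are preserved.
    return k * 9 / 5

# Interval scale: the ratio changes with the unit.
c_low, c_high = 10.0, 20.0
ratio_celsius = c_high / c_low
ratio_fahrenheit = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Ratio scale: the ratio is the same in both units.
k_low, k_high = 100.0, 200.0
ratio_kelvin = k_high / k_low
ratio_rankine = kelvin_to_rankine(k_high) / kelvin_to_rankine(k_low)

print(ratio_celsius, ratio_fahrenheit)
print(ratio_kelvin, ratio_rankine)
```

Differences, on the other hand, stay proportional under both conversions, which is why addition and subtraction are meaningful for interval data.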

allowed transformations

| data type | transformation |
|---|---|
| nominal | any one-to-one correspondence |
| ordinal | any order-preserving transformation (any monotonically increasing function) |
| interval | linear functions (a * x + b) |
| ratio | any mathematical function, standardization, percentage variation |
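
The nominal and ordinal cases can be illustrated with a minimal sketch. The data and the names `relabel`, `squash` and `ranking` are made up for the example. A one-to-one relabeling of nominal values keeps the frequency of each label, so the mode maps onto the relabeled mode; a strictly increasing function applied to ordinal codes changes the distances between them but not their order.

```python
from collections import Counter
import math

# Nominal: a one-to-one relabeling keeps the frequency structure intact.
colors = ["red", "blue", "red", "green", "red", "blue"]
relabel = {"red": "A", "blue": "B", "green": "C"}  # hypothetical one-to-one mapping
recoded = [relabel[c] for c in colors]

mode_before = Counter(colors).most_common(1)[0][0]
mode_after = Counter(recoded).most_common(1)[0][0]

# Ordinal: any strictly increasing function keeps the ranking intact.
grades = [3, 1, 4, 2, 5]  # e.g. quality scores coded 1 (worst) to 5 (best)

def squash(x):
    # Strictly increasing but not linear: distances change, order does not.
    return math.log(x)

def ranking(values):
    # Indices of the items, sorted from smallest to largest value.
    return sorted(range(len(values)), key=lambda i: values[i])

same_order = ranking(grades) == ranking([squash(g) for g in grades])

print(relabel[mode_before] == mode_after, same_order)
```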

asymmetric attributes

general characteristics of data sets

dimensionality

sparsity

beware the nulls in disguise

resolution

The data is organized in records

data quality

detect outliers with descriptive statistics

IQR = interquartile range

IQR = Q3 - Q1
lower-boundary = Q1 - IQR * 1.5
upper-boundary = Q3 + IQR * 1.5

where Q1 is the first quartile and Q3 is the third quartile.

Values outside these boundaries are treated as outliers.
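
A minimal sketch of this rule using Python's standard library. The function name `iqr_outliers`, the parameter `k` and the sample data are assumptions made for the example. Note that several conventions exist for computing quartiles, so Q1 and Q3 (and therefore the boundaries) can differ slightly between tools; the `"inclusive"` method of `statistics.quantiles` is just one choice.

```python
import statistics

def iqr_outliers(values, k=1.5):
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # lower-boundary
    upper = q3 + k * iqr  # upper-boundary
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # made-up sample
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)
```

On this sample only the value 102 falls outside the boundaries.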

handling missing values

| strategy | comment |
|---|---|
| ignore the records with missing values | extreme; not a good idea in general |
| insert all possible values, weighted with probabilities | used in probabilistic learning; expensive |
| estimate the missing values | default choice |
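
As a sketch of the estimation strategy, one simple option (by no means the only one) is to replace each missing value with the median of the observed values in the same column. The function name `impute_median` and the use of `None` as the missing-value marker are assumptions made for this example.

```python
import statistics

def impute_median(column):
    # Estimate from the observed values only; None marks a missing entry here.
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot estimate: no observed values in the column")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in column]

ages = [23, None, 31, 27, None, 45]  # made-up column with two missing values
filled = impute_median(ages)
print(filled)
```

The median is used here instead of the mean because it is less sensitive to outliers; a model-based estimate from the other attributes of the record is another common choice.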

duplicated data