type conversions

Type conversion is the procedure of converting the data types of attributes. It serves many purposes, such as:

Transforming features that represent categories into numerical quantities.

For categorical features there are two types of encoders, one for nominal values and one for ordinal values.

nominal to numeric conversion

one hot encoder

It transforms a feature with $V$ unique values into $V$ boolean features, one per value. If an object $X$ has the value $d$ for the original feature, then the corresponding new feature $d$ is set to $true$ and all the others are set to $false$.

Here is a usage example of the scikit-learn implementation:

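from sklearn.preprocessing import OneHotEncoder
X = [["red"], ["green"], ["blue"]] # hypothetical sample data: one categorical feature
ohe = OneHotEncoder() # create the encoder
ohe.fit(X) # learn the categories from the data
ohe.categories_ # show the categories found
ohe.transform(X) # apply the transformation (sparse matrix by default)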
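
To make the definition concrete, here is a minimal from-scratch sketch of the same idea in plain Python. It is only an illustration, not how scikit-learn implements the encoder; the function name `one_hot` and the `colors` data are hypothetical.

```python
def one_hot(values):
    """Return (categories, rows) with one boolean column per distinct value."""
    # Sort the distinct values so the column order is fixed and reproducible.
    categories = sorted(set(values))
    # For each object, the column matching its value is True, the others False.
    rows = [[value == category for category in categories] for value in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]  # hypothetical sample feature
categories, encoded = one_hot(colors)

print(categories)
for value, row in zip(colors, encoded):
    print(value, row)
```

Each row contains exactly one true entry, which is what distinguishes a one hot encoding from an arbitrary boolean vector.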

ordinal to numeric conversion

ordinal encoder

It transforms an ordinal feature $V$ into a numeric one, preserving the order by translating the values into consecutive integers. The user can specify the sequence; lexicographic order is the default.

Here is a usage example of the scikit-learn implementation:

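from sklearn.preprocessing import OrdinalEncoder
X = [["low"], ["high"], ["medium"]] # hypothetical sample data: one ordinal feature
oe = OrdinalEncoder() # create the encoder
oe.fit(X) # learn the categories from the data
oe.categories_ # show the categories found (lexicographic order by default)
oe.transform(X) # apply the transformation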
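
The difference between the default lexicographic order and a user-specified sequence can be sketched in plain Python as follows. The helper `ordinal_encode` and the `sizes` data are hypothetical and serve only as an illustration; in scikit-learn the sequence is passed through the `categories` parameter of `OrdinalEncoder`.

```python
def ordinal_encode(values, order=None):
    """Map each value to its position in `order` (consecutive integers from 0)."""
    # With no explicit sequence, fall back to lexicographic order.
    if order is None:
        order = sorted(set(values))
    codes = {category: index for index, category in enumerate(order)}
    return [codes[value] for value in values]


sizes = ["medium", "small", "large", "small"]  # hypothetical ordinal feature

# Lexicographic default: large < medium < small, which scrambles the real ranking.
default_codes = ordinal_encode(sizes)

# Explicit sequence: small < medium < large, which preserves the real ranking.
ranked_codes = ordinal_encode(sizes, order=["small", "medium", "large"])

print(default_codes)
print(ranked_codes)
```

With the default order, "large" receives the smallest code, so specifying the sequence matters whenever the alphabetical order does not match the intended ranking.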

numeric to binary conversion

binarizer with threshold

It transforms numeric values into binary ones by comparing each value with a threshold (for example, 1 if the value exceeds the threshold and 0 otherwise).
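
A minimal sketch of thresholding in plain Python is given below. The function `binarize` and the `temperatures` data are hypothetical; the strict `>` comparison is an assumption, chosen to match the convention of scikit-learn's `Binarizer`, where values equal to the threshold map to 0.

```python
def binarize(values, threshold=0.0):
    """Return 1 for values strictly greater than the threshold, 0 otherwise."""
    return [1 if value > threshold else 0 for value in values]


temperatures = [35.5, 37.0, 38.2, 39.1]  # hypothetical numeric feature
fever = binarize(temperatures, threshold=37.0)

print(fever)
```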

discretization

A transformation that turns continuous domains into discrete ones. There are many algorithms for this purpose: some are based on domain knowledge, others use thresholds.

numeric to k values conversion

Numbers are discretized into a series of integer values from $0$ to $k-1$.
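
One possible way to obtain the $k$ values is equal-width binning, sketched below in plain Python. This strategy is an assumption chosen for illustration, since the notes do not fix a specific algorithm; the function `discretize_equal_width` and the `ages` data are hypothetical. scikit-learn offers `KBinsDiscretizer` for this kind of conversion.

```python
def discretize_equal_width(values, k):
    """Assign each value to one of k equal-width bins, labelled 0 to k-1."""
    low, high = min(values), max(values)
    if high == low:
        # Degenerate case: all values are equal, so everything falls in bin 0.
        return [0 for _ in values]
    width = (high - low) / k
    # The maximum would fall in bin k, so it is clamped into the last bin k-1.
    return [min(int((value - low) / width), k - 1) for value in values]


ages = [0, 10, 25, 40, 55, 60]  # hypothetical numeric feature
bins = discretize_equal_width(ages, k=3)

print(bins)
```

Here the range from 0 to 60 is split into three intervals of width 20, so every value is replaced by the index of the interval it falls in.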
