similarity and dissimilarity

similarity

numerical measure of how alike two data objects are, higher when objects are more alike ( often in the range $[0, 1]$ )

dissimilarity

numerical measure of how different two data objects are, lower when objects are more alike ( minimum value is often $0$, upper bound varies )

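| ATTRIBUTE TYPE | DISSIMILARITY | SIMILARITY |
|---|---|---|
| NOMINAL | $d = 0$ if $p = q$, $d = 1$ if $p \neq q$ | $s = 1$ if $p = q$, $s = 0$ if $p \neq q$ |
| ORDINAL | $d = \frac{\lvert p-q \rvert}{V-1}$ | $s = 1 - \frac{\lvert p-q \rvert}{V-1}$ |
| INTERVAL | $d = \lvert p-q \rvert$ | $s = \frac{1}{1+d}$ |

$p$ and $q$ are the values of a single attribute for two data objects; for ordinal attributes the values are mapped to the integers $0$ to $V-1$, where $V$ is the number of values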
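a minimal sketch of the per-attribute measures in the table above, assuming ordinal values are already encoded as integer ranks $0$ to $V-1$. the function names and the `num_values` parameter are illustrative, not taken from any library.

```python
def nominal_dissimilarity(p, q):
    """0 when the two nominal values are equal, 1 otherwise."""
    return 0 if p == q else 1


def nominal_similarity(p, q):
    """1 when the two nominal values are equal, 0 otherwise."""
    return 1 - nominal_dissimilarity(p, q)


def ordinal_dissimilarity(p, q, num_values):
    """|p - q| / (V - 1), with p and q given as integer ranks 0..V-1."""
    if num_values < 2:
        raise ValueError("an ordinal attribute needs at least two values")
    return abs(p - q) / (num_values - 1)


def ordinal_similarity(p, q, num_values):
    """1 - d, where d is the ordinal dissimilarity."""
    return 1 - ordinal_dissimilarity(p, q, num_values)


def interval_dissimilarity(p, q):
    """|p - q| for interval (numeric) attributes."""
    return abs(p - q)


def interval_similarity(p, q):
    """1 / (1 + d), where d is the interval dissimilarity."""
    return 1 / (1 + interval_dissimilarity(p, q))


# hypothetical ordinal scale: low = 0, medium = 1, high = 2, so V = 3
print(ordinal_dissimilarity(0, 1, num_values=3))  # 0.5
print(ordinal_similarity(0, 2, num_values=3))     # 0.0
print(nominal_similarity("red", "blue"))          # 0
```

with this encoding the ordinal dissimilarity always falls in $[0, 1]$, while the interval dissimilarity has no fixed upper bound.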

properties of similarity

similarity between vectors

simple matching coefficient

the ratio between the number of matching attributes and the total number of attributes, where $M_{ab}$ is the number of attributes for which $p = a$ and $q = b$

$$ \frac{M_{00} + M_{11}}{M_{00}+M_{10}+M_{01}+M_{11}} $$
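a minimal sketch of the simple matching coefficient for two binary vectors of equal length, assuming $M_{ab}$ counts the positions where the first vector holds $a$ and the second holds $b$. the helper `match_counts` is a hypothetical name introduced for illustration.

```python
def match_counts(p, q):
    """Count positions by value pair: counts[(a, b)] is M_ab."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(p, q):
        counts[(a, b)] += 1
    return counts


def simple_matching_coefficient(p, q):
    """(M00 + M11) / (M00 + M01 + M10 + M11)."""
    m = match_counts(p, q)
    total = sum(m.values())
    if total == 0:
        raise ValueError("vectors must not be empty")
    return (m[(0, 0)] + m[(1, 1)]) / total


p = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
q = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]
# M00 = 7, M01 = 2, M10 = 1, M11 = 0
print(simple_matching_coefficient(p, q))  # 0.7
```

the two vectors share no $1$ at all, yet the coefficient is high because $00$ matches count as agreement.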

jaccard coefficient

the ratio between the number of $11$ matches and the number of attributes that are not $00$ matches

$$ \frac{M_{11}}{M_{10}+M_{01}+M_{11}} $$
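a minimal sketch of the jaccard coefficient for two binary vectors of equal length. returning $0$ when both vectors are all zeros is an assumed convention, since the formula is undefined in that case.

```python
def jaccard_coefficient(p, q):
    """M11 / (M01 + M10 + M11) for binary vectors p and q."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    # positions where both vectors hold 1
    m11 = sum(1 for a, b in zip(p, q) if a == 1 and b == 1)
    # positions where at least one vector holds 1 (everything except 00)
    non_00 = sum(1 for a, b in zip(p, q) if a == 1 or b == 1)
    if non_00 == 0:
        return 0.0  # assumed convention for two all-zero vectors
    return m11 / non_00


p = [1, 1, 0, 0, 1]
q = [1, 0, 0, 1, 1]
# M11 = 2, M10 = 1, M01 = 1
print(jaccard_coefficient(p, q))  # 0.5
```

ignoring $00$ matches makes this coefficient suitable for sparse, asymmetric binary attributes, where shared absences carry little information.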

cosine coefficient

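the cosine of the angle between the vectors, where $p \cdot q$ is the dot product and $\|p\|$ is the euclidean length of $p$

$$ \frac{p \cdot q}{\|p\| \, \|q\|} $$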
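a minimal sketch of the cosine coefficient using only the standard library, computed as the dot product divided by the product of the euclidean lengths. the two example vectors are made-up term counts.

```python
import math


def cosine_similarity(p, q):
    """Dot product of p and q divided by the product of their lengths."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_p * norm_q)


d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
# dot = 5, lengths = sqrt(42) and sqrt(6)
print(round(cosine_similarity(d1, d2), 3))  # 0.315
```

the result depends only on the direction of the vectors, so scaling either vector by a positive constant leaves it unchanged.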

extended jaccard coefficient ( tanimoto )

the jaccard coefficient for continuous attributes

$$ \frac{p \cdot q}{\|p\|^2 + \|q\|^2 - p \cdot q} $$
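a minimal sketch of the extended jaccard ( tanimoto ) coefficient, assuming the usual definition: the dot product divided by the sum of the squared lengths minus the dot product. on binary vectors this reduces to the jaccard coefficient, which the first example checks.

```python
def tanimoto_coefficient(p, q):
    """p.q / (|p|^2 + |q|^2 - p.q) for real-valued vectors p and q."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    sq_p = sum(a * a for a in p)  # squared euclidean length of p
    sq_q = sum(b * b for b in q)  # squared euclidean length of q
    denominator = sq_p + sq_q - dot
    if denominator == 0:
        raise ValueError("coefficient is undefined for two zero vectors")
    return dot / denominator


# binary vectors: dot = M11 = 2, denominator = 3 + 3 - 2 = 4
print(tanimoto_coefficient([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.5

# continuous vectors: dot = 10, denominator = 5 + 20 - 10 = 15
print(round(tanimoto_coefficient([1.0, 2.0], [2.0, 4.0]), 3))  # 0.667
```

unlike the cosine, this coefficient is sensitive to vector length: the parallel vectors in the second example score below $1$.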