performance of a classifier

Classification

definitions

in a classification problem where the class is a binary attribute, the following table can be built in order to study the predictions on the test data

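	POS-PRED	NEG-PRED
POS-TRUE	$TP$	$FN$
NEG-TRUE	$FP$	$TN$

where $TP$ (true positives) and $TN$ (true negatives) count the correctly classified examples, $FP$ (false positives) counts the negative examples predicted as positive, and $FN$ (false negatives) counts the positive examples predicted as negative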
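as a minimal sketch, the four counts can be obtained by comparing true and predicted labels one pair at a time. the function name `confusion_counts`, the `positive` parameter and the toy labels are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for two equally long lists of binary labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, truly positive
        elif pred == positive:
            fp += 1  # predicted positive, truly negative
        elif truth == positive:
            fn += 1  # predicted negative, truly positive
        else:
            tn += 1  # predicted negative, truly negative
    return tp, fp, fn, tn


# hypothetical toy labels, not taken from any real dataset
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
```

the four counts always sum to the number of test examples, which is a quick sanity check on any such table.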

these counts can be used to calculate several performance metrics:

ACCURACY

rate of correct classifications among the $N_{test} = TP + TN + FP + FN$ test examples

$$
\frac{TP + TN}{N_{test}}
$$

ERROR RATE

$$ 1 - accuracy $$

PRECISION

rate of true positives among positive classifications

$$ \frac{TP}{TP + FP} $$

RECALL

rate of positives that the classifier can catch (sensitivity)

$$ \frac{TP}{TP + FN} $$

SPECIFICITY

rate of negatives that the classifier can catch

$$ \frac{TN}{TN + FP} $$

F1 SCORE

harmonic mean of precision and recall

$$ 2 \cdot \frac{precision \cdot recall}{precision + recall} $$
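the formulas above can be sketched as one small function. the name `binary_metrics`, the example counts, and the choice to return 0.0 when a denominator is zero are assumptions of this sketch, not part of the definitions.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute the metrics defined above from the four counts."""
    n_test = tp + fp + fn + tn
    accuracy = (tp + tn) / n_test
    # assumption: a metric with an empty denominator is reported as 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# hypothetical counts on a test set of 100 examples
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

with these counts precision is 40/50 and recall is 40/60, so the F1 score falls between the two, closer to the lower one.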

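accuracy gives an initial idea of the performance but can be misleading when classes are unbalanced: a classifier that always predicts the majority class reaches a high accuracy without catching any example of the minority class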
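a small worked example may make this concrete. the counts are hypothetical: a test set with 990 negatives and 10 positives, and a classifier that always predicts the negative class.

```python
# hypothetical unbalanced test set: 10 positives, 990 negatives,
# classifier that always answers "negative"
tp, fp, fn, tn = 0, 0, 10, 990

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} specificity={specificity:.2f}")
```

accuracy is 0.99 even though recall is 0, which is why recall and specificity are worth checking alongside it.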

the F1 score is interesting because, for a given average of precision and recall, it is higher when the two are balanced
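as an illustration, two hypothetical classifiers with the same arithmetic mean of precision and recall can be compared. the helper name `f1` and the values are assumptions chosen for the example.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# both pairs have arithmetic mean 0.5
balanced = f1(0.5, 0.5)
unbalanced = f1(0.9, 0.1)

print(f"balanced={balanced:.2f} unbalanced={unbalanced:.2f}")
```

the balanced pair keeps an F1 of 0.5, while the unbalanced one drops to about 0.18, because the harmonic mean is dominated by the smaller value.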

if the costs of positive and negative errors are different, then precision and recall should be considered separately

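in a problem with a non-binary class attribute the previous table can be extended with one row and one column per class; it is called a confusion matrix. rows are true classes, columns are predicted classes, $FP_{i-j}$ counts the examples of true class $i$ predicted as class $j$, $T_{i}$ and $P_{i}$ are the numbers of examples that truly belong to and that are predicted as class $i$, and $N$ is the total number of examples

	a	b	c	Total
a	$TP_{a}$	$FP_{a-b}$	$FP_{a-c}$	$T_{a}$
b	$FP_{b-a}$	$TP_{b}$	$FP_{b-c}$	$T_{b}$
c	$FP_{c-a}$	$FP_{c-b}$	$TP_{c}$	$T_{c}$
Total	$P_{a}$	$P_{b}$	$P_{c}$	$N$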
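a minimal sketch of how per-class precision and recall can be read off such a matrix, assuming rows are true classes and columns are predicted classes. the matrix values and the name `per_class_metrics` are hypothetical.

```python
classes = ["a", "b", "c"]

# hypothetical counts: matrix[i][j] = examples of true class i predicted as class j
matrix = [
    [5, 2, 1],
    [1, 6, 1],
    [0, 2, 7],
]


def per_class_metrics(matrix):
    """Return a list of (precision, recall) pairs, one per class."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]  # T_i: true examples per class
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # P_j
    result = []
    for i in range(n):
        tp = matrix[i][i]  # diagonal cell: class i predicted as class i
        precision = tp / col_totals[i] if col_totals[i] else 0.0
        recall = tp / row_totals[i] if row_totals[i] else 0.0
        result.append((precision, recall))
    return result


metrics = per_class_metrics(matrix)
for name, (precision, recall) in zip(classes, metrics):
    print(f"{name}: precision={precision:.3f} recall={recall:.3f}")

# overall accuracy is the sum of the diagonal divided by N
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / sum(map(sum, matrix))
```

precision for a class divides its diagonal cell by the column total, recall divides it by the row total.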

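in the multi-class case each metric $f$ is computed per class $c_i$, treating $c_i$ as the positive class and all the other classes as negative

these measures can be global (often called macro average), where every class counts equally and $|C|$ is the number of classes:

$$ f(C) = \frac{\sum_{i} f(c_i)}{|C|} $$

these measures can be weighted, where each class counts in proportion to its number of true examples $T_{c_i}$, with $N = \sum_{i} T_{c_i}$:

$$ f(C) = \frac{\sum_{i} f(c_i) \cdot T_{c_i}}{N} $$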
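the two averages can be sketched as follows, assuming the weight of a class is its number of true examples. the function names and the per-class values are hypothetical.

```python
def global_average(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes weighted by the number of true examples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# hypothetical per-class recall and class sizes for an unbalanced problem
recalls = [0.9, 0.5, 0.1]
supports = [80, 15, 5]

global_recall = global_average(recalls)
weighted_recall = weighted_average(recalls, supports)

print(f"global={global_recall:.2f} weighted={weighted_recall:.2f}")
```

here the global value is 0.50 and the weighted value is 0.80: the weighted average follows the large class, while the global average exposes the poor result on the small ones.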
