performance of a classifier

definitions

In a classification problem where the class is a binary attribute, the following table can be built to summarize how the classifier's predictions compare with the true classes:

| | POS-PRED | NEG-PRED |
|---|---|---|
| POS-TRUE | $TP$ | $FP_{a-b}$ |
| NEG-TRUE | $FP_{b-a}$ | $TN$ |

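where $TP$ (true positives) is the number of positive instances predicted as positive, $TN$ (true negatives) is the number of negative instances predicted as negative, $FP_{a-b}$ is the number of positive instances wrongly predicted as negative (false negatives), and $FP_{b-a}$ is the number of negative instances wrongly predicted as positive (false positives).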

These counts can be used to calculate performance metrics such as:

accuracy of the classifier

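$$
\frac{TP + TN}{N_{test}}
$$

where $N_{test}$ is the total number of instances in the test set, that is, the sum of the four cells of the table.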
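As a minimal sketch, the four cells and the accuracy can be computed from a list of true labels and a list of predicted labels as follows. The function names `binary_counts` and `accuracy` and the toy labels are illustrative assumptions, not part of the original notes.

```python
def binary_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) for a binary problem.

    fn corresponds to FP_{a-b} in the table (positive predicted as negative),
    fp corresponds to FP_{b-a} (negative predicted as positive).
    """
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy(tp, fn, fp, tn):
    """(TP + TN) / N_test, where N_test is the sum of the four cells."""
    n_test = tp + fn + fp + tn
    return (tp + tn) / n_test if n_test else 0.0


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
counts = binary_counts(y_true, y_pred)
acc = accuracy(*counts)
print(counts, acc)
```

On these toy labels the counts are 2 true positives, 1 false negative, 1 false positive and 4 true negatives, so the accuracy is 6/8 = 0.75.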

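Accuracy gives an initial idea of the performance but can be misleading when the classes are imbalanced: if 95% of the test instances are negative, a classifier that always predicts negative reaches 95% accuracy without ever detecting a positive.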

The F1 score is interesting because it is the harmonic mean of precision and recall, so it is high only when both are high and balanced.
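For reference, these are the usual textbook definitions of precision, recall and F1, written with the symbols of the binary table above, where $FP_{b-a}$ counts negatives predicted as positive and $FP_{a-b}$ counts positives predicted as negative. They are added here on the assumption that they are the definitions the notes intend.

$$
\text{precision} = \frac{TP}{TP + FP_{b-a}}, \qquad
\text{recall} = \frac{TP}{TP + FP_{a-b}}, \qquad
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

For example, with precision 0.9 and recall 0.1 the arithmetic mean would be 0.5, while $F_1 = 0.18$. The harmonic mean is pulled toward the smaller of the two values.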

If the costs of false positives and false negatives are different, then precision and recall should be considered separately.
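The sketch below illustrates why accuracy alone can hide costly errors on an imbalanced test set. The counts are hypothetical and the helper name `precision_recall_f1` is an assumption made for this example. The formulas are the standard ones for precision, recall and F1.

```python
def precision_recall_f1(tp, fn, fp):
    """Standard precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical imbalanced test set: 10 positives, 990 negatives.
# The classifier finds only 2 of the positives and raises 1 false alarm.
tp, fn, fp, tn = 2, 8, 1, 989
acc = (tp + tn) / (tp + fn + fp + tn)
precision, recall, f1 = precision_recall_f1(tp, fn, fp)
print(f"accuracy={acc:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Here the accuracy is 0.991 although the recall is only 0.2. If a missed positive is more expensive than a false alarm, recall is the figure to watch. In the opposite case precision matters more.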

multi-class case

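In a problem with a non-binary class attribute the previous table can be extended. The result is called a confusion matrix, with one row per true class and one column per predicted class, as in the binary table: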

| | a | b | c | Total |
|---|---|---|---|---|
| a | $TP_{a}$ | $FP_{a-b}$ | $FP_{a-c}$ | $T_{a}$ |
| b | $FP_{b-a}$ | $TP_{b}$ | $FP_{b-c}$ | $T_{b}$ |
| c | $FP_{c-a}$ | $FP_{c-b}$ | $TP_{c}$ | $T_{c}$ |
| Total | $P_{a}$ | $P_{b}$ | $P_{c}$ | $N$ |
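A confusion matrix like the one above can be built directly from the label lists. This is a minimal sketch in plain Python, assuming rows are true classes and columns are predicted classes. The function name `confusion_matrix` and the toy data are illustrative assumptions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pair_counts = Counter(zip(y_true, y_pred))
    return [[pair_counts[(t, p)] for p in labels] for t in labels]


labels = ["a", "b", "c"]
# Toy data, invented for illustration only.
y_true = ["a", "a", "a", "b", "b", "c", "c", "c", "c"]
y_pred = ["a", "a", "b", "b", "c", "c", "c", "a", "b"]

matrix = confusion_matrix(y_true, y_pred, labels)
row_totals = [sum(row) for row in matrix]        # T_a, T_b, T_c
col_totals = [sum(col) for col in zip(*matrix)]  # P_a, P_b, P_c
n = sum(row_totals)                              # N

for label, row, total in zip(labels, matrix, row_totals):
    print(label, row, total)
print("Total", col_totals, n)
```

The diagonal holds the correctly classified instances of each class. Every off-diagonal cell counts one specific kind of confusion between two classes.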

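A per-class measure $f(c_i)$ (for example the precision, recall or F1 of class $c_i$ against all the others) can be turned into a global measure by averaging over the set of classes $C$:

$$ f(C) = \frac{\sum_{c_i \in C} f(c_i)}{|C|} $$

The average can also be weighted by the number of instances of each class, so that frequent classes count more:

$$ f(C) = \frac{\sum_{c_i \in C} f(c_i) \cdot T_{c_i}}{N} $$

where $T_{c_i}$ is the row total of class $c_i$ in the confusion matrix and $N$ is the total number of instances.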
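The two averages can be sketched as follows, using a one-vs-rest F1 as the per-class measure. The helper names and the example matrix are assumptions made for illustration. The matrix is read with rows as true classes and columns as predicted classes, and the weights are the row totals.

```python
def per_class_f1(matrix):
    """One-vs-rest F1 for each class of a square confusion matrix
    (rows = true class, columns = predicted class)."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # rest of row i
        fp = sum(row[i] for row in matrix) - tp  # rest of column i
        denom = 2 * tp + fp + fn                 # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_average(scores):
    """Plain mean over classes: every class counts the same."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of true instances of each class."""
    return sum(s * w for s, w in zip(scores, supports)) / sum(supports)


# Hypothetical 3-class confusion matrix.
matrix = [[2, 1, 0],
          [0, 1, 1],
          [1, 1, 2]]
supports = [sum(row) for row in matrix]  # T_a, T_b, T_c
scores = per_class_f1(matrix)
macro = macro_average(scores)
weighted = weighted_average(scores, supports)
print(scores, macro, weighted)
```

The unweighted average treats a rare class exactly like a frequent one. The weighted average follows the class distribution, so it can look good even when a small class is classified poorly.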