rules generation
The goal of this phase is, given an item-set $L$, to find all non-empty proper subsets $f \subset L$ such that the association rule $f \rightarrow (L-f)$ has $conf \gt threshold$, where $threshold$ is a parameter chosen by the designer.
It's important to understand how to generate rules with a high $conf$ from a given item-set.
Since $conf(f \rightarrow (L-f)) = \frac{sup(L)}{sup(f)}$ and the support of $f$ can only grow as items are removed from it, for rules generated from the same item-set we have:
$$ conf(ABC \rightarrow D) \geq conf(AB \rightarrow CD) \geq conf(A \rightarrow BCD) $$
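This means that rule confidence is anti-monotone with respect to the number of items in the consequent: moving items from the antecedent to the consequent can never increase the confidence.
This property can be used to prune the generation tree: when a rule has $conf \lt threshold$, every rule obtained from it by moving further items from the antecedent to the consequent is also below the threshold and does not need to be generated.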
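
The pruning idea can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the helper names `support` and `generate_rules`, and the representation of transactions as a list of `frozenset`s, are hypothetical and chosen only for illustration. Consequents are grown level by level, and a larger consequent is tried only if every consequent one item smaller produced a rule above the threshold.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def generate_rules(L, transactions, threshold):
    """Return (antecedent, consequent, conf) for rules f -> (L - f) with conf > threshold.

    Assumption: `transactions` is a non-empty list of frozensets.
    """
    L = frozenset(L)
    sup_L = support(L, transactions)
    rules = []
    if sup_L == 0:
        return rules  # L never occurs, so no rule can be evaluated

    # Level 1: every consequent made of a single item.
    consequents = [frozenset([item]) for item in L]
    while consequents:
        kept = []
        for c in consequents:
            f = L - c
            if not f:
                continue  # the antecedent must be non-empty
            conf = sup_L / support(f, transactions)
            if conf > threshold:
                rules.append((f, c, conf))
                kept.append(c)

        # Next level: consequents one item larger, built only from kept ones.
        # A candidate survives only if all its one-item-smaller subsets were kept,
        # because a failing rule makes every rule with a larger consequent fail too.
        size = len(consequents[0]) + 1
        kept_set = set(kept)
        candidates = {a | b for a, b in combinations(kept, 2) if len(a | b) == size}
        consequents = [c for c in candidates if all(c - {i} in kept_set for i in c)]
    return rules
```

For example, `generate_rules("abc", transactions, 0.7)` would evaluate the three rules with a single-item consequent first and move to two-item consequents only for the combinations that survived.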
interesting rule generation metrics
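Confidence alone can be insufficient in the rule generation process because it ignores the support of the consequent, so other interestingness metrics are also used.
Given a rule of the form $A \rightarrow C$, we can compute the contingency table as follows, where $f_{11}$ counts the transactions containing both $A$ and $C$, $f_{10}$ those containing $A$ but not $C$, and so on: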
| | C | $\overline{C}$ | |
|---|---|---|---|
| A | $f_{11}$ | $f_{10}$ | $f_{1+}$ |
| $\overline{A}$ | $f_{01}$ | $f_{00}$ | $f_{0+}$ |
| | $f_{+1}$ | $f_{+0}$ | |
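
As an illustration, the four inner cells of the table could be counted with a single pass over the transactions. The function name `contingency_counts` and the use of `frozenset`s for transactions are assumptions made for this sketch only.

```python
def contingency_counts(A, C, transactions):
    """Return (f11, f10, f01, f00) for the rule A -> C.

    Assumption: each transaction is a frozenset of items.
    """
    A, C = frozenset(A), frozenset(C)
    f11 = f10 = f01 = f00 = 0
    for t in transactions:
        has_a, has_c = A <= t, C <= t
        if has_a and has_c:
            f11 += 1  # A and C both present
        elif has_a:
            f10 += 1  # A present, C absent
        elif has_c:
            f01 += 1  # A absent, C present
        else:
            f00 += 1  # neither present
    return f11, f10, f01, f00
```

The marginals follow by summation, for instance $f_{1+} = f_{11} + f_{10}$ and $f_{+1} = f_{11} + f_{01}$.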
From this table we can derive some interesting measures, such as:
lift
lift is defined as
$$ lift(A \rightarrow C) = \frac{conf(A \rightarrow C)}{sup(C)} $$
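It is the ratio between the observed support of the rule and the support expected if $A$ and $C$ were independent, since $lift(A \rightarrow C) = \frac{sup(A \cup C)}{sup(A)sup(C)}$. It equals $1$ when the item-sets are independent; values above $1$ indicate positive correlation and values below $1$ negative correlation.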
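
A small sketch of how lift could be computed from the four contingency counts. The function name `lift` and its argument order are assumptions for illustration, and the counts are assumed to give non-zero $f_{1+}$ and $f_{+1}$.

```python
def lift(f11, f10, f01, f00):
    """lift(A -> C) = conf(A -> C) / sup(C), from contingency counts.

    Assumption: A and C each occur in at least one transaction.
    """
    n = f11 + f10 + f01 + f00
    conf = f11 / (f11 + f10)   # f11 / f1+
    sup_c = (f11 + f01) / n    # f+1 / N
    return conf / sup_c
```

With perfectly balanced counts such as `lift(1, 1, 1, 1)` the two item-sets behave as independent and the value is `1.0`.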
leverage
leverage is defined as
$$ leve(A \rightarrow C) = sup(A \cup C) - sup(A)sup(C) $$
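Leverage is the difference between the observed support of $A \cup C$ and the support expected under independence, i.e. the proportion of additional cases covered by the rule. It equals $0$ when the item-sets are independent.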
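
The same contingency counts are enough to compute leverage. In this sketch the function name `leverage` and its signature are hypothetical.

```python
def leverage(f11, f10, f01, f00):
    """leve(A -> C) = sup(A u C) - sup(A) * sup(C), from contingency counts.

    Assumption: the counts sum to a non-zero number of transactions.
    """
    n = f11 + f10 + f01 + f00
    sup_ac = f11 / n           # observed joint support
    sup_a = (f11 + f10) / n    # f1+ / N
    sup_c = (f11 + f01) / n    # f+1 / N
    return sup_ac - sup_a * sup_c
```

A positive result means $A$ and $C$ appear together more often than independence would predict, a negative one less often.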
conviction
conviction is defined as
$$ conv(A \rightarrow C) = \frac{1 - sup(C)}{1 - conf(A \rightarrow C)} $$
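It is the ratio between the frequency with which $A$ would occur without $C$ if $A$ and $C$ were independent and the observed frequency with which $A$ occurs without $C$. A higher value means that the rule is violated less often than would be expected under independence; it equals $1$ when $A$ and $C$ are independent and tends to infinity as $conf$ approaches $1$.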
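
Conviction can be sketched from the contingency counts in the same way. The function name `conviction` is an assumption, and returning `math.inf` for a rule that is never violated is a design choice of this sketch rather than something fixed by the definition.

```python
import math


def conviction(f11, f10, f01, f00):
    """conv(A -> C) = (1 - sup(C)) / (1 - conf(A -> C)), from contingency counts.

    Assumption: A occurs in at least one transaction.
    """
    n = f11 + f10 + f01 + f00
    conf = f11 / (f11 + f10)   # f11 / f1+
    sup_c = (f11 + f01) / n    # f+1 / N
    if conf == 1.0:
        return math.inf        # the rule is never violated
    return (1 - sup_c) / (1 - conf)
```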