frequent itemset generation
This step aims to generate all possible with a $sup \gt threshold$.
brute-force approach
For each possible item-set (called candidate) compute it’s $sup$ by scanning the database.
This, approach which have a complexity of $\mathcal{o}(NMW)$ where:
- $N$ -> total number of the transactions
- $W$ -> average number of items within a transaction
- $M$ -> number of frequent item-set candidate.
it’s extremely computational expensive.
There are other strategies that aims to reduce the computational cost of this operation such as:
- reducing the number of candidates by pruning (apriori algorithm)
- reducing the number of comparisons $NM$
brute-force approach
The brute-force approach generates each item-set in the graph above. Then, it computes the sup and conf indexes values for every association rule generated by every item-set.
- Complexity (EXPENSIVE): O(NWM), where
frequent item-set generation strategies
- Reduce the number of candidates M (Apriori Algorithm)
- Complete Search: $M=2^D$
- Use pruning techniques to reduce M
- Reduce the number of comparisons NM
- Use efficient data structures to store the candidates or transactions
- No need to match every candidate against every transaction