This is the first time I have learned about the FP-tree itemset mining algorithm, in my Data Mining course. I was really intrigued by how its design reduces the cost of scanning the transactions in a database. The Apriori algorithm scans the whole database once per itemset size, so roughly k times if the longest transaction contains k items. FP-tree, by contrast, reads the data only twice: once to build the frequency count for each item, and once to build the tree.
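To make the two scans concrete for myself, here is a minimal sketch in Python. It is not a full FP-growth implementation: the header table and node links used for mining are left out, and the names `FPNode`, `build_fp_tree`, the toy transactions and the `min_support` value are all my own assumptions for illustration.

```python
from collections import Counter


class FPNode:
    """One node of the tree: an item, a count, and child nodes keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count in how many transactions each item appears.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None)
    # Scan 2: insert each transaction, keeping only frequent items,
    # sorted by descending frequency (ties broken alphabetically).
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


# Hypothetical toy data, not taken from the course.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
root, frequent = build_fp_tree(transactions, min_support=3)
print(frequent)
print(root.children["bread"].count)
```

Each transaction is read exactly once in each scan, and transactions that share a prefix of frequent items end up sharing a path in the tree, which is where the compression comes from.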
Earlier, our teacher had encouraged us to think about how to improve the naïve algorithms for mining association rules. I thought a tree might work, but my design was closer to a dictionary tree (trie), which was also naïve, because a trie is mostly used for searching strings, where the order of the characters matters. Still, I would like to record some of the ideas FP-tree gave me today.
According to a basic rule used in Apriori, any superset of an infrequent itemset must itself be infrequent. Sorting the single items in descending order of frequency, eliminating those that fall below the minimum support, and traversing the tree in a top-down manner leaves us with only the promising items for later selection. Intuitively, the more frequently a single item appears in transactions, the more other items it co-occurs with. Thus, if we put such an item in a higher position, it will have more descendants. This arrangement gives us a better chance of finding frequent itemsets by combining it with other single items one after another, and unnecessary traversals can be avoided.
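The pruning rule above can be checked on a small example. The sketch below is only an illustration with made-up transactions and a hypothetical helper called `support`; it builds candidate pairs only from frequent single items and then confirms that every pair containing an infrequent item is indeed infrequent.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= set(t))


# Hypothetical toy data, not taken from the course.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
min_support = 3

items = sorted({i for t in transactions for i in t})
frequent_1 = [i for i in items if support({i}, transactions) >= min_support]
infrequent_1 = [i for i in items if i not in frequent_1]

# Candidate pairs are built only from frequent single items.
candidates = list(combinations(frequent_1, 2))
frequent_2 = [c for c in candidates
              if support(set(c), transactions) >= min_support]

# The rule in action: no pair containing an infrequent item is frequent.
pruned_ok = all(
    support({a, b}, transactions) < min_support
    for a in infrequent_1
    for b in items
    if a != b
)

print(len(list(combinations(items, 2))), "possible pairs,",
      len(candidates), "candidates after pruning")
print(frequent_2)
```

With six distinct items there are 15 possible pairs, but only the 6 pairs made of frequent single items ever need to be counted.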
More details to be added in the coming days!