【Question Title】:Python apriori returning Generator instead of Dataframe
【Posted】:2021-08-16 11:39:24
【Question】:

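I'm writing code that takes a small sample of a dataset (shopping baskets) and converts it into a one-hot encoded DataFrame. I then want to run mlxtend's apriori algorithm on it to get the frequent itemsets.

However, whenever I run the apriori algorithm, it seems to return instantly, and it returns a generator object rather than a DataFrame. I followed the documentation, and in their example apriori returns a DataFrame. What am I doing wrong?

Here is my code: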

import numpy as np
import pandas as pd
import csv
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder
from apyori import apriori

def simpleRandomisedSample(filename, support_frac, sample_frac):
    df1 = pd.read_csv("%s.csv" % filename, header=None) #Saving csv file into a dataframe in memory
    size = len(df1)
    support = support_frac * len(df1) #Sets the original support value to x% of the original dataset
    sample_support = support * sample_frac #Support for our reduced sample as a fraction of the original support
    sample = df1.sample(frac=sample_frac) #Saving x% (randomised) of the dataset as our sample
    sample = sample.reset_index(drop = True) #Resetting indexes (which previously got randomised along with the data)
    del df1 #Deleting original dataframe from memory to clear up space
    sample_size = len(sample)
    return size, support, sample_size, sample_support, sample

def main():
    size, support, sample_size, sample_support, sample = simpleRandomisedSample("chess",0.01,0.1)
    print("The original dataset had %d rows and a support of %.2f" % (size, support))
    print("The dataset was reduced to %d rows and the sample has a support of %.2f" % (sample_size, sample_support)) 

    sample_list = sample.values.tolist() #Converting Dataframe to list of lists for use with Apriori
    te = TransactionEncoder()
    te_ary = te.fit(sample_list).transform(sample_list) #Preprocessing our sample to work with Apriori algorithm
    df = pd.DataFrame(te_ary, columns=te.columns_)
    print(df)
    frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
    print(frequent_itemsets)
    
if __name__ == "__main__":
    main()

【Comments】:

    Tags: python data-mining apriori market-basket-analysis


    【Solution 1】:

    You have a name collision in your imports:

    from mlxtend.frequent_patterns import apriori
    [...]
    from apyori import apriori
    

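    Your code is not using the mlxtend algorithm but the one provided by apyori. Both imports bind the same name, apriori, so the later import overrides the earlier one.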
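The same effect can be reproduced with the standard library alone. The following is a minimal sketch that uses `math.sqrt` and `cmath.sqrt` as stand-ins for the two `apriori` functions (they are not related to mlxtend or apyori; they simply share a name). Checking `__module__` on the name is one quick way to see which import actually won.

```python
from math import sqrt    # first binding of the name `sqrt`
from cmath import sqrt   # rebinds `sqrt`; the math version is no longer reachable by this name

result = sqrt(4)

# `__module__` reports where the currently bound function was defined.
print(sqrt.__module__)
# The return type reveals the switch too: cmath.sqrt returns a complex number.
print(type(result).__name__)
```

Applied to the question's script, printing `apriori.__module__` just before the call would presumably show the apyori module rather than the mlxtend one.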

    You can remove the one you are not using or, if you want access to both later, give each a different name:

    from mlxtend.frequent_patterns import apriori as mlx_apriori
    from apyori import apriori as apy_apriori
    

    【Comments】:

    • Wow, what a silly mistake, haha. Thank you so much, it works now!