【Question title】: Counting number of times dictionary keys appear in a dataframe
【Posted】: 2022-01-10 21:53:26
【Question】:

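I have a dictionary whose keys are itemsets and whose values are their counts. I want to count how many times each itemset appears in a dataframe (as an exact match). The dataframe has ~10k rows.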

First, the dictionary of itemsets (dict_of_items):

{'apple','banana','pear'}: 0, 
{'banana', 'orange', 'squash'}: 0

Second, the dataframe of itemsets (df):

Index | basket
1     | ['apple','banana','pear']
2     | ['banana']
3     | ['banana', 'orange','squash']
4     | ['apple','banana','pear']
...

Desired output (the dictionary values are the actual counts):

{'apple','banana','pear'}: 2, 
{'banana', 'orange', 'squash'}: 1

I have already tried a loop and .iterrows(), but the values remain 0, for example:

for item in dict_of_items:
    if item in df['basket']:
        dict_of_items[item] += 1

【Comments】:

    Tags: python pandas dictionary


    【Solution 1】:

    Issues with the posted code:

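    1. A dictionary cannot have sets as keys, because sets are not hashable (use frozenset instead).
    2. if item in df['basket']: does not work. The in operator on a pandas Series tests the index labels rather than the values, and even a value comparison would fail because basket holds lists while item is a set.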
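A minimal sketch of both issues, in case it helps to see them in isolation. The variable names (set_key_error, key_found, label_in_series, list_equals_set) are made up for this illustration and are not part of the question:

```python
import pandas as pd

# Issue 1: a plain set is unhashable, so building the dict raises TypeError.
try:
    bad = {{'apple', 'banana', 'pear'}: 0}
    set_key_error = None
except TypeError as exc:
    set_key_error = str(exc)  # message mentions an unhashable type

# A frozenset is hashable, and lookup ignores element order.
ok = {frozenset({'apple', 'banana', 'pear'}): 0}
key_found = frozenset(['pear', 'apple', 'banana']) in ok

# Issue 2: `in` on a Series checks the index labels, not the values.
s = pd.Series([['apple', 'banana', 'pear'], ['banana']])
label_in_series = 0 in s  # True, because 0 is an index label

# Even a direct value comparison fails: a list never equals a set.
list_equals_set = ['banana'] == {'banana'}

print(set_key_error, key_found, label_in_series, list_equals_set)
```

So the loop in the question never increments anything, independent of what the baskets contain.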

    Code

    import pandas as pd
    from collections import Counter
    
    # Initialization
    dict_of_item = {
        frozenset({'apple','banana','pear'}): 0, 
        frozenset({'banana', 'orange', 'squash'}): 0}
    
    data = {'basket': [['apple', 'banana', 'pear'],
                       ['banana'],
                       ['banana', 'orange', 'squash'],
                       ['apple', 'banana', 'pear']]}
                         
    df = pd.DataFrame(data)
    
    # Processing
    # Convert each basket list to a frozenset, then count how many times
    # each frozenset appears in the basket column.
    basket_set_count = Counter(df['basket'].apply(frozenset))
    
    # Find intersection of keys in basket_set_count and dictionary of keys
    # Use the count from basket_set_count as the number of elements
    result = {k:basket_set_count[k] for k in set(basket_set_count.keys()) & set(dict_of_item.keys())}
    
    print(result)
    # Output (key and element order may vary):
    # {frozenset({'pear', 'banana', 'apple'}): 2,
    #  frozenset({'orange', 'squash', 'banana'}): 1}
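Note that the intersection above drops any key of dict_of_item that never occurs in the dataframe. If you would rather keep those keys with a count of 0, one possible variant is sketched below. It relies on Counter returning 0 for missing keys; the 'kiwi' itemset and the name counts are hypothetical additions for the illustration only:

```python
import pandas as pd
from collections import Counter

dict_of_item = {
    frozenset({'apple', 'banana', 'pear'}): 0,
    frozenset({'banana', 'orange', 'squash'}): 0,
    frozenset({'kiwi'}): 0,  # hypothetical itemset that never appears
}

df = pd.DataFrame({'basket': [['apple', 'banana', 'pear'],
                              ['banana'],
                              ['banana', 'orange', 'squash'],
                              ['apple', 'banana', 'pear']]})

# One pass over the column: each list becomes a frozenset and is tallied.
basket_set_count = Counter(df['basket'].map(frozenset))

# Look up every requested key; Counter yields 0 for baskets never seen.
counts = {k: basket_set_count[k] for k in dict_of_item}

print(counts)
```

Be aware that converting to frozenset also collapses duplicates inside a basket, so ['apple', 'apple'] would match the itemset {'apple'}.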
    

    【Comments】:
