【Question Title】: Grouping similar values in a dictionary
【Posted】: 2018-03-15 09:18:05
【Question】:

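I'm new to programming and would appreciate help with the following in Python/Pandas. I have a dictionary whose values are lists, and I want to group together the keys that have the same values. I've seen similar questions here, but in this case I want to ignore the order of the values, for example: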

classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}

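jack and charles have the same values, just in a different order. I want output that groups the keys by their values regardless of order. In this case, the output would be written to a CSV as: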

['20','male','soccer']: jack, charles
['26','male','tennis']: brian
['19','basketball','male']: zulu

【Question Comments】:

    Tags: python pandas dictionary group-by


    【Solution 1】:

    You can use the following code to invert the dictionary the way you want:

    classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}
    
    out_dict = {}
    for key, value in classmates.items():
        # Sorting makes the key order-independent; a tuple makes it hashable
        group_key = tuple(sorted(value))
        current_list = out_dict.get(group_key, [])
        current_list.append(key)
        out_dict[group_key] = current_list
    
    print(out_dict)
    

    This prints:

    {('20', 'male', 'soccer'): ['jack', 'charles'], ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']}
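The same idea can be written a little more compactly with dict.setdefault, which inserts an empty list the first time a key appears, so the key only has to be computed once per entry. This is a minimal sketch assuming Python 3.7+ (where dicts keep insertion order); the helper name order_key is hypothetical.

```python
classmates = {'jack': ['20', 'male', 'soccer'],
              'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'],
              'zulu': ['19', 'basketball', 'male']}


def order_key(values):
    """Hypothetical helper: a hashable, order-independent key for a list."""
    # Lists are unhashable and cannot be dict keys; a tuple of the sorted
    # values is hashable and identical for every permutation of the list.
    return tuple(sorted(values))


out_dict = {}
for name, values in classmates.items():
    # setdefault returns the existing list, or inserts and returns a new one
    out_dict.setdefault(order_key(values), []).append(name)

print(out_dict)
# {('20', 'male', 'soccer'): ['jack', 'charles'], ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']}
```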
    

    【Comments】:

      【Solution 2】:

      Use frozensets with apply, then groupby + agg:

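      import pandas as pd

      # One row per student; frozenset makes each row's values order-independent and hashable
      s = pd.DataFrame(classmates).T.apply(frozenset, axis=1)

      # Use the frozensets as the index and the names as values, then collect the names per group
      s2 = pd.Series(s.index.values, index=s)\
                .groupby(level=0).agg(lambda x: list(x))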
      
      s2
      (soccer, 20, male)        [charles, jack]
      (26, male, tennis)                [brian]
      (basketball, male, 19)             [zulu]
      dtype: object
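One property of the frozenset key worth knowing: a set discards duplicates, so lists that differ only in how often a value repeats end up in the same group, whereas a tuple(sorted(...)) key keeps them apart. A small standard-library sketch with hypothetical data (the keys 'x' and 'y' and the helper group_by_key are illustrative, not from the question):

```python
from collections import defaultdict


def group_by_key(d, key):
    """Hypothetical helper: group the names in d by key(values)."""
    groups = defaultdict(list)
    for name, values in d.items():
        groups[key(values)].append(name)
    return dict(groups)


# Hypothetical data: 'x' and 'y' differ only in how often 'male' appears
data = {'x': ['20', 'male', 'male'], 'y': ['male', '20']}

by_set = group_by_key(data, frozenset)                      # duplicates collapse
by_sorted = group_by_key(data, lambda v: tuple(sorted(v)))  # duplicates are kept

print(len(by_set))     # 1: x and y are merged into one group
print(len(by_sorted))  # 2: x and y stay separate
print(by_sorted)       # {('20', 'male', 'male'): ['x'], ('20', 'male'): ['y']}
```

For the data in the question no value repeats within a list, so both keys give the same groups there.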
      

      【Comments】:

      • Does agg really need lambda x: list(x)? Wouldn't agg(list) work?
      • @AdamSmith Yes, otherwise you get TypeError: 'type' object is not iterable
      • Thanks, my pandas-fu is the weakest!
      【Solution 3】:
      from collections import defaultdict
      
      ans = defaultdict(list)
      
      classmates={'jack':['20','male','soccer'],
                  'brian':['26','male','tennis'],
                  'charles':['male','soccer','20'],
                  'zulu':['19','basketball','male']
                 }
      
      
      for k, v in classmates.items():
          sorted_tuple = tuple(sorted(v))
          ans[sorted_tuple].append(k)
      
      # ans is now the grouped dict you want:
      # defaultdict(<class 'list'>, {('20', 'male', 'soccer'): ['jack','charles'],
      # ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']})
      
      for k, v in ans.items():
          print(k, ':', v)
      
      # output: 
      # ('20', 'male', 'soccer') : ['jack', 'charles']
      # ('26', 'male', 'tennis') : ['brian']
      # ('19', 'basketball', 'male') : ['zulu']
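The question also asks for the groups to be written to a CSV file. A minimal sketch using the standard csv module, assuming one row per group with the shared values in one cell and the names in another; the header names, the ';' and ', ' separators, and the helper write_groups are all assumptions, not part of the question.

```python
import csv
import io
from collections import defaultdict

classmates = {'jack': ['20', 'male', 'soccer'],
              'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'],
              'zulu': ['19', 'basketball', 'male']}

# Group names by an order-independent key, as in the answer above
groups = defaultdict(list)
for name, values in classmates.items():
    groups[tuple(sorted(values))].append(name)


def write_groups(groups, f):
    """Hypothetical helper: write one CSV row per group to file object f."""
    writer = csv.writer(f)
    writer.writerow(['values', 'names'])  # assumed header row
    for values, names in groups.items():
        # Each side is joined into a single cell; csv quotes cells containing commas
        writer.writerow([';'.join(values), ', '.join(names)])


buf = io.StringIO()
write_groups(groups, buf)
print(buf.getvalue())
# values,names
# 20;male;soccer,"jack, charles"
# 26;male;tennis,brian
# 19;basketball;male,zulu
```

To write to disk instead of a string buffer, open the file with newline='' as the csv module documentation recommends, for example with open('groups.csv', 'w', newline='') as f: write_groups(groups, f), where the file name is hypothetical.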
      

      【Comments】:

        【Solution 4】:

        First, convert your dictionary to a pandas DataFrame.

        import pandas as pd
        df = pd.DataFrame.from_dict(classmates, orient='index')
        

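        Then sort in ascending order by column 0, which holds the first element of each list (the age for every student except charles, whose list starts with 'male').

        df = df.sort_values(by=0, ascending=True)
        

        Here 0 is the default column name; you can rename this column.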

        【Comments】:

          【Solution 5】:

          You can do this in one line:

          print({tuple(sorted(v)): [k for k, vv in classmates.items() if sorted(vv) == sorted(v)] for v in classmates.values()})
          

          Here is a more detailed solution:

          dict_1 = {'jack': ['20', 'male', 'soccer'], 'brian': ['26', 'male', 'tennis'],
                    'charles': ['male', 'soccer', '20'], 'zulu': ['19', 'basketball', 'male']}
          
          # Sort each value list so that permutations of the same values compare equal
          sorted_dict = {}
          for key, value in dict_1.items():
              sorted_dict[key] = sorted(value)
          
          # Start a new group for each unseen sorted list; otherwise append to the existing group
          tracking_of_duplicate = []
          final_dict = {}
          for key1, value1 in sorted_dict.items():
              if value1 not in tracking_of_duplicate:
                  tracking_of_duplicate.append(value1)
                  final_dict[tuple(value1)] = [key1]
              else:
                  final_dict[tuple(value1)].append(key1)
          
          print(final_dict)
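The one-liner earlier in this answer compares every value list against every other one, so its work grows quadratically with the number of entries. A sketch of an alternative, assuming it is acceptable for the groups and names to come out in sorted order rather than insertion order: sort the (key, name) pairs once, then let itertools.groupby collect runs of equal keys.

```python
from itertools import groupby
from operator import itemgetter

classmates = {'jack': ['20', 'male', 'soccer'],
              'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'],
              'zulu': ['19', 'basketball', 'male']}

# Pair each name with its order-independent key and sort, so equal keys become adjacent
pairs = sorted((tuple(sorted(v)), k) for k, v in classmates.items())

# itertools.groupby only merges *consecutive* items with the same key, hence the sort above
result = {key: [name for _, name in grp] for key, grp in groupby(pairs, key=itemgetter(0))}

print(result)
# {('19', 'basketball', 'male'): ['zulu'], ('20', 'male', 'soccer'): ['charles', 'jack'], ('26', 'male', 'tennis'): ['brian']}
```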
          

          【讨论】:
