【Question Title】: Sum values in dictionary less than a certain value
【Posted】: 2018-12-08 12:40:37
【Question Description】:

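I have the following dictionary and am trying to make a pie chart from it, but I only want to include the top 5 (they are already sorted by value here) and then add the rest together into an Other category, i.e. replace Publishing, Fashion, Food and so on with a single Other that sums them. I'm stuck on how to do this, so any help would be appreciated!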

{'Games': 715067930.8599964,
 'Design': 705237125.089998,
 'Technology': 648570433.7599969,
 'Film & Video': 379559714.56000066,
 'Music': 191227757.8699999,
 'Publishing': 130763828.65999977,
 'Fashion': 125678824.47999984,
 'Food': 122781563.58000016,
 'Art': 89078801.8599998,
 'Comics': 70600202.99999984,
 'Theater': 42662109.69999992,
 'Photography': 37709926.38000007,
 'Crafts': 13953818.35000002,
 'Dance': 12908120.519999994,
 'Journalism': 12197353.370000007}

Currently my pie chart is really overcrowded with this code:

import matplotlib.pyplot as plt

# df is the asker's DataFrame; total pledged (USD) per main category, largest first
groupbycategorypledge = df.groupby('main_category')['usd_pledged_real'].sum().sort_values(ascending=False)
plt.figure(figsize=(20, 10))
pie = groupbycategorypledge.plot(kind='pie', startangle=90, radius=0.7, title='Amount Pledged by Main category', autopct='%1.1f%%', labeldistance=1.2)
plt.legend(loc=(1.05, 0.75))
plt.ylabel('')

So I have:

d = groupbycategorypledge.sort_values(ascending=False).to_dict()  # named d so the built-in dict is not shadowed

【Question Discussion】:

    Tags: python sorting dictionary aggregate


    【Solution 1】:

    You can manipulate your dictionary before handing it to Pandas:

    from operator import itemgetter
    
    # sort by value descending
    items_sorted = sorted(d.items(), key=itemgetter(1), reverse=True)
    
    # calculate sum of others
    others = ('Other', sum(map(itemgetter(1), items_sorted[5:])))
    
    # construct dictionary
    d = dict([*items_sorted[:5], others])
    
    print(d)
    
    {'Games': 715067930.8599964,
     'Design': 705237125.089998,
     'Technology': 648570433.7599969,
     'Film & Video': 379559714.56000066,
     'Music': 191227757.8699999,
     'Other': 658334549.8999995}
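The steps above can be wrapped in a small reusable helper so that the cutoff and the label are parameters. This is a sketch; the name `collapse_top_n` and its parameters are hypothetical and not part of the original answer.

```python
from operator import itemgetter


def collapse_top_n(d, n=5, other_label='Other'):
    """Keep the n largest values of d and sum the rest under other_label.

    Hypothetical helper; assumes d maps labels to numbers.
    """
    # Sort (label, value) pairs by value, largest first
    items_sorted = sorted(d.items(), key=itemgetter(1), reverse=True)
    top, rest = items_sorted[:n], items_sorted[n:]
    result = dict(top)
    # Only add the catch-all bucket if something was actually collapsed
    if rest:
        result[other_label] = sum(value for _, value in rest)
    return result


print(collapse_top_n({'a': 5, 'b': 3, 'c': 2, 'd': 1}, n=2))  # {'a': 5, 'b': 3, 'Other': 3}
```

Skipping the `Other` entry when nothing falls outside the top `n` avoids drawing a zero-sized wedge in the pie chart.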
    

    【Discussion】:

      【Solution 2】:

      Based on @jpp's idea, but using a heap:

      import heapq
      
      d = {'Games': 715067930.8599964,
           'Design': 705237125.089998,
           'Technology': 648570433.7599969,
           'Film & Video': 379559714.56000066,
           'Music': 191227757.8699999,
           'Publishing': 130763828.65999977,
           'Fashion': 125678824.47999984,
           'Food': 122781563.58000016,
           'Art': 89078801.8599998,
           'Comics': 70600202.99999984,
           'Theater': 42662109.69999992,
           'Photography': 37709926.38000007,
           'Crafts': 13953818.35000002,
           'Dance': 12908120.519999994,
           'Journalism': 12197353.370000007}
      
      top_5 = set(heapq.nlargest(5, d, key=d.get))
      
      groups = {}
      for category, pledge in d.items():
          new_category = category if category in top_5 else 'Other'
          groups.setdefault(new_category, []).append(pledge)
      
      result = {k: sum(v) for k, v in groups.items()}
      print(result)
      

      Output:

      {'Technology': 648570433.7599969, 'Design': 705237125.089998, 'Other': 658334549.8999994, 'Games': 715067930.8599964, 'Film & Video': 379559714.56000066, 'Music': 191227757.8699999}
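The grouping loop above first collects lists and then sums them. A variant that keeps running totals with `collections.defaultdict(float)` avoids the intermediate lists. This is a sketch; the function name `group_top_n` is hypothetical, and it still uses `heapq.nlargest` as in the answer.

```python
import heapq
from collections import defaultdict


def group_top_n(d, n=5, other_label='Other'):
    # Keys of the n largest values, selected with a heap
    top = set(heapq.nlargest(n, d, key=d.get))
    totals = defaultdict(float)
    for category, pledge in d.items():
        # Anything outside the top n is accumulated into one bucket
        totals[category if category in top else other_label] += pledge
    return dict(totals)


print(group_top_n({'Games': 7, 'Design': 6, 'Music': 2, 'Food': 1}, n=2))
# {'Games': 7.0, 'Design': 6.0, 'Other': 3.0}
```

Because dictionaries preserve insertion order in Python 3.7+, the keys of the result follow the iteration order of `d`, with `Other` appearing where the first non-top category is encountered.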
      

      Or, if you prefer numpy:

      import numpy as np
      
      d = {'Games': 715067930.8599964,
           'Design': 705237125.089998,
           'Technology': 648570433.7599969,
           'Film & Video': 379559714.56000066,
           'Music': 191227757.8699999,
           'Publishing': 130763828.65999977,
           'Fashion': 125678824.47999984,
           'Food': 122781563.58000016,
           'Art': 89078801.8599998,
           'Comics': 70600202.99999984,
           'Theater': 42662109.69999992,
           'Photography': 37709926.38000007,
           'Crafts': 13953818.35000002,
           'Dance': 12908120.519999994,
           'Journalism': 12197353.370000007}
      
      categories, pledge_values = map(np.array, zip(*d.items()))
      partition = np.argpartition(pledge_values, -5)
      top_5 = set(categories[partition][-5:])
      
      groups = {}
      for category, pledge in d.items():
          new_category = category if category in top_5 else 'Other'
          groups.setdefault(new_category, []).append(pledge)
      
      result = {k: sum(v) for k, v in groups.items()}
      print(result)
      

      Output:

      {'Technology': 648570433.7599969, 'Design': 705237125.089998, 'Other': 658334549.8999995, 'Music': 191227757.8699999, 'Games': 715067930.8599964, 'Film & Video': 379559714.56000066}
      

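      The second proposal (using numpy) runs in O(n) time, where n is the number of key-value pairs in d, because np.argpartition selects the top 5 with a linear-time selection instead of a full sort.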
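Since the data in the question starts out as a pandas Series (`groupbycategorypledge`), the collapse can also be done on the Series itself before plotting, without converting to a dictionary. A minimal sketch, assuming a Series indexed by unique category names; `collapse_series` is a hypothetical name.

```python
import pandas as pd


def collapse_series(s, n=5, other_label='Other'):
    # Largest n entries, in descending order
    top = s.nlargest(n)
    # Everything not in the top n is summed into a single entry
    other_total = s.drop(top.index).sum()
    return pd.concat([top, pd.Series({other_label: other_total})])


pledged = pd.Series({'Games': 715067930.86, 'Design': 705237125.09,
                     'Technology': 648570433.76, 'Music': 191227757.87,
                     'Food': 122781563.58, 'Art': 89078801.86,
                     'Dance': 12908120.52})
print(collapse_series(pledged))
```

The result can be passed to the question's own plotting call, e.g. `collapse_series(groupbycategorypledge).plot(kind='pie', startangle=90, autopct='%1.1f%%')`. Note that, unlike the dictionary helpers, this sketch always adds an `Other` entry, which is 0 when the Series has `n` or fewer entries.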

      【Discussion】:
