【Question title】: Count operations grouped by country, return dataframe in Python
【Posted】: 2021-03-01 13:48:44
【Question】:

Data:

country operations
India A
Malaysia B
Croatia C
India C
India C
Malaysia D
Malaysia A

Expected output:

{ "India" :{"A":1,"C":2},"Malaysia":{"B":1,"A":1,"D":1},"Croatia":{"C":1}}

What I tried:


import pandas as pd

# countrylist and opslist are not defined in the post; presumably the two columns above
arrays = [countrylist, opslist]
index = pd.MultiIndex.from_arrays(arrays, names=('Country', 'Ops'))
df = pd.DataFrame(index)
count = list(df[0].value_counts())
clist = list(df[0].unique())

# Collect the distinct operations seen for each country
csdict = dict()
for country, service in clist:
    csdict.setdefault(country, []).append(service)

country_list = list(csdict.keys())
service_list = list(csdict.values())
fdict = {"country": country_list, "services": service_list}
dataf = pd.DataFrame(fdict)

【Comments】:

    Tags: python pandas dataframe dictionary


    【Solution 1】:

    Use a dictionary comprehension that calls Series.value_counts on each group:

    d = {k: v.value_counts(sort=False).to_dict() 
             for k, v in df.groupby('country', sort=False)['operations']}
    
    print(d)
    {'India': {'A': 1, 'C': 2}, 'Malaysia': {'B': 1, 'A': 1, 'D': 1}, 'Croatia': {'C': 1}}
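The answer above assumes a DataFrame `df` with `country` and `operations` columns already exists. A minimal self-contained sketch, with `df` rebuilt from the question's sample rows, might look like the following. The names `nested_counts` and `table` are illustrative and not from the original post, and the `pd.crosstab` call is an optional extra for readers who want the counts as a DataFrame (as the title suggests) rather than a nested dict.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "country": ["India", "Malaysia", "Croatia", "India", "India", "Malaysia", "Malaysia"],
    "operations": ["A", "B", "C", "C", "C", "D", "A"],
})

# One value_counts() call per country group, converted to a plain dict.
nested_counts = {
    country: ops.value_counts(sort=False).to_dict()
    for country, ops in df.groupby("country", sort=False)["operations"]
}
print(nested_counts)

# Optional: the same counts as a country-by-operation table (0 where a pair never occurs).
table = pd.crosstab(df["country"], df["operations"])
print(table)
```

Note that the dict form omits combinations that never occur, while the `crosstab` table fills them with 0.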
    

    【Comments】:

      【Solution 2】:

      Here is how to do it with the built-in zip() function:

      z = list(zip(df.country, df.operations))
      
      output = dict()
      for c, o in z:
          output[c] = output.get(c) or dict()
          output[c][o] = z.count((c, o))
      print(output)
      

      Output:

      {'India': {'A': 1, 'C': 2}, 'Malaysia': {'B': 1, 'D': 1, 'A': 1}, 'Croatia': {'C': 1}}
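One caveat with the loop above: `z.count((c, o))` rescans the whole list for every row, so the work grows quadratically with the number of rows. A possible single-pass variant, swapping in `collections.Counter` and `defaultdict` (not part of the original answer), is sketched below. The lists `countries` and `operations` stand in for `df.country` and `df.operations` and are filled with the question's sample data.

```python
from collections import Counter, defaultdict

# Stand-ins for df.country and df.operations, using the question's sample rows.
countries = ["India", "Malaysia", "Croatia", "India", "India", "Malaysia", "Malaysia"]
operations = ["A", "B", "C", "C", "C", "D", "A"]

# Each country maps to its own Counter; every row is visited exactly once.
counts = defaultdict(Counter)
for country, op in zip(countries, operations):
    counts[country][op] += 1

# Convert to plain nested dicts to match the requested output shape.
result = {country: dict(ops) for country, ops in counts.items()}
print(result)
```

With a DataFrame at hand, `zip(df.country, df.operations)` can be passed to the loop directly instead of the two lists.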
      

      【Comments】:
