【Question title】: multi-index pandas dataframe to a dictionary
【Posted】: 2017-12-03 01:17:20
【Question description】:

I have a dataframe as follows:

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])

If I group by two columns and compute the group sizes,

df.groupby(['regiment','company']).size()

I get the following:

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

The output I want is a dictionary like this:

{'Dragoons':{'1st':2,'2nd':2},
 'Nighthawks': {'1st':2,'2nd':2}, 
  ... }

I have tried different approaches without success. Is there a relatively clean way to achieve this?

Thank you very much!

【Discussion】:

    Tags: python pandas dictionary multi-index


    【Solution 1】:

    You can chain Series.unstack and DataFrame.to_dict:

    d = df.groupby(['regiment','company']).size().unstack().to_dict(orient='index')
    print (d)
    {'Dragoons': {'2nd': 2, '1st': 2}, 
     'Nighthawks': {'2nd': 2, '1st': 2}, 
     'Scouts': {'2nd': 2, '1st': 2}}
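The `orient='index'` argument is what makes the regiment the outer key. As a minimal sketch on a reduced, made-up dataframe (the four rows below are an assumption for illustration, not the question's data), the default orientation and `orient='index'` compare like this:

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data, not the original sample)
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '2nd'],
})

# Rows are regiments, columns are companies
wide = df.groupby(['regiment', 'company']).size().unstack()

# Default orientation: outer keys are the columns (companies)
by_column = wide.to_dict()

# orient='index': outer keys are the row labels (regiments)
by_row = wide.to_dict(orient='index')

print(by_column)
print(by_row)
```

With the default orientation the nesting comes out inverted (company first, then regiment), which is why the answer passes `orient='index'`.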
    

    Another solution, very similar to one of the other answers:

    from collections import Counter
    
    d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
    print (d)
    {'Dragoons': {'2nd': 2, '1st': 2}, 
    'Nighthawks': {'2nd': 2, '1st': 2}, 
    'Scouts': {'2nd': 2, '1st': 2}}
    

    However, the first solution produces NaNs when some regiment/company combinations are missing from the data.

    Example:

    raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
        'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '3rd'],
        'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
    
    df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
    print (df)
          regiment company      name  preTestScore  postTestScore
    0   Nighthawks     1st    Miller             4             25
    1   Nighthawks     1st  Jacobson            24             94
    2   Nighthawks     2nd       Ali            31             57
    3   Nighthawks     2nd    Milner             2             62
    4     Dragoons     1st     Cooze             3             70
    5     Dragoons     1st     Jacon             4             25
    6     Dragoons     2nd    Ryaner            24             94
    7     Dragoons     2nd      Sone            31             57
    8       Scouts     1st     Sloan             2             62
    9       Scouts     1st     Piger             3             70
    10      Scouts     2nd     Riani             2             62
    11      Scouts     3rd       Ali             3             70
    

    df1 = df.groupby(['regiment','company']).size().unstack()
    print (df1)
    company     1st  2nd  3rd
    regiment                 
    Dragoons    2.0  2.0  NaN
    Nighthawks  2.0  2.0  NaN
    Scouts      2.0  1.0  1.0
    
    d = df1.to_dict(orient='index')
    print (d)
    {'Dragoons': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
    'Nighthawks': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
    'Scouts': {'3rd': 1.0, '2nd': 1.0, '1st': 2.0}}
    

    In that case, use the Counter-based solution:

    d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
    print (d)
    {'Dragoons': {'2nd': 2, '1st': 2}, 
    'Nighthawks': {'2nd': 2, '1st': 2},
     'Scouts': {'3rd': 1, '2nd': 1, '1st': 2}}
    

    Or use the other answer, by John Galt.
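If the unstack approach is still preferred, the NaN entries can be avoided in at least two ways. The sketch below is not part of the original answer and uses a reduced, made-up dataframe: option A passes `fill_value=0` to `Series.unstack`, so missing combinations become 0 and the counts stay integers; option B drops the NaN cells row by row, so missing combinations are simply absent.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data): only Scouts has a '3rd' company
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '2nd', '3rd'],
})

sizes = df.groupby(['regiment', 'company']).size()

# Option A: fill missing combinations with 0 so the counts stay integers
filled = sizes.unstack(fill_value=0).to_dict(orient='index')

# Option B: unstack normally, then drop the NaN cells of each row
wide = sizes.unstack()
sparse = {
    regiment: {company: int(n) for company, n in row.dropna().items()}
    for regiment, row in wide.iterrows()
}

print(filled)
print(sparse)
```

Which one fits depends on whether a zero count should appear as an explicit key or be left out of the inner dictionary.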

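    【Discussion】:

    • I found a problem with the first solution: it only works when every regiment has every company (as in your sample data). The second solution, or one of the other answers, is more general.
    • I see. I ended up using the second solution, since it does not produce keys with NaN values.
    【Solution 2】: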

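    You can reset the index after grouping and then pivot the data as needed. The code below gives the desired output.

    counts = df.groupby(['regiment','company']).size().reset_index()
    # After reset_index, the unnamed size column is labeled 0
    print(pd.pivot_table(counts, values=0, index='regiment', columns='company').to_dict(orient='index'))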
    

    Output:

    {'Nighthawks': {'2nd': 2, '1st': 2}, 'Scouts': {'2nd': 2, '1st': 2}, 'Dragoons': {'2nd': 2, '1st': 2}}
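The group-then-pivot steps can probably be folded into a single call with `pd.crosstab`, which counts co-occurrences directly. This is a sketch on a reduced, made-up dataframe rather than the answer's own method. Note that `pivot_table` aggregates with the mean by default, so depending on the pandas version its counts may come back as floats, whereas `crosstab` returns integer counts and uses 0 for combinations that never occur.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data): only Scouts has a '3rd' company
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '2nd', '3rd'],
})

# Rows are regiments, columns are companies, cells are occurrence counts
table = pd.crosstab(df['regiment'], df['company'])

result = table.to_dict(orient='index')
print(result)
```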
    

    【Discussion】:

      【Solution 3】:

      How about building the dictionary with a comprehension over the groups?

      In [409]: {g:v['company'].value_counts().to_dict() for g, v in df.groupby('regiment')}
      Out[409]:
      {'Dragoons': {'1st': 2, '2nd': 2},
       'Nighthawks': {'1st': 2, '2nd': 2},
       'Scouts': {'1st': 2, '2nd': 2}}
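A related variant, sketched here on a reduced, made-up dataframe, starts from the multi-index Series that the question already has and walks its `(regiment, company)` keys directly. The variable names `sizes` and `nested` are illustrative only.

```python
import pandas as pd

# Small illustrative dataframe (hypothetical data): only Scouts has a '3rd' company
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '2nd', '3rd'],
})

# Series indexed by a (regiment, company) MultiIndex
sizes = df.groupby(['regiment', 'company']).size()

# Each index entry is a tuple, so it can be unpacked into the two nesting levels
nested = {}
for (regiment, company), count in sizes.items():
    nested.setdefault(regiment, {})[company] = int(count)

print(nested)
```

Because only combinations that actually occur are present in the Series, this produces no NaN entries and needs no reshaping.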
      

      【Discussion】:
