【Question title】: Create a dictionary by grouping by values from a dataframe column in python
【Posted】: 2018-01-22 15:20:02
【Question】:

I have a dataframe with 8 columns, as follows:

  Bank_Acct Firstname | Bank_Acct Lastname | Bank_AcctNumber | Firstname | Lastname | ID | Date1    | Date2
  B1                  | Last1              | 123             | ABC       | EFG      | 12 | Somedate | Somedate
  B2                  | Last2              | 245             | ABC       | EFG      | 12 | Somedate | Somedate
  B1                  | Last1              | 123             | DEF       | EFG      | 12 | Somedate | Somedate
  B3                  | Last3              | 356             | ABC       | GHI      | 13 | Somedate | Somedate
  B4                  | Last4              | 478             | XYZ       | FHJ      | 13 | Somedate | Somedate
  B5                  | Last5              | 599             | XYZ       | DFI      | 13 | Somedate | Somedate

I want to create a dictionary keyed by ID:

 {ID1: (Count of distinct Bank_Acct Firstname, Count of distinct Bank_Acct Lastname, 
        {Bank_AcctNumber1 : ItsCount, Bank_AcctNumber2 : ItsCount}, 
         Count of distinct Firstname, Count of distinct Lastname), 
  ID2: (...), }

For the example above, the result should be:

{12: (2, 2, {123: 2, 245: 1}, 2, 1), 13 : (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}

Here is my code:

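    # Columns to summarize per ID (names match the dataframe above)
    cols = ['Bank_Acct Firstname', 'Bank_Acct Lastname', 'Bank_AcctNumber', 'Firstname', 'Lastname']
    # nunique() is applied to every column, including Bank_AcctNumber
    df1 = df.groupby('ID').apply(lambda x: tuple(x[c].nunique() for c in cols))
    d = df1.to_dict()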

But the code above only gives this output:

 {12: (2, 2, 2, 2, 1), 13 : (3, 3, 3, 2, 3)}

The third element is the count of distinct bank account numbers, not the inner dictionary of per-account counts.

How can I get the required dictionary? Thanks!

【Comments on the question】:

    Tags: python pandas dictionary dataframe group-by


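    【Solution 1】:

    You can define each column together with the function to apply to it in a list, then evaluate every entry once per group: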

    In [15]: cols = [
         ...:     {'col': 'Bank_Acct Firstname', 'func': pd.Series.nunique},
         ...:     {'col': 'Bank_Acct Lastname', 'func': pd.Series.nunique},
         ...:     {'col': 'Bank_AcctNumber', 'func': lambda x: x.value_counts().to_dict()},
         ...:     {'col': 'Firstname', 'func': pd.Series.nunique},
         ...:     {'col': 'Lastname', 'func': pd.Series.nunique}
         ...:     ]
    
    In [16]: df.groupby('ID').apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
    Out[16]:
    ID
    12            (2, 2, {123: 2, 245: 1}, 2, 1)
    13    (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)
    dtype: object
    
    In [17]: (df.groupby('ID')
                .apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols))
                .to_dict())
    Out[17]:
    {12: (2, 2, {123: 2, 245: 1}, 2, 1),
     13: (3, 3, {356: 1, 478: 1, 599: 1}, 2, 3)}
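
To try the answer outside an IPython session, here is a minimal self-contained sketch. It rebuilds the sample frame from the question (Date1 and Date2 are omitted because they do not affect the result) and iterates over the groups in a dict comprehension rather than calling `groupby(...).apply`, which is a substitution of mine and not part of the original answer. The helper name `summarize_group` is hypothetical.

```python
import pandas as pd

# Sample data copied from the question; the date columns are left out
# because they play no part in the summary.
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
})

# One entry per output position: which column to read and how to reduce it.
cols = [
    {'col': 'Bank_Acct Firstname', 'func': pd.Series.nunique},
    {'col': 'Bank_Acct Lastname', 'func': pd.Series.nunique},
    {'col': 'Bank_AcctNumber', 'func': lambda s: s.value_counts().to_dict()},
    {'col': 'Firstname', 'func': pd.Series.nunique},
    {'col': 'Lastname', 'func': pd.Series.nunique},
]


def summarize_group(group):
    """Apply each configured function to its column within one ID group."""
    return tuple(c['func'](group[c['col']]) for c in cols)


# Iterating over a groupby yields (key, sub-frame) pairs.
result = {int(key): summarize_group(group) for key, group in df.groupby('ID')}
print(result)
```

On the sample data, `result` should compare equal to the dictionary shown in `Out[17]`.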
    

    【Comments】:

    • This works, but it is extremely slow. Is there any way to make it faster? I have a huge dataframe.
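
One possible way to address the speed concern, offered as an unbenchmarked sketch and not as part of the original answer: compute the distinct counts for all four columns with a single `groupby(...).nunique()` call, and get the per-account counts from one `groupby(['ID', 'Bank_AcctNumber']).size()` call, so no Python lambda runs once per group. The function name `summarize_fast` is hypothetical, and whether this is actually faster on a given dataframe should be measured.

```python
import pandas as pd

# Sample data copied from the question (date columns omitted).
df = pd.DataFrame({
    'Bank_Acct Firstname': ['B1', 'B2', 'B1', 'B3', 'B4', 'B5'],
    'Bank_Acct Lastname': ['Last1', 'Last2', 'Last1', 'Last3', 'Last4', 'Last5'],
    'Bank_AcctNumber': [123, 245, 123, 356, 478, 599],
    'Firstname': ['ABC', 'ABC', 'DEF', 'ABC', 'XYZ', 'XYZ'],
    'Lastname': ['EFG', 'EFG', 'EFG', 'GHI', 'FHJ', 'DFI'],
    'ID': [12, 12, 12, 13, 13, 13],
})


def summarize_fast(frame):
    """Build {ID: (n, n, {acct: count}, n, n)} using grouped aggregations."""
    nunique_cols = ['Bank_Acct Firstname', 'Bank_Acct Lastname',
                    'Firstname', 'Lastname']

    # One row per ID, one distinct-count column per name in nunique_cols.
    distinct = frame.groupby('ID')[nunique_cols].nunique()

    # One row per (ID, account number) pair that occurs, with its row count.
    pair_counts = frame.groupby(['ID', 'Bank_AcctNumber']).size()

    # Regroup the pair counts into one inner dict per ID.
    acct_counts = {}
    for (id_value, acct_number), n in pair_counts.items():
        acct_counts.setdefault(id_value, {})[acct_number] = int(n)

    # name=None makes itertuples yield plain tuples: (index, col1, col2, ...).
    result = {}
    for id_value, bank_first, bank_last, first, last in distinct.itertuples(name=None):
        result[id_value] = (int(bank_first), int(bank_last),
                            acct_counts.get(id_value, {}),
                            int(first), int(last))
    return result


fast_result = summarize_fast(df)
print(fast_result)
```

The inner dictionaries here are ordered by account number, not by descending count as with `value_counts()`, but they hold the same key/value pairs. Rows with a missing ID or account number are dropped by `groupby` by default, which is worth checking against your data.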