【Question title】: Converting series dictionary from a dataframe column to separate columns in the same dataframe
【Posted】: 2020-01-31 04:00:51
【Question description】:
I have a dataframe (df) that looks like this:
`
user_id |date       |last_dep_amt| dep_amt| Bin       | Action    
1031    |2017-03-11 |200.0       |100     | 100-200   | [{'A1':[350,400,450],
                                                          'A2':[450,480,490],
                                                          'A3':[500,550,600],
                                                          'A4':[650, 700,850],
                                                          'A5':[750,800,950],
                                                       'Last_5_deposits':[50],
                                                       'num_unique_a1':3,
                                                       'num_unique_a2':4,
                                                       'num_unique_a3':7,
                                                       'num_unique_a4':8,
                                                       'num_unique_a5':9}]
1031    |2017-03-12 |300.0       |120     | 100-200   | [{'A1':[250,300,550],
                                                          'A2':[150,440,460],
                                                          'A3':[250,300,430],
                                                          'A4':[350, 500,650],
                                                          'A5':[650,700,780],
                                                       'Last_5_deposits':[50],
                                                       'num_unique_a1':3,
                                                       'num_unique_a2':4,
                                                       'num_unique_a3':7,
                                                       'num_unique_a4':8,
                                                       'num_unique_a5':9}]
231     |2017-03-14 |350.0       |130     | 100-200   | [{'A1':[250,300,550],
                                                          'A2':[150,440,460],
                                                          'A3':[250,300,430],
                                                          'A4':[350, 500,650],
                                                          'A5':[650,700,780],
                                                       'Last_5_deposits':[50],
                                                       'num_unique_a1':3,
                                                       'num_unique_a2':4,
                                                       'num_unique_a3':7,
                                                       'num_unique_a4':8,
                                                       'num_unique_a5':9}]
`      
It has 6 columns, and each cell of the last column ('Action') holds a list containing a single dictionary.

So I need to split the last column ('Action') into multiple columns, for example: user_id|date|last_dep_amt|dep_amt|Bin|A1|A2|A3|A4|A5|Last_5_deposits|num_unique_a1|num_unique_a2|num_unique_a3|num_unique_a4|num_unique_a5

Some information about the dataframe: type(df['Action']) is pandas.core.series.Series and type(df) is pandas.core.frame.DataFrame.
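
The two types above describe the containers, not what each cell of 'Action' holds. A minimal sketch for checking that, assuming a hypothetical two-row stand-in for df with a shortened dictionary (the values are copied from the sample rows; the frame itself is not from the original post):

```python
import pandas as pd

# Hypothetical two-row stand-in for the question's df, with a shortened dict per row.
df = pd.DataFrame({
    "user_id": [1031, 231],
    "Action": [
        [{"A1": [350, 400, 450], "Last_5_deposits": [50], "num_unique_a1": 3}],
        [{"A1": [250, 300, 550], "Last_5_deposits": [50], "num_unique_a1": 3}],
    ],
})

# Type of the column itself, of one cell, and of the item inside that cell.
column_type = type(df["Action"])
cell_type = type(df["Action"].iloc[0])
inner_type = type(df["Action"].iloc[0][0])
print(column_type.__name__, cell_type.__name__, inner_type.__name__)
```

If the cell type turns out to be str rather than list, the column holds text that only looks like a list of dictionaries, and it would need parsing before it can be expanded.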

Expected output: every key under the Action column must be split into its own column: user_id|date|last_dep_amt|dep_amt|Bin|A1|A2|A3|A4|A5|Last_5_deposits|num_unique_a1|num_unique_a2|num_unique_a3|num_unique_a4|num_unique_a5 `

  +---------+-----------+--------------+---------+---------+---------------+---------------+---------------+----------------+---------------+-----------------+---------------+---------------+---------------+----------------+----------------+
    | user_id |   date    | last_dep_amt | dep_amt |   Bin   |      A1       |      A2       |      A3       |       A4       |      A5       | Last_5_deposits | num_unique_a1 | num_unique_a2 | num_unique_a3 |  num_unique_a4 | num_unique_a5  |
    +---------+-----------+--------------+---------+---------+---------------+---------------+---------------+----------------+---------------+-----------------+---------------+---------------+---------------+----------------+----------------+
    |    1031 | 3/11/2017 |          200 |     100 | 100-200 | [350,400,450] | [450,480,490] | [500,550,600] | [650, 700,850] | [750,800,950] | [50]            |             3 |             4 |             7 |              8 |              9 |
    +---------+-----------+--------------+---------+---------+---------------+---------------+---------------+----------------+---------------+-----------------+---------------+---------------+---------------+----------------+----------------+

`
I have also attached a link to an image of the expected final output for the above dataframe (df):
<https://ibb.co/0JyKhHQ>

【Question comments】:

    Tags: python pandas dataframe dictionary series


    【Solution 1】:

    Use pandas.concat:

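    import pandas as pd
    # Each 'Action' cell is a list with one dict, so pd.DataFrame(key) gives a one-row frame; stack them.
    df_action = pd.concat([pd.DataFrame(key) for key in df['Action']]).reset_index(drop=True)
    # Put the original columns and the expanded columns side by side (assumes df has a default 0..n-1 index).
    new_df = pd.concat([df[['user_id', 'date', 'last_dep_amt', 'dep_amt', 'Bin']], df_action], axis=1)
    print(new_df)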
    

    Output:

       user_id        date  last_dep_amt  dep_amt      Bin               A1  \
    0     1031  2017-03-11         200.0      100  100-200  [350, 400, 450]   
    1     1031  2017-03-12         300.0      120  100-200  [250, 300, 550]   
    2      231  2017-03-14         350.0      130  100-200  [250, 300, 550]   
    
                    A2               A3               A4               A5  \
    0  [450, 480, 490]  [500, 550, 600]  [650, 700, 850]  [750, 800, 950]   
    1  [150, 440, 460]  [250, 300, 430]  [350, 500, 650]  [650, 700, 780]   
    2  [150, 440, 460]  [250, 300, 430]  [350, 500, 650]  [650, 700, 780]   
    
      Last_5_deposits  num_unique_a1  num_unique_a2  num_unique_a3  num_unique_a4  \
    0            [50]              3              4              7              8   
    1            [50]              3              4              7              8   
    2            [50]              3              4              7              8   
    
       num_unique_a5  
    0              9  
    1              9  
    2              9 
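
A variation on the same idea, offered as a sketch rather than as the answerer's method: pull the single dictionary out of each cell, build one frame from all of them, and join it back by index. The two-row frame below is a hypothetical, shortened stand-in for the question's df.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the question's df.
df = pd.DataFrame({
    "user_id": [1031, 231],
    "Bin": ["100-200", "100-200"],
    "Action": [
        [{"A1": [350, 400, 450], "Last_5_deposits": [50], "num_unique_a1": 3}],
        [{"A1": [250, 300, 550], "Last_5_deposits": [50], "num_unique_a1": 3}],
    ],
})

# Take the single dict out of each one-element list; each dict becomes one row.
expanded = pd.DataFrame([cell[0] for cell in df["Action"]], index=df.index)

# Drop the original column and attach the expanded columns by index.
new_df = df.drop(columns="Action").join(expanded)
print(new_df.columns.tolist())
```

Passing index=df.index keeps the expanded rows aligned with the original ones even when df does not have a default 0..n-1 index, which the concat-based version relies on after reset_index(drop=True).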
    

    【Discussion】:

    • @ansev I tried the first line of your code, df_action=pd.concat([pd.DataFrame(key) for key in df['Action']]).reset_index(drop=True), and it gave me the following error: ValueError: DataFrame constructor not properly called! So I also tried this: df_action=pd.concat([pd.DataFrame(columns=['key']) for key in ab2['Action']]).reset_index(drop=True). That ran, but print(new_df) showed only the old columns (userid, date, last_dep_amt, dep_amt, bin) plus a new column 'key' whose values are all NaN.
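
The thread does not say what caused this error. One possible cause, and it is only an assumption here, is that the 'Action' cells are strings (for example after loading from a CSV) rather than real lists of dictionaries: pd.DataFrame raises this ValueError when it is handed a plain string. A minimal sketch under that assumption, using a hypothetical two-row frame and a hypothetical helper named to_objects:

```python
import ast
import pandas as pd

# Hypothetical frame where 'Action' was loaded as text, not as Python objects.
df = pd.DataFrame({
    "user_id": [1031, 231],
    "Action": [
        "[{'A1': [350, 400, 450], 'num_unique_a1': 3}]",
        "[{'A1': [250, 300, 550], 'num_unique_a1': 3}]",
    ],
})

def to_objects(cell):
    # Parse string cells into Python literals; leave cells that are already objects alone.
    return ast.literal_eval(cell) if isinstance(cell, str) else cell

parsed = df["Action"].map(to_objects)

# Same idea as the answer: one small frame per row, stacked, then joined to the other columns.
df_action = pd.concat([pd.DataFrame(cell) for cell in parsed]).reset_index(drop=True)
new_df = pd.concat([df.drop(columns="Action"), df_action], axis=1)
print(new_df)
```

The second attempt in the comment, pd.DataFrame(columns=['key']), builds an empty frame on every iteration and never uses the cell's contents, which would explain why the result contains only a 'key' column of NaN.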