【Question title】: Create a new DataFrame adding each key from a column dict as header
【Posted】: 2023-04-01 20:03:01
【Question description】:

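I have a DataFrame in which one particular column contains dictionaries.

I want to add a new column header to the DataFrame for every key found in any of the dictionaries in that column. For each row, the new cell should hold the value for that key if the row's dictionary contains it, and None otherwise.

Here is some data for testing and for visualizing what I mean:

Import the dependencies: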

import pandas as pd
import numpy as np

Create a dictionary that contains a list of inner dictionaries:

data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                 {'elm5': 'attr31', 'elm7': 'attr13'},
                 {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33','elm6': 'attr33'}],
        'int_info': [4, 24, 31],}

Create the initial DataFrame for testing:

df = pd.DataFrame.from_dict(data)
df

A manual illustration of the output I want:

data2 = {'string_info': ['User1', 'User2', 'User3'],
        'elm1': ['attr5',None,'attr23'],
        'elm2': ['attr9',None,'attr33'],
        'elm3': ['attr33',None,None],
        'elm4': [None,None,None],
        'elm5': [None,'attr31','attr28'],
        'elm6': [None,None,'attr33'],
        'elm7': [None,'attr13',None],
        'int_info': [4, 24, 31]}

The desired output is:

df2 = pd.DataFrame.from_dict(data2)
df2

Thanks!

【Question discussion】:

    Tags: python pandas dictionary dataframe multiple-columns


    【Solution 1】:

    You can use the DataFrame constructor together with concat to replace the dicts with columns:

    print (pd.DataFrame(df.dict_info.values.tolist()))
         elm1    elm2    elm3    elm5    elm6    elm7
    0   attr5   attr9  attr33     NaN     NaN     NaN
    1     NaN     NaN     NaN  attr31     NaN  attr13
    2  attr23  attr33     NaN  attr28  attr33     NaN
    
    print (pd.concat([pd.DataFrame(df.dict_info.values.tolist()),
                      df[['int_info','string_info']]], axis=1))
         elm1    elm2    elm3    elm5    elm6    elm7  int_info string_info
    0   attr5   attr9  attr33     NaN     NaN     NaN         4       User1
    1     NaN     NaN     NaN  attr31     NaN  attr13        24       User2
    2  attr23  attr33     NaN  attr28  attr33     NaN        31       User3
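
The desired output in the question keeps string_info first and int_info last, whereas the concat above puts the expanded columns at the front. Below is a minimal sketch of one way to keep the original column positions, assuming the dict column is named dict_info as in the question. The names expanded, pos and result are illustrative only and do not come from the original answer.

```python
import pandas as pd

data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                      {'elm5': 'attr31', 'elm7': 'attr13'},
                      {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
        'int_info': [4, 24, 31]}
df = pd.DataFrame(data)

# Expand each dict into one row of columns; keys missing from a row become NaN.
expanded = pd.DataFrame(df['dict_info'].tolist(), index=df.index)

# Sort the new columns by name so the order does not depend on the pandas version.
expanded = expanded[sorted(expanded.columns)]

# Splice the expanded columns in at the position 'dict_info' occupied.
pos = df.columns.get_loc('dict_info')
result = pd.concat([df.iloc[:, :pos], expanded, df.iloc[:, pos + 1:]], axis=1)

print(result)
```

Passing index=df.index keeps the rows aligned even if df does not have a default 0..n-1 index.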
    

    If you need None values instead of NaN, add replace:

    print (pd.concat([pd.DataFrame(df.dict_info.values.tolist()).replace({np.nan:None}), 
                      df[['int_info','string_info']]], axis=1))
         elm1    elm2    elm3    elm5    elm6    elm7  int_info string_info
    0   attr5   attr9  attr33    None    None    None         4       User1
    1    None    None    None  attr31    None  attr13        24       User2
    2  attr23  attr33    None  attr28  attr33    None        31       User3
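
As an alternative sketch, not part of the original answer, the None values can be produced without a replace step by collecting all keys first and looking each one up with dict.get, which returns None for a missing key. Passing dtype=object is an assumption made here so that pandas keeps None instead of converting it to NaN. The names keys, rows, expanded and result are illustrative.

```python
import pandas as pd

data = {'string_info': ['User1', 'User2', 'User3'],
        'dict_info': [{'elm1': 'attr5', 'elm2': 'attr9', 'elm3': 'attr33'},
                      {'elm5': 'attr31', 'elm7': 'attr13'},
                      {'elm5': 'attr28', 'elm1': 'attr23', 'elm2': 'attr33', 'elm6': 'attr33'}],
        'int_info': [4, 24, 31]}
df = pd.DataFrame(data)

# Union of all keys that occur in any row's dict, sorted for a stable column order.
keys = sorted({k for d in df['dict_info'] for k in d})

# One complete dict per row: dict.get returns None when the key is absent.
rows = [{k: d.get(k) for k in keys} for d in df['dict_info']]

# dtype=object keeps the None values as they are.
expanded = pd.DataFrame(rows, index=df.index, dtype=object)

result = pd.concat([expanded, df[['int_info', 'string_info']]], axis=1)
print(result)
```

Because the key list is built explicitly, extra headers can be added to keys before constructing rows if a column should exist even when no dict contains it.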
    

    【Discussion】:

    • Thank you very much, it worked! I will definitely read more about pd.concat, thanks!