【Question title】:Python dataframe column of lists of dicts into columns with single elements
【Posted】:2021-10-04 07:35:33
【Question】:

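I tried asking this question in a different format, but the answers I got addressed specific parts of the problem rather than the whole thing. To avoid confusion, I am trying again and phrasing the question differently.

I have a dataframe in which several columns hold regular data, but one column holds lists of dicts as its elements. Here is an example.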

list_of_dicts = [{'a':'sam','b':2},{'a':'diana','c':'grape', 'd':5},{'a':'jody','c':7,'e':'foo','f':9}]
list_of_dicts_2 = [{'a':'joe','b':2},{'a':'steve','c':'pizza'},{'a':'alex','c':7,'e':'doh'}]

df4.loc[0,'lists_of_stuff'] = list_of_dicts
df4.loc[1,'lists_of_stuff'] = list_of_dicts_2

df4.loc[0,'other1'] = 'Susie'
df4.loc[1,'other1'] = 'Rachel'

df4.loc[0,'other2'] = 123
df4.loc[1,'other2'] = 456

df4
    other1  lists_of_stuff                                                                            other2
0   Susie   [{'a':'sam','b':2},{'a':'diana','c':'grape', 'd':5},{'a':'jody','c':7,'e':'foo','f':9}]   123
1   Rachel  [{'a':'joe','b':2},{'a':'steve','c':'pizza'},{'a':'alex','c':7,'e':'doh'}]                456

I am trying to split those dicts out into columns so that I end up with a simpler dataframe. Something like this (the column order may differ):

    other1 a_1   b   a_2   c     d   a_3      c_2   e   f   other2
0   Susie  sam   2   diana grape 5   jody     7     foo 9   123
1   Rachel joe   2   steve pizza NaN alex     7     doh NaN 456

Or like this:

    other1 a     b   c     d   e   f   other2
0   Susie  sam   2   NaN   NaN NaN NaN 123
1   Susie  diana NaN grape 5   NaN NaN 123
2   Susie  jody  NaN 7     NaN foo 9   123
3   Rachel joe   2   NaN   NaN NaN NaN 456 
4   Rachel steve NaN pizza NaN NaN NaN 456
5   Rachel alex  NaN 7     NaN doh NaN 456

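Two things that have not worked are pd.DataFrame(df4['lists_of_stuff']) (it just shows the dataframe as is; i.e. it does not change anything) and pd.json_normalize(df4['lists_of_stuff']) (which raises an error). In addition, flatten_json and solutions involving Series have not produced workable results.

What is the correct Pythonic way to convert df4 into one of the proposed outputs?

(Yes, I asked almost the same question elsewhere: "List of variable size dicts to a dataframe". That question was unclear, so I decided to try again with a new question rather than pile more material onto the other one and make it hard to follow.)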

【Comments】:

    Tags: python list dataframe dictionary flatten


    【Solution 1】:

    Try:

    # if the lists_of_stuff are strings, apply literal_eval
    #from ast import literal_eval
    #df["lists_of_stuff"] = df["lists_of_stuff"].apply(literal_eval)
    
    df = df.explode("lists_of_stuff")
    df = pd.concat([df, df.pop("lists_of_stuff").apply(pd.Series)], axis=1)
    print(df)
    

    Prints:

       other1  other2      a    b      c    d    e    f
    0   Susie     123    sam  2.0    NaN  NaN  NaN  NaN
    0   Susie     123  diana  NaN  grape  5.0  NaN  NaN
    0   Susie     123   jody  NaN      7  NaN  foo  9.0
    1  Rachel     456    joe  2.0    NaN  NaN  NaN  NaN
    1  Rachel     456  steve  NaN  pizza  NaN  NaN  NaN
    1  Rachel     456   alex  NaN      7  NaN  doh  NaN
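
An alternative sketch, not part of the answer above: pd.json_normalize can produce the same long format if it is given the rows as plain dicts, with record_path naming the list column and meta naming the columns to repeat on every row. The DataFrame constructor below is an assumed reproduction of the question's df4, and the name flat is only illustrative.

```python
import pandas as pd

# Assumed reproduction of the question's data (the question builds df4 with .loc).
list_of_dicts = [{'a': 'sam', 'b': 2},
                 {'a': 'diana', 'c': 'grape', 'd': 5},
                 {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}]
list_of_dicts_2 = [{'a': 'joe', 'b': 2},
                   {'a': 'steve', 'c': 'pizza'},
                   {'a': 'alex', 'c': 7, 'e': 'doh'}]

df = pd.DataFrame({
    "other1": ["Susie", "Rachel"],
    "lists_of_stuff": [list_of_dicts, list_of_dicts_2],
    "other2": [123, 456],
})

# Turn each row into a dict, then let json_normalize expand the inner list:
# one output row per inner dict, with other1/other2 repeated alongside it.
flat = pd.json_normalize(
    df.to_dict("records"),
    record_path="lists_of_stuff",
    meta=["other1", "other2"],
)
print(flat)
```

Passing the column itself to json_normalize fails because each element is a list rather than a dict; converting the rows to records first gives it the nested structure it expects.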
    

    Edit: To reorder the columns:

    #... code as above
    df = df.reset_index(drop=True).reindex(
        [*df.columns[:1]] + [*df.columns[2:]] + [*df.columns[1:2]], axis=1
    )
    print(df)
    

    Prints:

       other1      a    b      c    d    e    f  other2
    0   Susie    sam  2.0    NaN  NaN  NaN  NaN     123
    1   Susie  diana  NaN  grape  5.0  NaN  NaN     123
    2   Susie   jody  NaN      7  NaN  foo  9.0     123
    3  Rachel    joe  2.0    NaN  NaN  NaN  NaN     456
    4  Rachel  steve  NaN  pizza  NaN  NaN  NaN     456
    5  Rachel   alex  NaN      7  NaN  doh  NaN     456
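
The output above matches the second layout from the question. For the first (wide) layout, one possible sketch is to number each dict within its original row and then unstack on that number. This is an assumption-laden illustration, not the answer author's method: the helper column n and the key_n naming (a_1, b_1, c_2, ...) are made up here and differ slightly from the column names proposed in the question.

```python
import pandas as pd

# Assumed reproduction of the question's data.
df = pd.DataFrame({
    "other1": ["Susie", "Rachel"],
    "lists_of_stuff": [
        [{'a': 'sam', 'b': 2},
         {'a': 'diana', 'c': 'grape', 'd': 5},
         {'a': 'jody', 'c': 7, 'e': 'foo', 'f': 9}],
        [{'a': 'joe', 'b': 2},
         {'a': 'steve', 'c': 'pizza'},
         {'a': 'alex', 'c': 7, 'e': 'doh'}],
    ],
    "other2": [123, 456],
})

# One row per inner dict; reset_index keeps the original row label in "index".
long_df = df.explode("lists_of_stuff").reset_index()
# Position of each dict within its original row: 1, 2, 3, ...
long_df["n"] = long_df.groupby("index").cumcount() + 1

# Expand the dicts into columns, aligned with long_df's rows.
records = pd.DataFrame(long_df["lists_of_stuff"].tolist(), index=long_df.index)

wide = (
    pd.concat([long_df[["index", "n"]], records], axis=1)
    .set_index(["index", "n"])
    .unstack("n")                  # columns become (key, n) pairs
    .dropna(axis=1, how="all")     # drop (key, n) pairs that never occur
)
wide.columns = [f"{key}_{n}" for key, n in wide.columns]

# Attach the wide columns back onto the untouched columns of the original frame.
result = df.drop(columns="lists_of_stuff").join(wide)
print(result)
```

Dropping the all-NaN columns keeps only the key and position pairs that appear in at least one row, which for this data leaves nine dict-derived columns next to other1 and other2.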
    

    【Discussion】:
