【Question title】: Explode list into columns in a dataframe
【Posted】: 2021-11-07 02:42:54
【Question】:

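I have a dataframe with two columns: ID and Demographic_distribution. ID is just a number (e.g. 123456). Demographic_distribution holds a list for each ID. Here is an example of what the Demographic_distribution column contains for one ID: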

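ID: 123456

Demographic_distribution:

    [{'percentage': 0.000952, 'age': '25-34', 'gender': 'unknown'},
     {'percentage': 0.093621, 'age': '55-64', 'gender': 'male'},
     {'percentage': 0.002856, 'age': '35-44', 'gender': 'unknown'},
     {'percentage': 0.031736, 'age': '18-24', 'gender': 'female'},
     {'percentage': 0.085052, 'age': '25-34', 'gender': 'male'},
     {'percentage': 0.019994, 'age': '18-24', 'gender': 'male'},
     {'percentage': 0.085687, 'age': '35-44', 'gender': 'male'},
     {'percentage': 0.133608, 'age': '55-64', 'gender': 'female'},
     {'percentage': 0.112345, 'age': '65+', 'gender': 'female'},
     {'percentage': 0.000317, 'age': '18-24', 'gender': 'unknown'},
     {'percentage': 0.095208, 'age': '45-54', 'gender': 'female'},
     {'percentage': 0.067598, 'age': '65+', 'gender': 'male'},
     {'percentage': 0.086004, 'age': '45-54', 'gender': 'male'},
     {'percentage': 0.075849, 'age': '25-34', 'gender': 'female'},
     {'percentage': 0.098699, 'age': '35-44', 'gender': 'female'},
     {'percentage': 0.003174, 'age': '65+', 'gender': 'unknown'},
     {'percentage': 0.003174, 'age': '45-54', 'gender': 'unknown'},
     {'percentage': 0.004126, 'age': '55-64', 'gender': 'unknown'}]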

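As you can see, there are six age groups (18-24 through 65+), three genders and many percentages. I want to split the Demographic_distribution column into three separate columns, one for each key (percentage, age and gender). Keep in mind that every resulting row must stay linked to its ID; otherwise the data is meaningless. I tried .explode, but it didn't work.

Any idea how to do this?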
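A minimal sketch of why `.explode` alone does not finish the job, using a small hypothetical frame (the second ID `654321` and its values are made up for illustration). `explode` does produce one row per dict and repeats the ID on each row, but every cell still holds a whole dict. A second step is needed to spread the keys into columns.

```python
import pandas as pd

# Hypothetical two-ID frame in the shape described above
# (ID 654321 and its values are invented for illustration).
df = pd.DataFrame({
    "ID": [123456, 654321],
    "Demographic_distribution": [
        [{"percentage": 0.000952, "age": "25-34", "gender": "unknown"},
         {"percentage": 0.093621, "age": "55-64", "gender": "male"}],
        [{"percentage": 0.031736, "age": "18-24", "gender": "female"}],
    ],
})

# One row per list element; ID is repeated on each row...
exploded = df.explode("Demographic_distribution")
print(exploded)

# ...but each cell is still a dict, not three separate columns
print(type(exploded["Demographic_distribution"].iloc[0]))
```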

【Question discussion】:

    Tags: python pandas list dataframe explode


    【Answer 1】:

    I tried something like this once.

    import pandas as pd
    
    # If there is more than one Demographic_distribution list (one per ID),
    # you can iterate through them in a loop.
    
    Demographic_distribution = [
        {"percentage": 0.000952, "age": "25-34", "gender": "unknown"},
        {"percentage": 0.093621, "age": "55-64", "gender": "male"},
        {"percentage": 0.002856, "age": "35-44", "gender": "unknown"},
        {"percentage": 0.031736, "age": "18-24", "gender": "female"},
        {"percentage": 0.085052, "age": "25-34", "gender": "male"},
        {"percentage": 0.019994, "age": "18-24", "gender": "male"},
        {"percentage": 0.085687, "age": "35-44", "gender": "male"}]
    
    # Each dict becomes one row; its keys become the columns
    df = pd.DataFrame(Demographic_distribution)
    # Link every row to its ID
    df['ID'] = 123456
    
    df.to_csv("D:\\Path\\Output.csv",index=False)
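Following the loop suggested in the code comment, here is a sketch for several IDs. It assumes the data is already in a DataFrame with `ID` and `Demographic_distribution` columns; the second ID and its values are hypothetical. The idea is to build one small frame per ID, tag it with that ID, and concatenate the frames.

```python
import pandas as pd

# Hypothetical input frame; ID 654321 and its values are illustrative only.
df = pd.DataFrame({
    "ID": [123456, 654321],
    "Demographic_distribution": [
        [{"percentage": 0.000952, "age": "25-34", "gender": "unknown"},
         {"percentage": 0.093621, "age": "55-64", "gender": "male"}],
        [{"percentage": 0.031736, "age": "18-24", "gender": "female"}],
    ],
})

frames = []
for id_, distribution in zip(df["ID"], df["Demographic_distribution"]):
    part = pd.DataFrame(distribution)  # one row per dict; keys become columns
    part["ID"] = id_                   # link every row back to its ID
    frames.append(part)

# Stack the per-ID frames with a fresh 0..n-1 index
result = pd.concat(frames, ignore_index=True)
print(result)
```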
    


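    【Discussion】:

    • Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
    【Answer 2】:

    Try: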

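    import pandas as pd
    
    # One row per dict; ID (and the original index label) is repeated on each row
    df = df.explode("Demographic_distribution")
    
    # Expand each dict into a Series (keys -> columns) and place the new columns
    # next to ID; pop() also removes the original dict column from df
    df = pd.concat(
        [df, df.pop("Demographic_distribution").apply(pd.Series)], axis=1
    )
    print(df)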
    

    Prints:

           ID  percentage    age   gender
    0  123456    0.000952  25-34  unknown
    0  123456    0.093621  55-64     male
    0  123456    0.002856  35-44  unknown
    0  123456    0.031736  18-24   female
    0  123456    0.085052  25-34     male
    0  123456    0.019994  18-24     male
    0  123456    0.085687  35-44     male
    0  123456    0.133608  55-64   female
    0  123456    0.112345    65+   female
    0  123456    0.000317  18-24  unknown
    0  123456    0.095208  45-54   female
    0  123456    0.067598    65+     male
    0  123456    0.086004  45-54     male
    0  123456    0.075849  25-34   female
    0  123456    0.098699  35-44   female
    0  123456    0.003174    65+  unknown
    0  123456    0.003174  45-54  unknown
    0  123456    0.004126  55-64  unknown
    

    The df used:

           ID                                    Demographic_distribution
    0  123456  [{'percentage': 0.000952, 'age': '25-34', 'gender': 'unknown'}, {'percentage': 0.093621, 'age': '55-64', 'gender': 'male'}, {'percentage': 0.002856, 'age': '35-44', 'gender': 'unknown'}, {'percentage': 0.031736, 'age': '18-24', 'gender': 'female'}, {'percentage': 0.085052, 'age': '25-34', 'gender': 'male'}, {'percentage': 0.019994, 'age': '18-24', 'gender': 'male'}, {'percentage': 0.085687, 'age': '35-44', 'gender': 'male'}, {'percentage': 0.133608, 'age': '55-64', 'gender': 'female'}, {'percentage': 0.112345, 'age': '65+', 'gender': 'female'}, {'percentage': 0.000317, 'age': '18-24', 'gender': 'unknown'}, {'percentage': 0.095208, 'age': '45-54', 'gender': 'female'}, {'percentage': 0.067598, 'age': '65+', 'gender': 'male'}, {'percentage': 0.086004, 'age': '45-54', 'gender': 'male'}, {'percentage': 0.075849, 'age': '25-34', 'gender': 'female'}, {'percentage': 0.098699, 'age': '35-44', 'gender': 'female'}, {'percentage': 0.003174, 'age': '65+', 'gender': 'unknown'}, {'percentage': 0.003174, 'age': '45-54', 'gender': 'unknown'}, {'percentage': 0.004126, 'age': '55-64', 'gender': 'unknown'}]
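Here is a variant of the same approach. It is a sketch that assumes pandas 1.1 or later (needed for `explode(..., ignore_index=True)`) and assumes no row holds an empty list. Building the new columns with a single `pd.DataFrame(...)` call over the exploded dicts avoids calling `pd.Series` once per row. The fresh index also avoids the repeated `0` labels in the output above. The second ID is hypothetical.

```python
import pandas as pd

# Hypothetical input; ID 654321 and its values are illustrative only.
df = pd.DataFrame({
    "ID": [123456, 654321],
    "Demographic_distribution": [
        [{"percentage": 0.000952, "age": "25-34", "gender": "unknown"},
         {"percentage": 0.093621, "age": "55-64", "gender": "male"}],
        [{"percentage": 0.031736, "age": "18-24", "gender": "female"}],
    ],
})

# One row per dict, re-indexed 0..n-1 (ignore_index needs pandas >= 1.1)
exploded = df.explode("Demographic_distribution", ignore_index=True)

# Build all new columns at once from the list of dicts; the result has a
# matching 0..n-1 index, so concat lines the rows up with their IDs
expanded = pd.DataFrame(exploded.pop("Demographic_distribution").tolist())

out = pd.concat([exploded, expanded], axis=1)
print(out)
```

An empty list explodes to NaN rather than a dict, so such rows would need to be dropped or handled before the second step.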
    

    【Discussion】:

      【Answer 3】:

      pandas.json_normalize is what you're after.

      pandas docs
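Here is a minimal sketch of how `pd.json_normalize` could be applied to this frame, assuming the same two columns; the second ID is hypothetical. `record_path` names the list to unpack, one output row per dict. `meta` copies `ID` onto every one of those rows. The meta column is appended after the record columns and typically comes back with object dtype, so the sketch reorders the columns and casts `ID` back to an integer.

```python
import pandas as pd

# Hypothetical input; ID 654321 and its values are illustrative only.
df = pd.DataFrame({
    "ID": [123456, 654321],
    "Demographic_distribution": [
        [{"percentage": 0.000952, "age": "25-34", "gender": "unknown"},
         {"percentage": 0.093621, "age": "55-64", "gender": "male"}],
        [{"percentage": 0.031736, "age": "18-24", "gender": "female"}],
    ],
})

out = pd.json_normalize(
    df.to_dict("records"),                   # [{"ID": ..., "Demographic_distribution": [...]}, ...]
    record_path="Demographic_distribution",  # one output row per dict in this list
    meta=["ID"],                             # copy ID onto each of those rows
)

# Move ID to the front and make it an integer column again
out = out[["ID", "percentage", "age", "gender"]].astype({"ID": "int64"})
print(out)
```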

      【Discussion】:

      • Yes, this was helpful.