【Question Title】: How to convert nested dictionaries to a pandas DataFrame?
【Posted】: 2019-12-09 11:10:23
【Question】:

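I want to convert the result of an API call into a DataFrame. The result of the call is a nested dictionary, but the DataFrame I get from it is not what I need.

Besides json_normalize, I also tried pd.DataFrame.from_dict, with no success so far. I also tried flattening the dictionary, but that did not work either.

I used the following call: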

import requests

results = requests.get(url).json()
results

The output is:

{'result': {'totalrows': 3124,
  'rows': [{'rownum': 1,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1089387}]},
   {'rownum': 2,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1086893}]},
   {'rownum': 3,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1085803}]}

Then, to build the DataFrame, I used the following code:


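import pandas as pd

Owners = results['result']['rows']
# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
df1 = pd.json_normalize(Owners)
df1.head()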

This is the output:

  rownum    values
0   1      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, 
           {'field': 'issueid', 'value': 472683}, {'field': 
           'ticker', 'value': 'AAPL'}, {'field': 'companyname', 
           'value': 'APPLE INC'}, {'field': 'issuetitle', 'value': 
           'COM'}, {'field': 'filerid', 'value': 1089387} 

1   2      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, 
           {'field': 'issueid', 'value': 472683}, {'field': 
           'ticker', 'value': 'AAPL'}, {'field': 'companyname', 
           'value': 'APPLE INC'}, {'field': 'issuetitle', 'value': 
           'COM'}, {'field': 'filerid', 'value': 1086893}

2   3      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, {'field': 
           'issueid', 'value': 472683}, {'field': 'ticker', 'value': 'AAPL'}, 
           {'field': 'companyname', 'value': 'APPLE INC'}, {'field': 
           'issuetitle', 'value': 'COM'}, {'field': 'filerid', 'value': 1085803}

However, I want to get a DataFrame in the following format:

【Question Comments】:

    Tags: python json pandas dataframe dictionary


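    【Solution 1】:

    You can use pandas.DataFrame.from_dict, but you first need to strip everything you don't need from the data. In practice, you only want to keep the field name and the value of each entry in every row. You can do that with a list comprehension: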

    # `results` is the parsed API response from the question
    data = [{field["field"]: field["value"] for field in row["values"]}
            for row in results["result"]["rows"]]
    print(data)
    # [{'querydate': '7/31/2019 3:19 PM', 
    #     'issueid': 472683, 
    #     'ticker': 'AAPL', 
    #     'companyname': 'APPLE INC',
    #     'issuetitle': 'COM',
    #     'filerid': 1089387},
    # {
    #     'querydate': '7/31/2019 3:19 PM',
    #     'issueid': 472683,
    #     'ticker': 'AAPL',
    #     'companyname': 'APPLE INC',
    #     'issuetitle': 'COM',
    #     'filerid': 1086893},
    # {
    #     'querydate': '7/31/2019 3:19 PM', 
    #     'issueid': 472683, 
    #     'ticker': 'AAPL', 
    #     'companyname': 'APPLE INC', 
    #     'issuetitle': 'COM', 
    #     'filerid': 1085803
    # }]
    

    Once you have this list of dictionaries, you can call the from_dict method:

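    import pandas as pd

    df = pd.DataFrame.from_dict(data)
    print(df)
    # Output from an older pandas, which sorted the columns alphabetically;
    # recent pandas versions keep the keys' insertion order instead.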
    #   companyname  filerid  issueid issuetitle          querydate ticker
    # 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM   AAPL
    # 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM   AAPL
    # 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM   AAPL
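To try the idea without calling the API, here is a minimal self-contained sketch. The `sample` payload is a hypothetical, shortened stand-in for the response in the question, and `rows_to_frame` is a helper name made up for illustration, not a pandas function.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the API response in the question
sample = {
    "result": {
        "totalrows": 2,
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ],
    }
}


def rows_to_frame(payload):
    """Turn each row's list of {'field', 'value'} pairs into one flat record."""
    records = [
        {item["field"]: item["value"] for item in row["values"]}
        for row in payload["result"]["rows"]
    ]
    return pd.DataFrame(records)


df = rows_to_frame(sample)
print(df)
```

Because every record is a plain dict, a row that lacks one of the fields should simply end up with a missing value in that column rather than raising an error.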
    

    If you want rownum as a column (or as the index):

    # Iterate over `results` (the raw API response), not the list built earlier
    data = [{**{field["field"]: field["value"] for field in row["values"]},
             "rownum": row["rownum"]}
            for row in results["result"]["rows"]]
    
    df = pd.DataFrame.from_dict(data)
    print(df)
    #   companyname  filerid  issueid issuetitle          querydate  rownum ticker
    # 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM       1   AAPL
    # 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM       2   AAPL
    # 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM       3   AAPL
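The code above produces rownum as a column. For the index variant, one possible sketch is to build the same frame and then call `set_index`. The `sample` payload here is a hypothetical, shortened stand-in for the real response.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the API response in the question
sample = {
    "result": {
        "totalrows": 2,
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ],
    }
}

# One flat record per row, with rownum added next to the field values
records = [
    {**{item["field"]: item["value"] for item in row["values"]},
     "rownum": row["rownum"]}
    for row in sample["result"]["rows"]
]

# set_index moves rownum out of the columns and into the row labels
df = pd.DataFrame(records).set_index("rownum")
print(df)
```

With rownum as the index, a single row can then be looked up by its number, for example `df.loc[2]`.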
    

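    【Comments】:

    • Almost there, it's only missing rownum
    • @Akaisteph7 To do it inside the list comprehension, I ended up merging two dicts. Maybe someone has a better idea?
    • Thank you very much, @AlexandreB.! It worked! I don't need rownum, but I didn't mention that in my question.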
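On the question of an alternative to merging two dicts: one option, sketched here as a suggestion rather than as part of the original answer, is to let `pd.json_normalize` (pandas 1.0 or later) unpack the `values` list through `record_path`, carry `rownum` along through `meta`, and then pivot the long result into wide form. The `sample` payload is a hypothetical, shortened stand-in for the real response.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the API response in the question
sample = {
    "result": {
        "totalrows": 2,
        "rows": [
            {"rownum": 1, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1089387}]},
            {"rownum": 2, "values": [{"field": "ticker", "value": "AAPL"},
                                     {"field": "filerid", "value": 1086893}]},
        ],
    }
}

# record_path expands each row's "values" list into one line per field;
# meta repeats the parent's rownum on every one of those lines
long_df = pd.json_normalize(sample["result"]["rows"],
                            record_path="values", meta="rownum")

# Reshape from long (field, value, rownum) to wide: one column per field
wide_df = long_df.pivot(index="rownum", columns="field", values="value")
print(wide_df)
```

Since the single `value` column mixes strings and integers, the pivoted columns come out with object dtype. Calling `wide_df.infer_objects()` afterwards may recover the numeric types.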
    【Solution 2】:

    A naive attempt with nested for loops...

    import pandas as pd

    # `json` is the parsed response listed under "JSON object" below;
    # define it before running this loop.
    fields = ["querydate", "issueid", "ticker", "companyname", "issuetitle", "filerid"]
    records = []

    for row in json["result"]["rows"]:
        # Start every expected field at None so a missing field stays empty
        record = {"rownum": row["rownum"], **dict.fromkeys(fields)}
        for value_dict in row["values"]:
            if value_dict["field"] in fields:
                record[value_dict["field"]] = value_dict["value"]
        records.append(record)

    # DataFrame.append was removed in pandas 2.0, so build the frame in one call
    df = pd.DataFrame(records)

    print(df)
    

    JSON object:

    json = {
        "result": {
            "totalrows": 3,
            "rows": [
                {
                    "rownum": 1,
                    "values": [
                        {
                            "field": "querydate",
                            "value": "7/31/2019 3:19 PM"
                        },
                        {
                            "field": "issueid",
                            "value": 472683
                        },
                        {
                            "field": "ticker",
                            "value": "AAPL"
                        },
                        {
                            "field": "companyname",
                            "value": "APPLE INC"
                        },
                        {
                            "field": "issuetitle",
                            "value": "COM"
                        },
                        {
                            "field": "filerid",
                            "value": 1089387
                        }
                    ]
                },
                {
                    "rownum": 2,
                    "values": [
                        {
                            "field": "querydate",
                            "value": "7/31/2019 3:19 PM"
                        },
                        {
                            "field": "issueid",
                            "value": 472683
                        },
                        {
                            "field": "ticker",
                            "value": "AAPL"
                        },
                        {
                            "field": "companyname",
                            "value": "APPLE INC"
                        },
                        {
                            "field": "issuetitle",
                            "value": "COM"
                        },
                        {
                            "field": "filerid",
                            "value": 1086893
                        }
                    ]
                },
                {
                    "rownum": 3,
                    "values": [
                        {
                            "field": "querydate",
                            "value": "7/31/2019 3:19 PM"
                        },
                        {
                            "field": "issueid",
                            "value": 472683
                        },
                        {
                            "field": "ticker",
                            "value": "AAPL"
                        },
                        {
                            "field": "companyname",
                            "value": "APPLE INC"
                        },
                        {
                            "field": "issuetitle",
                            "value": "COM"
                        },
                        {
                            "field": "filerid",
                            "value": 1085803
                        }
                    ]
                }
            ]
        }
    }
    

    Output:

       rownum          querydate  issueid ticker companyname issuetitle  filerid
    0       1  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1089387
    1       2  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1086893
    2       3  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1085803
    

    【Comments】:

    • Thanks for your contribution! Thank you, @shash678