【Question title】: Querying deeply nested and complex JSON data with multiple levels
【Posted】: 2021-09-26 16:35:10
【Question description】:

I am struggling to work out how to extract data from deeply nested, complex JSON. I have the following code to fetch the JSON.

import requests
import pandas as pd
import json
import pprint
import seaborn as sns
import matplotlib.pyplot as plt

base_url="https://data.sec.gov/api/xbrl/companyfacts/CIK0001627475.json"
headers={'User-Agent': 'Myheaderdata'}
first_response=requests.get(base_url,headers=headers)
response_dic=first_response.json()   
print(response_dic)
base_df=pd.DataFrame(response_dic)
base_df.head()

It prints the JSON and displays a pandas DataFrame. The DataFrame has two simple columns plus a third column (facts) that holds a large amount of nested data.

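What I want to understand is how to navigate into that nested structure to retrieve specific data. For example, I might want to go to the dei level or the us-gaap level and retrieve a particular attribute, say dei > EntityCommonStockSharesOutstanding, and get its "label", "value", and "FY" details.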

When I try to use the get function as follows:

data=[]
for response in response_dic:
    data.append({"EntityCommonStockSharesOutstanding":response.get('EntityCommonStockSharesOutstanding')})
new_df=pd.DataFrame(data)
new_df.head()

I end up with the following AttributeError:

AttributeError                            Traceback (most recent call last)
<ipython-input-15-15c1685065f0> in <module>
      1 data=[]
      2 for response in response_dic:
----> 3     data.append({"EntityCommonStockSharesOutstanding":response.get('EntityCommonStockSharesOutstanding')})
      4 base_df=pd.DataFrame(data)
      5 base_df.head()

AttributeError: 'str' object has no attribute 'get'

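【Comments】:

  • Have you looked at the structure of response_dic? It is a nested dictionary. Your loop, for response in response_dic:, only iterates over its keys, which are the strings cik, entityName, and facts (I'm not sure why you are doing that). To navigate to the "label" under "dei", simply use response_dic['facts']['dei']['EntityCommonStockSharesOutstanding']['label'], which returns 'Entity Common Stock, Shares Outstanding'.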
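
To make the comment above concrete, here is a minimal sketch that runs without calling the API. The `sample` dictionary is a hand-made stand-in that only mimics the shape of the SEC response; the entity name and the `fy` values are placeholders chosen for illustration, not data from the real payload.

```python
# A small stand-in for response_dic; the real payload is much larger.
# "Example Corp" and the fy values are placeholders for illustration.
sample = {
    "cik": 1627475,
    "entityName": "Example Corp",
    "facts": {
        "dei": {
            "EntityCommonStockSharesOutstanding": {
                "label": "Entity Common Stock, Shares Outstanding",
                "units": {
                    "shares": [
                        {"end": "2018-10-31", "val": 106299106, "fy": 2018},
                        {"end": "2019-02-28", "val": 106692030, "fy": 2018},
                    ]
                },
            }
        }
    },
}

# Iterating over a dict yields its keys (plain strings), which is why
# calling .get() on each item raised AttributeError in the question.
top_level_keys = list(sample)

# Index into the nesting one key at a time instead.
concept = sample["facts"]["dei"]["EntityCommonStockSharesOutstanding"]
label = concept["label"]

# Each unit maps to a list of records; pick out the fields you need.
rows = [
    {"label": label, "val": rec["val"], "fy": rec.get("fy")}
    for rec in concept["units"]["shares"]
]

print(top_level_keys)
print(rows)
```

Using `rec.get("fy")` rather than `rec["fy"]` is a defensive choice in case some records lack that field; whether that happens in the real data is not confirmed here.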

Tags: python json pandas api python-requests


【Solution 1】:

Use pd.json_normalize:

For example:

entity1 = response_dic['facts']['dei']['EntityCommonStockSharesOutstanding']
entity2 = response_dic['facts']['dei']['EntityPublicFloat']

df1 = pd.json_normalize(entity1, record_path=['units', 'shares'],
                        meta=['label', 'description'])

df2 = pd.json_normalize(entity2, record_path=['units', 'USD'],
                        meta=['label', 'description'])
>>> df1
           end        val                  accn  ...      frame                                    label                                        description
0   2018-10-31  106299106  0001564590-18-028629  ...  CY2018Q3I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
1   2019-02-28  106692030  0001627475-19-000007  ...        NaN  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
2   2019-04-30  107160359  0001627475-19-000015  ...  CY2019Q1I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
3   2019-07-31  110803709  0001627475-19-000025  ...  CY2019Q2I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
4   2019-10-31  112020807  0001628280-19-013517  ...  CY2019Q3I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
5   2020-02-28  113931825  0001627475-20-000006  ...        NaN  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
6   2020-04-30  115142604  0001627475-20-000018  ...  CY2020Q1I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
7   2020-07-31  120276173  0001627475-20-000031  ...  CY2020Q2I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
8   2020-10-31  122073553  0001627475-20-000044  ...  CY2020Q3I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
9   2021-01-31  124962279  0001627475-21-000015  ...  CY2020Q4I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...
10  2021-04-30  126144849  0001627475-21-000022  ...  CY2021Q1I  Entity Common Stock, Shares Outstanding  Indicate number of shares or other units outst...

[11 rows x 10 columns]


>>> df2
          end         val                  accn    fy  fp  form       filed      frame                label                                        description
0  2018-10-03   900000000  0001627475-19-000007  2018  FY  10-K  2019-03-07  CY2018Q3I  Entity Public Float  The aggregate market value of the voting and n...
1  2019-06-28  1174421292  0001627475-20-000006  2019  FY  10-K  2020-03-02  CY2019Q2I  Entity Public Float  The aggregate market value of the voting and n...
2  2020-06-30  1532720862  0001627475-21-000015  2020  FY  10-K  2021-02-24  CY2020Q2I  Entity Public Float  The aggregate market value of the voting and n...
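
The calls above hard-code the unit name ('shares' or 'USD') in record_path. If you would rather not know the unit in advance, a plain-Python sketch of roughly the same flattening is shown below. `flatten_concept` is a hypothetical helper written for this illustration, not a pandas function, and the `dei` dictionary is a trimmed stand-in built from the df2 output above.

```python
def flatten_concept(concept):
    """Return one flat dict per record, across every unit of a concept.

    Hypothetical helper, not part of pandas. It mimics what record_path
    and meta do, but loops over all unit names instead of naming one.
    """
    rows = []
    for unit, records in concept.get("units", {}).items():
        for rec in records:
            row = dict(rec)                      # copy the record's own fields
            row["unit"] = unit                   # remember which unit it came from
            row["label"] = concept.get("label")  # repeat the concept-level label
            rows.append(row)
    return rows


# Stand-in shaped like response_dic['facts']['dei'], trimmed to two records.
dei = {
    "EntityPublicFloat": {
        "label": "Entity Public Float",
        "units": {
            "USD": [
                {"end": "2018-10-03", "val": 900000000, "fy": 2018, "fp": "FY", "form": "10-K"},
                {"end": "2019-06-28", "val": 1174421292, "fy": 2019, "fp": "FY", "form": "10-K"},
            ]
        },
    }
}

flat = flatten_concept(dei["EntityPublicFloat"])
print(flat[0])
```

The resulting list of dicts can be passed straight to pd.DataFrame(flat) if you want a DataFrame at the end.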

【Comments】:
