【Question title】: Trouble making a dataframe from API data
【Posted】: 2021-07-04 16:25:38
【Question】:

I have the following code:

import json

import pandas as pd
import requests
from pandas import json_normalize
from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict()
headers["Accept"] = "application/json"
headers["Content-Type"] = "application/json"
headers["Authorization"] = "Bearer token"
url = "https:link.com"
payload = """{
 "GradeIds" : [7,8],
 "ReportStartDate" : "2020-01-25T00:00:00",
 "ReportEndDate" : "2020-02-27T00:00:00"
}"""
response = requests.request("POST", url, headers=headers, data=payload)
parsed = json.loads(response.text)
print(parsed)
len(parsed)

df = json_normalize(parsed)

However, the output is a bit messy, and json_normalize cannot turn it into a clean dataframe. Here is a sample of the API output from print(parsed):

[{'grade': {'id': 7, 'name': 'stuff1', 'type': 'data'}, 'endOfDayPrices': [{'reportDate': '2020-01-27T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 3.9}, {'reportDate': '2020-01-28T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 3.95}, {'reportDate': '2020-01-28T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 1.05}, {'reportDate': '2020-01-29T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 1.1}, {'reportDate': '2020-01-30T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 0.85}]}]

When I print the df dataframe, I get the following:

 endOfDayPrices  grade.id grade.name  \                            
 0  [{'reportDate': '2020-01-27T00:00:00', 'month'...         7      data
 1  [{'reportDate': '2020-01-27T00:00:00', 'month'...         8      data

When I check the length of this list with len(parsed), it says there are only 2 items, grade and endOfDayPrices.
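
For context, a minimal sketch of what that length is counting, assuming the response has the shape shown above with one dict per requested grade. The trimmed-down records and the grade 8 values here are made up for illustration:

```python
# Hypothetical, trimmed-down response: one dict per requested grade.
parsed = [
    {"grade": {"id": 7}, "endOfDayPrices": [{"reportDate": "2020-01-27T00:00:00", "price": 3.9}]},
    {"grade": {"id": 8}, "endOfDayPrices": [{"reportDate": "2020-01-27T00:00:00", "price": 1.2}]},
]

# len() counts the top-level records (grades 7 and 8), not the keys in each record.
record_count = len(parsed)

# The keys live one level down, inside each record.
keys_per_record = [sorted(rec) for rec in parsed]

# The price rows live one level further down, in each record's nested list.
rows_per_grade = {rec["grade"]["id"]: len(rec["endOfDayPrices"]) for rec in parsed}

print(record_count, keys_per_record, rows_per_grade)
```

Under that assumption, the 2 would be the two grade records, and the rows to unpack sit inside each record's endOfDayPrices list.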

Does anyone know how to unpack this list and get a dataframe that looks like this:

grade    reportDate          price
7      2020-01-27T00:00:00   2.3
7      2020-01-28T00:00:00   3.95

etc.

【Comments】:

    Tags: python json pandas api python-requests


    【Solution 1】:

    Assuming your JSON is:

    In [1977]: l = [{'grade': {'id': 7, 'name': 'stuff', 'type': 'data'}, 'endOfDayPrices': [{'reportDate': '2020-01-27T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 2.3}, {'reportDate': '2020-01-28T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 3.95}, {'reportDate': '2020-01-29T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 2.5}, {'reportDate': '2020-01-30T00:00:00', 'month': '2020-03-01T00:00:00', 'price': 4.0}]}]
    

    You can do this:

    In [2079]: frames = []  # one DataFrame per grade
    
    In [2083]: for i in l:
          ...:     d1 = {}
          ...:     reportDate = []
          ...:     price = []
          ...:     # the scalar grade id is broadcast to every row of this grade
          ...:     d1['grade'] = i['grade']['id']
          ...:     for j in i['endOfDayPrices']:
          ...:         reportDate.append(j['reportDate'])
          ...:         price.append(j['price'])
          ...:     d1['reportDate'] = reportDate
          ...:     d1['price'] = price
          ...:     frames.append(pd.DataFrame(d1))
          ...: 
          ...: # DataFrame.append was removed in pandas 2.0; concatenate the pieces instead
          ...: df = pd.concat(frames)
          ...: 
    
    In [2084]: df
    Out[2084]: 
        grade           reportDate  price
    0       7  2020-01-27T00:00:00  3.900
    1       7  2020-01-28T00:00:00  3.950
    2       7  2020-01-29T00:00:00  4.000
    3       7  2020-01-30T00:00:00  4.000
    4       7  2020-01-31T00:00:00  3.900
    5       7  2020-02-03T00:00:00  3.600
    6       7  2020-02-04T00:00:00  3.700
    7       7  2020-02-05T00:00:00  3.700
    8       7  2020-02-06T00:00:00  3.350
    9       7  2020-02-07T00:00:00  3.400
    10      7  2020-02-10T00:00:00  3.300
    11      7  2020-02-11T00:00:00  3.500
    12      7  2020-02-12T00:00:00  3.500
    13      7  2020-02-13T00:00:00  3.500
    14      7  2020-02-14T00:00:00  3.550
    15      7  2020-02-18T00:00:00  3.350
    16      7  2020-02-19T00:00:00  3.150
    17      7  2020-02-20T00:00:00  3.550
    18      7  2020-02-21T00:00:00  3.554
    19      7  2020-02-24T00:00:00  3.555
    20      7  2020-02-25T00:00:00  3.555
    21      7  2020-02-26T00:00:00  2.900
    22      7  2020-02-27T00:00:00  2.700
    0       8  2020-01-27T00:00:00  1.200
    1       8  2020-01-28T00:00:00  1.050
    2       8  2020-01-29T00:00:00  1.100
    3       8  2020-01-30T00:00:00  0.850
    4       8  2020-01-31T00:00:00  0.900
    5       8  2020-02-03T00:00:00  0.650
    6       8  2020-02-04T00:00:00  0.800
    7       8  2020-02-05T00:00:00  1.250
    8       8  2020-02-06T00:00:00  0.900
    9       8  2020-02-07T00:00:00  0.950
    10      8  2020-02-10T00:00:00  0.800
    11      8  2020-02-11T00:00:00  0.950
    12      8  2020-02-12T00:00:00  0.800
    13      8  2020-02-13T00:00:00  0.850
    14      8  2020-02-14T00:00:00  0.850
    15      8  2020-02-18T00:00:00  0.800
    16      8  2020-02-19T00:00:00  1.000
    17      8  2020-02-20T00:00:00  0.933
    18      8  2020-02-21T00:00:00  1.015
    19      8  2020-02-24T00:00:00  1.021
    20      8  2020-02-25T00:00:00  1.020
    21      8  2020-02-26T00:00:00  0.600
    22      8  2020-02-27T00:00:00  1.000
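
As an alternative to the explicit loop, json_normalize can expand the nested list itself when given record_path and meta. This is a minimal sketch, assuming the response has the shape shown in the question; the sample values below (including the name 'stuff2' and the grade 8 price) are illustrative placeholders, not real API output:

```python
import pandas as pd

# Hypothetical sample mirroring the response shape shown in the question.
parsed = [
    {
        "grade": {"id": 7, "name": "stuff1", "type": "data"},
        "endOfDayPrices": [
            {"reportDate": "2020-01-27T00:00:00", "month": "2020-03-01T00:00:00", "price": 3.9},
            {"reportDate": "2020-01-28T00:00:00", "month": "2020-03-01T00:00:00", "price": 3.95},
        ],
    },
    {
        "grade": {"id": 8, "name": "stuff2", "type": "data"},
        "endOfDayPrices": [
            {"reportDate": "2020-01-27T00:00:00", "month": "2020-03-01T00:00:00", "price": 1.2},
        ],
    },
]

# record_path selects the nested list to expand into rows;
# meta copies the parent's grade id onto each of those rows.
df = pd.json_normalize(parsed, record_path="endOfDayPrices", meta=[["grade", "id"]])

# The meta column is named "grade.id" by default; rename it and keep the requested columns.
df = df.rename(columns={"grade.id": "grade"})[["grade", "reportDate", "price"]]
print(df)
```

Because each row takes its grade id from its own parent record, this approach should not mix up grades between records.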
    

    【Comments】:

    • It output the wrong grades, assigning grade 8 to every row. Do you know why that is?
    • @MichelleM Please check my updated answer.