【Question Title】: How to parse nested JSON fields in a list into a dataframe?
【Posted】: 2018-08-24 23:05:41
【Question】:

I am making API calls and getting a nested JSON response for each ID.

If I run the API call for a single ID, the JSON looks like this:

u'{"id":26509,"name":"ORD.00001","order_type":"sales","consumer_id":415372,"order_source":"in_store","is_submitted":0,"fulfillment_method":"in_store","order_total":150,"balance_due":150,"tax_total":0,"coupon_total":0,"order_status":"cancelled","payment_complete":null,"created_at":"2017-12-02 19:49:15","updated_at":"2017-12-02 20:07:25","products":[{"id":48479,"item_master_id":239687,"name":"QA_FacewreckHaze","quantity":1,"pricing_weight_id":null,"category_id":1,"subcategory_id":8,"unit_price":"150.00","original_unit_price":"150.00","discount_total":"0.00","created_at":"2017-12-02 19:49:45","sold_weight":10,"sold_weight_uom":"GR"}],"payments":[],"coupons":[],"taxes":[],"order_subtotal":150}'

I can successfully parse this JSON string into a dataframe with these lines:

order_detail_staging = json.loads(r.text)
order_detail = json_normalize(order_detail_staging)

I can iterate over all of my IDs through the API with the following code:

lists = []

for id in df.id:
    r = requests.get("URL/v1/orders/{id}".format(id=id), headers=headers_order)
    lists.append(r.text)

Now all of my JSON responses are stored in a list. How do I write every element of the list into a dataframe?

The code I have been trying is this:

for x in lists:
    order_detail = json.loads(x)
    order_detail = json_normalize(x)
    print(order_detail)

I get the error:

AttributeError: 'unicode' object has no attribute 'itervalues'

I know it is raised on this line:

order_detail = json_normalize(x)

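Why does this line work for a single JSON string but not for the list? What can I do to get the list of nested JSON into a dataframe?

Thanks in advance for your help.

Edit: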

Traceback (most recent call last):

  File "<ipython-input-108-5051d2ceb18b>", line 3, in <module>
    for id in df.id

  File "/Users/bob/anaconda/lib/python2.7/site-packages/requests/models.py", line 802, in json
    return json.loads(self.text, **kwargs)

  File "/Users/bob/anaconda/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)

  File "/Users/bob/anaconda/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())

  File "/Users/bob/anaconda/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")

ValueError: No JSON object could be decoded

【Comments】:

    Tags: python json pandas python-requests


    【Solution 1】:
    • Use the response's .json() method
    • Feed the result directly to json_normalize

    Example:

    df = json_normalize([
        requests.get("URL/v1/orders/{id}".format(id=id), headers = headers_order).json()
        for id in df.id
    ])
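
Regarding the loop in the question: it fails because `json_normalize(x)` is given the raw string `x` instead of the dict that `json.loads(x)` returned on the previous line. A minimal sketch of the corrected idea, parsing every string first and normalizing the whole list once, is below. It assumes a recent pandas where `json_normalize` is available as `pd.json_normalize` (on older versions it is imported from `pandas.io.json`), and the two JSON strings are made-up stand-ins for the `r.text` values collected in `lists`.

```python
import json

import pandas as pd

# Hypothetical stand-ins for the r.text strings collected in `lists`.
lists = [
    '{"id": 1, "order_total": 150, "products": [{"id": 10, "name": "A"}]}',
    '{"id": 2, "order_total": 75, "products": []}',
]

# Parse every string into a dict first; json_normalize expects dicts
# (or a list of dicts), not raw JSON text.
records = [json.loads(text) for text in lists]

# Normalize the whole list in one call: one row per order.
orders = pd.json_normalize(records)

print(orders[["id", "order_total"]])
```

Nested lists such as `products` stay as list-valued cells in this form; flattening them needs `record_path`, which is a separate step.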
    

    Update:

    A fail-safe version that handles invalid responses:

    def gen():
        for id in df.id:
            try:
                yield requests.get("URL/v1/orders/{id}".format(id=id), headers = headers_order).json()
            except ValueError:  # incorrect API response
                pass
    
    df = json_normalize(list(gen()))
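
If the response bodies have already been collected as strings, as in the question, the same skip-on-error idea can be applied offline. The sketch below is only an illustration: `parse_valid` and the sample strings are hypothetical names and data, not part of the original answer. It also records which positions were skipped, so those IDs could be retried or inspected instead of being dropped silently.

```python
import json


def parse_valid(raw_texts):
    """Parse each raw response body; return (records, skipped_indexes)."""
    records, skipped = [], []
    for index, text in enumerate(raw_texts):
        try:
            records.append(json.loads(text))
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            skipped.append(index)
    return records, skipped


# Hypothetical bodies: two valid orders, one empty body, one HTML error page.
raw_texts = ['{"id": 1}', '', '<html>error</html>', '{"id": 2}']

records, skipped = parse_valid(raw_texts)
print(records)
print(skipped)
```

The resulting `records` list can then be passed to `json_normalize` exactly as `list(gen())` is above.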
    

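    【Comments】:

    • Thanks for the reply, @Marat. I tried your line and got an error: 'ValueError: No JSON object could be decoded'
    • Is it raised by pandas or by requests?
    • So it is raised by requests, because the API returns invalid JSON. I edited the answer to handle this, assuming it is safe to ignore such responses.
    • Wow, it works! Can you tell me how you knew the problem was in requests?
    • If you look at the stack trace, the first line after your own code is File "/Users/bob/anaconda/lib/python2.7/site-packages/requests/models.py", which means everything below it happens inside requests.

    【Solution 2】:

    Try this: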

    In [28]: lst = list(set(order_detail) - set(['products','coupons','payments','taxes']))
    
    In [29]: pd.io.json.json_normalize(order_detail, ['products'], lst, meta_prefix='p_')
    Out[29]:
       category_id           created_at discount_total     id  item_master_id              name original_unit_price pricing_weight_id  \
    0            1  2017-12-02 19:49:45           0.00  48479          239687  QA_FacewreckHaze              150.00              None
    
       quantity  sold_weight         ...          p_tax_total  p_order_source p_consumer_id p_payment_complete p_coupon_total  \
    0         1           10         ...                    0        in_store        415372               None              0
    
       p_fulfillment_method  p_order_type p_is_submitted  p_balance_due         p_updated_at
    0              in_store         sales              0            150  2017-12-02 20:07:25
    
    [1 rows x 29 columns]
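
For readers on a current pandas, the same call can be sketched with the top-level `pd.json_normalize` and keyword arguments. This assumes `order` is an already parsed dict; the one below is a trimmed, illustrative subset of the order from the question, not the full payload. `record_path` picks the nested list to expand into rows, `meta` lists the top-level fields to copy onto each row, and `meta_prefix` keeps names such as `id` from colliding with the product's own `id`.

```python
import pandas as pd

# Trimmed, illustrative subset of the order shown in the question.
order = {
    "id": 26509,
    "name": "ORD.00001",
    "order_total": 150,
    "products": [
        {"id": 48479, "name": "QA_FacewreckHaze", "quantity": 1, "unit_price": "150.00"},
    ],
    "payments": [],
    "coupons": [],
    "taxes": [],
}

# Every top-level key except the nested lists becomes metadata.
nested = {"products", "payments", "coupons", "taxes"}
meta = sorted(set(order) - nested)

# One row per product, with the order-level fields repeated under a p_ prefix.
products = pd.json_normalize(order, record_path="products", meta=meta, meta_prefix="p_")

print(products.columns.tolist())
```

To cover many orders, the same call accepts a list of such dicts in place of `order`.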
    

    【Comments】:

    • Thanks for the reply. I get the error: TypeError: sequence item 0: expected string, numpy.int64 found