Pagination for nested JSON (answers)

【Question title】: Pagination for nested JSON
【Posted】: 2021-11-08 23:03:19
【Question】:

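I have a paginated API, and I am trying to walk through all the available data and save it into a list. However, the data my API returns is nested; here is an example of what it looks like.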

"data": [{"type": "general-Type", "id": 1, "attributes": {"firstname": "Kevin", "lastname": "Wolf", "emailaddress": "kevinwolf@gmail.com"}}]

So when I save it into the list, the last part of the data (the "attributes") looks like a dictionary, which causes the following error:

    sample_data.extend(sample_data['data'])
AttributeError: 'dict' object has no attribute 'extend'

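I am new to this, so any help on how to complete this request successfully would be appreciated. Thank you in advance.

In case it helps, here is my code. Requests are limited to 10,000 records, which is why I set the limit to 10,000 and step the offset in increments of 10,000.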


sample_data = []
offset = 0
limit = 10000

while True:
    print("----")
    url = f"https://results.us.sampledata.com/api/reporting/v0.1.0/samples?offset={offset}&page[size]={limit}"
    headers = {"Content-Type": "application/json", "Accept-Charset": "UTF-8", "x-apikey-token": "sampletoken"}
    print("Requesting", url)
    response = requests.get(url, data={"sample": "data"}, headers=headers)
    sample_data = response.json()

    if len(sample_data['data']) == 0:
        # If not, exit the loop
        break

    # If we did find records, add them
    # to our list and then move on to the next offset
    sample_data.extend(sample_data['data'])

    offset = offset + 10000

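【Comments on the question】:

sample_data = [] and sample_data = response.json(). Use different names.
That does not work either, because the list still treats the last part of the JSON as a dict.
The dict object is sample_data = response.json().
Try renaming: sample_data_list = [] and sample_data_list.extend(sample_data['data']).
AttributeError: 'dict' object has no attribute 'extend'. This means you are trying to use a list method (extend) on an object that does not have it.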

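Tags: python json api python-requests pagination

【Solution 1】:

As @8349697 already said, your problem is that you use the same name, sample_data, to hold two different structures.

First you create the list sample_data = [], then you overwrite it with the dictionary sample_data = response.json(), and after that you try to use the original list sample_data to add values from the dictionary sample_data. By that point the name refers only to the dictionary, so the call to extend fails.

You should use different names, for example: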

page_data = response.json()

if not page_data['data']: # if len(page_data['data']) == 0:
    break

sample_data.extend(page_data['data'])
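
To see why renaming is enough, the following sketch reproduces the error without any network call. The pages list is made-up stand-in data for two API responses (only the shape is copied from the question), so treat it as an illustration rather than real output.

```python
# Hypothetical stand-in for two API responses; only the shape matches the question.
pages = [
    {"data": [{"type": "general-Type", "id": 1,
               "attributes": {"firstname": "Kevin", "lastname": "Wolf"}}]},
    {"data": []},
]

# Buggy pattern: one name for both the accumulator list and the parsed response.
sample_data = []
sample_data = pages[0]  # the list is gone; sample_data is now a dict
try:
    sample_data.extend(sample_data["data"])
    error = None
except AttributeError as exc:
    error = str(exc)
print(error)

# Fixed pattern: separate names for the accumulator and the current page.
sample_data = []
for page_data in pages:
    if not page_data["data"]:
        break
    sample_data.extend(page_data["data"])

# The nested "attributes" dict is stored inside the list without any problem.
print(sample_data[0]["attributes"]["firstname"])
```

The nested dictionary is not the cause of the error: a list can hold dictionaries of any depth. The error comes only from calling extend on the name after it was rebound to the response dictionary.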

Minimal code with a few other changes (I could not test it against your URL):

import requests

sample_data = []

headers = {
    "Content-Type": "application/json",
    "Accept-Charset": "UTF-8",
    "x-apikey-token": "sampletoken"
}

data = {
    "sample": "data"
}

params = {
    "offset": 0,
    "page[size]": 10000,
}

url = "https://results.us.sampledata.com/api/reporting/v0.1.0/samples"

while True:
    print("----")
    
    #url = f"https://results.us.sampledata.com/api/reporting/v0.1.0/samples?offset={offset}&page[size]={limit}"
    #print("Requesting", url)
    
    print('Offset:', params['offset'])

    response = requests.get(url, params=params, data=data, headers=headers)
    page_data = response.json()

    if 'data' not in page_data or not page_data['data']:
        break

    sample_data.extend(page_data['data'])

    params['offset'] += 10000

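【Comments】:

Thank you! This fixed the error, but the loop keeps running even after the API has no more data. I tested it against an API whose length I know, and it still keeps going once the API is empty. I also tried if len(sample_data['data']) == 0: break, but neither version exits the loop after the API is empty. Any help or suggestions? Thanks again!
I cannot run it, so I cannot see what you really get in response. First, use print() to see what you are getting. Maybe page_data['data'] is not empty, and maybe you need a different way to recognize the end of the data.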
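
One way to act on this advice is to add stop conditions that do not depend on the API ever returning an empty data list. The sketch below is a guess at a defensive loop and has not been run against the real endpoint. fetch_page is a hypothetical callable standing in for requests.get(...).json(), and two of the stop conditions are unverified assumptions about this API: that a page shorter than the limit is the last one, and that receiving the same page twice means the offset parameter is being ignored.

```python
def fetch_all(fetch_page, limit=10000, max_pages=1000):
    """Collect records page by page, with several independent stop conditions.

    fetch_page(offset, limit) is a hypothetical callable that returns the
    parsed JSON dict for one page.
    """
    records = []
    offset = 0
    previous = None

    for _ in range(max_pages):  # hard cap so the loop can never run forever
        page_data = fetch_page(offset, limit)
        rows = page_data.get("data") or []

        if not rows:
            break  # empty or missing "data": nothing more to read
        if rows == previous:
            break  # same page again: the server probably ignores offset

        records.extend(rows)
        previous = rows

        if len(rows) < limit:
            break  # short page: assumed to be the last one
        offset += limit

    return records


# Fake API backed by five records, honouring offset and limit.
ROWS = [{"id": i} for i in range(5)]

def good_api(offset, limit):
    return {"data": ROWS[offset:offset + limit]}

# Fake API that ignores offset and always returns the same record.
def stuck_api(offset, limit):
    return {"data": [{"id": 1, "attributes": {"firstname": "Kevin"}}]}

print(len(fetch_all(good_api, limit=2)))   # 5
print(len(fetch_all(stuck_api, limit=1)))  # 1
```

If the real response turns out to carry pagination metadata of some kind (nothing in the question confirms that it does), that would be a more reliable end signal than any of the heuristics above.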