How to iterate through a JSON response that has multiple pages: answers

【Question title】: How to iterate through a JSON that has multiple pages
【Posted】: 2016-09-22 11:25:15
【Question】:

I wrote a program that iterates through a multi-page JSON object.

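import json
import subprocess

def get_orgs(token, url):
    # With shell=False the command has to be an argument list, not one string
    cmd = ['curl', '-i', '-k', '-X', 'GET',
           '-H', 'Content-Type:application/json',
           '-H', 'Authorization:Bearer ' + token,
           url]
    pipe = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
    data = pipe.communicate()[0].decode('utf-8')
    row = None  # stays None if no line parses as JSON
    for line in data.split('\n'):
        print(line)
        try:
            row = json.loads(line)
            print("next page url ", row['next'])
        except (ValueError, KeyError, TypeError):
            # Header lines printed by `curl -i` are not JSON; skip them
            pass
    return row
my_data = get_orgs(u'MyBeearerToken',"https://data.ratings.com/v1.0/org/576/portfolios/36/companies/")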

The JSON object looks like this:

[{"results": [{"liquidity":"Strong","earningsPerformance":"Average"}]
,"next":"https://data.ratings.com/v1.0/org/576/portfolios/36/companies/?page=2"}]

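I am using the "next" key to iterate, but sometimes it points to an invalid page (one that does not exist). Does a JSON object have a rule about how many records there are per page? If so, I would use that to estimate how many pages there might be.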

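Edit: adding more detail. The JSON has only 2 keys: ['results', 'next']. If there are multiple pages, the 'next' key holds the URL of the next page (as you can see in the output above). Otherwise, it contains 'None'. The problem is that sometimes it points to a next page that does not exist instead of 'None'. So I want to see whether I can count the rows in the JSON and divide by some number to find out how many pages the loop needs to go through.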

【Comments】:

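It is not clear to me what you are trying to achieve. Your problem seems to be that you request some JSON from a server. The JSON contains, for lack of a better word, a URL pointing to the next data set. Are you having trouble extracting the correct URL, or is the URL you extract from the response incorrect? In the latter case, the problem is not in your code. Why are you using curl instead of a built-in Python solution like urllib.request?
Hi Maurice, thanks for your reply. I am sitting behind a corporate proxy, and curl works fine. With urllib2 or requests, I get an authentication error.
@Maurice, I have edited the question to provide more details about the problem.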

Tags: python json

【Solution 1】:

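In my opinion, urllib2 or urllib.request would be a better choice than curl because the code would be easier to understand, but if curl is a constraint, I can work with it ;-)

Assuming the JSON response is all on one line (otherwise your json.loads will throw an exception), the task is quite simple. The following gives you the number of items behind the results key: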

row = [{'next': 'https://data.ratings.com/v1.0/org/576/portfolios/36/companies/?page=2', 'results': [{'earningsPerformance':'Average','liquidity': 'Strong'}, {'earningsPerformance':'Average','liquidity': 'Strong'}]}]
result_count = len(row[0]["results"])
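If the body is not guaranteed to arrive on a single line, one option is to separate the headers that `curl -i` prints from the body and decode the body in one go. The following is only a sketch: `parse_curl_output` and `count_results` are hypothetical helper names, and the code assumes the output consists of one or more header blocks, each ended by a blank line, followed by a JSON body.

```python
import json

def parse_curl_output(raw):
    """Return the decoded JSON body from `curl -i` output (headers + body)."""
    text = raw.decode('utf-8') if isinstance(raw, bytes) else raw
    text = text.replace('\r\n', '\n')
    # `curl -i` may print several header blocks (for example a proxy's
    # CONNECT reply followed by the real response); drop each block up to
    # the blank line that ends it.
    while text.startswith('HTTP/'):
        head, sep, rest = text.partition('\n\n')
        if not sep:
            raise ValueError('no blank line after the headers')
        text = rest
    return json.loads(text)

def count_results(page):
    # The sample response wraps the page object in a one-element list
    if isinstance(page, list):
        page = page[0]
    return len(page['results'])
```

Note that this count only describes the current page. The response shown in the question has no field for a total record count, so the number of pages cannot be derived from it.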

An alternative solution using httplib2 should look something like this (I have not tested it):

import httplib2
import json
h = httplib2.Http('.cache')
url = "https://data.ratings.com/v1.0/org/576/portfolios/36/companies/"
token = "Your_token"
try:
    response, content = h.request(
        url,
        # The header name is 'Authorization'; 'Bearer <token>' is its value
        headers = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token}
    )
    # Convert the response to a string
    content = content.decode('utf-8') # You could get the charset from the header as well
    try:
        data = json.loads(content)  # avoid shadowing the built-in name `object`
        result_count = len(data[0]["results"])
        # Yay, we got the result count!
    except Exception:
        # Do something if the server responds with garbage
        pass
except httplib2.HttpLib2Error:
    # Handle the exceptions, here's a list: https://httplib2.readthedocs.io/en/latest/libhttplib2.html#httplib2.HttpLib2Error
    pass

For more about httplib2 and why it is so great, I recommend reading Dive Into Python.
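Since the per-page count cannot tell you how many pages exist, another approach is to keep following 'next' and stop as soon as a page turns out to be missing or empty. The sketch below assumes a hypothetical `fetch_page` callable (not part of any library) that takes a URL and returns the decoded page, or None when the server answers with an error or with something that is not JSON. `iter_results` and `max_pages` are likewise illustrative names.

```python
def iter_results(first_url, fetch_page, max_pages=1000):
    """Yield every record, following 'next' until it runs out or goes stale."""
    url = first_url
    seen = set()  # URLs already requested, to guard against loops
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        page = fetch_page(url)
        if isinstance(page, list) and page:
            page = page[0]  # the sample response wraps the page in a list
        if not isinstance(page, dict) or not page.get('results'):
            break  # invalid or empty page: 'next' pointed past the last page
        for record in page['results']:
            yield record
        url = page.get('next')
```

A call such as `records = list(iter_results(url, fetch_page))` would then collect everything without knowing the page count in advance. `fetch_page` could wrap either the curl call from the question or the httplib2 request.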

【Comments】: