【Question title】:python ijson large file loop to get names
【Posted】:2016-08-16 20:20:43
【Question】:

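I need to parse a large JSON file with ijson (unless there is a better way), and I want to loop over all the product names in the request and print them. I tried to set this up using this support page: https://pypi.python.org/pypi/ijson/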

This is the output I currently get:

<addinfourl at 140643118020800 whose fp = <socket._fileobject object at 0x7fea07882850>>
<generator object items at 0x7fea077dc910>
<generator object <genexpr> at 0x7fea077dc960>

My code:

import json
import requests
import lxml 
import ijson
import urllib
from urllib import urlopen


request = urlopen('www.jsonurl.com')
objects = ijson.items(request, 'items.name')
products = (o for o in objects if o ['type' == 'name'])
for product in products:
    print product

print request
print objects
print products

Here is a piece of the JSON data:

{"query":"*","sort":"relevance","responseGroup":"base","totalResults":5158058,"start":1,"numItems":10,"items":[{"itemId":7933617,"parentItemId":7933617,"name":"Nordic Ware Heavyweight Scone / Cornbread Pan","msrp":26.97,"salePrice":20.42,"upc":"011172016409","categoryPath":"Home/Kitchen & Dining/Cookware, Bakeware & Tools/Specialty Cookware","shortDescription":"&lt;p&gt;This Nordic Ware Scone Pan is made of a heavyweight cast aluminum. It can be used as a heavyweight scone or cornbread pan, and it is designed to cook your meal evenly and thoroughly. It features a non-stick interior coating for easy release and clean up.&lt;/p&gt;","longDescription":"&lt;b&gt;Nordic Ware Heavyweight Scone/Cornbread Pan:&lt;/b&gt;&lt;ul&gt;&lt;li&gt;Heavyweight cast aluminum&lt;/li&gt;&lt;li&gt;Ideal for scones and cornbread&lt;/li&gt;&lt;li&gt;Eight wedges&lt;/li&gt;&lt;li&gt;Cooks evenly and thoroughly&lt;/li&gt;&lt;li&gt;Non-stick interior coating for easy release and clean-up&lt;/li&gt;&lt;/ul&gt;","thumbnailImage":"http://i5.walmartimages.com/dfw/dce07b8c-c739/k2-_6fb32a28-c090-4377-81d5-e83273124841.v1.jpg","mediumImage":"http://i5.walmartimages.com/dfw/dce07b8c-ddb3/k2-_6f7df9fa-cb2d-4faf-afbc-8fa4185add59.v1.jpg","largeImage":"http://i5.walmartimages.com/dfw/dce07b8c-5bd3/k2-_6635f62a-5e0b-4c4e-a93d-ee85643f7397.v1.jpg","productTrackingUrl":"http://linksynergy.walmart.com/fs-bin/click?id=|LSNID|&offerid=223073.7200&type=14&catid=8&subid=0&hid=7200&tmpid=1082&RD_PARM1=http%253A%252F%252Fwww.walmart.com%252Fip%252FNordicWare-Heavyweight-Scone-Cornbread-Pan%252F7933617%253Faffp1%253DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%2526affilsrc%253Dapi","standardShipRate":4.97,"marketplace":false,"modelNumber":"1640","productUrl":"http://c.affil.walmart.com/t/api02?l=http%3A%2F%2Fwww.walmart.com%2Fip%2FNordicWare-Heavyweight-Scone-Cornbread-Pan%2F7933617%3Faffp1%3DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%26affilsrc%3Dapi%26veh%3Daff%26wmlspartner%3Dreadonlyapi","customerRating":"4.7
","numReviews":20,"customerRatingImage":"http://i2.walmartimages.com/i/CustRating/4_7.gif","categoryNode":"4044_623679_133020","bundle":false,"stock":"Available","addToCartUrl":"http://c.affil.walmart.com/t/api02?l=http%3A%2F%2Faffil.walmart.com%2Fcart%2FaddToCart%3Fitems%3D7933617%7C1%26affp1%3DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%26affilsrc%3Dapi%26veh%3Daff%26wmlspartner%3Dreadonlyapi","affiliateAddToCartUrl":"http://linksynergy.walmart.com/fs-bin/click?id=|LSNID|&offerid=223073.7200&type=14&catid=8&subid=0&hid=7200&tmpid=1082&RD_PARM1=http%253A%252F%252Faffil.walmart.com%252Fcart%252FaddToCart%253Fitems%253D7933617%257C1%2526affp1%253DpjiPu5Y7cvNmz4xZOAs5j7QlW2mZPVmc1DR3BvmrkB4%2526affilsrc%253Dapi","giftOptions":

【Question comments】:

    Tags: python json


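    【Answer 1】:

    What you see in your output is:

    print request: an open connection to a URL. This looks correct and is not surprising.

    print objects: as the output shows, this is a generator, whereas you probably expected a list of values. Because objects really is a generator (which is what you get by using ijson), you have to consume its values. The usual way to do that is list(objects).

    print products: also a generator, this time produced by a generator expression. Because you wrapped the expression in (), you asked for a generator. Had you written [o for o in objects if o ['type' == 'name']], you would have got a list directly. The solution is the same as for objects: consume the values, e.g. list(products).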
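To illustrate the difference between the two bracket styles described above, here is a minimal sketch in Python 3 (the question itself uses Python 2). The product names are made-up stand-ins, not values taken from the real response.

```python
# Hypothetical product names standing in for the values ijson would yield.
names = ["Scone Pan", "Muffin Tin", "Loaf Pan"]

# Parentheses build a generator expression: nothing is computed yet.
as_generator = (n for n in names if "Pan" in n)

# Square brackets build a list comprehension: the list exists immediately.
as_list = [n for n in names if "Pan" in n]

# Printing the generator shows only its repr, not the values.
generator_repr = repr(as_generator)
print(generator_repr)

# list() pulls the values out of the generator.
materialized = list(as_generator)
print(materialized)
print(as_list)
```

Separately, note that in the question's filter `o ['type' == 'name']` the comparison of the two string literals is evaluated first and gives False, so the expression amounts to `o[False]`, which is probably not what was intended.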

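    Note that once you have consumed a value (or all values) from a generator, those values are gone: the generator keeps private internal state that changes with every call.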
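The one-shot behaviour can be seen in a small sketch like the following (Python 3; `product_names` is a hypothetical generator function used only for illustration):

```python
def product_names():
    """Hypothetical generator yielding two made-up product names."""
    yield "Scone Pan"
    yield "Muffin Tin"

gen = product_names()

# The first pass consumes every value the generator has to offer.
first_pass = list(gen)

# The generator is now exhausted, so a second pass yields nothing.
second_pass = list(gen)

print(first_pass)
print(second_pass)
```

If the values are needed more than once, one option is to keep the list from the first pass rather than the generator itself.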

    For more information, see the SO question Convert generator object to list for debugging

    【Discussion】:

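    • I don't fully understand what I need to do. With the example code, if I do products = list(object) nothing is printed, but if I print request the whole request is printed.
    • @turtle If you have a generator, it does not hold any values; it is only ready to provide them. By calling list(generator), you ask the generator to actually retrieve the values. Once you have the values, you can print them. If you print a generator, you only see that it is some kind of function, not the values.
    • Thanks, that fixed it, but now I get this error: File "/home/python/Desktop/ for event, value in basic_events: File "/usr/local/lib/python2.7/dist-packages/ijson/backends/python.py", line 185, in basic_parse for value in parse_value(lexer): File "/usr/local/lib/python2.7/dist-packages/ijson/backends/python.py", line 108, in parse_value pos, symbol = next(lexer) File "/usr/local/lib/python2.7/dist-packages/ijson/backends/python.py", line 25, in Lexer if type(f.read(0)) == bytetype: AttributeError: 'Response' object has no attribute 'read'
    • @turtle02 I suggest you download the file first (e.g. to a temporary file) and then read from that file with ijson. ijson.items expects a file-like object (one with a .read() method), which is not always easy to obtain for a file read over HTTP. You may also be missing the "http://" prefix in the url. If this does not solve the problem, it would be better to open a new question focused on opening a url to get a working file-like object. Starting from a local file would be better.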