【Question title】: Parsing JSON data with BeautifulSoup produces errors
【Posted】: 2019-10-24 10:09:12
【Question】:

When I run the following code, it produces the errors shown below:

import requests
import json
from bs4 import BeautifulSoup

JSONDATA = requests.request("GET", "https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1")
JSONDATA = JSONDATA.json()

for line in JSONDATA['posts']:
    soup = BeautifulSoup(line['episodeNumber'])
    soup = BeautifulSoup(line['title'])
    soup = BeautifulSoup(line['audioSource'])
    soup = BeautifulSoup(line['large'])
    soup = BeautifulSoup(line['long'])
    print soup.prettify()

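It produces the following errors (I have already tried the various LXML variants the message suggests):

  • An LXML problem
  • A complaint about the .mp3 link, which should not be a problem because the link is correct?
  • A problem finding the "large" thumbnail. The equivalent lookups for title, audioSource, etc. do not produce the same error, yet the field appears to be present in the site data?

Error output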

python ./test2.py
./test2.py:14: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 14 of the file ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup("features=lxml")(line['episodeNumber'])
./test2.py:16: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 16 of the file     ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(line['title'])
./test2.py:18: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

The code that caused this warning is on line 18 of the file ./test2.py. To get rid of this warning, pass the additional argument 'features="lxml"' to the BeautifulSoup constructor.

  soup = BeautifulSoup(line['audioSource'])

/home/leo/.local/lib/python2.7/site-packages/bs4/__init__.py:335: UserWarning: "https://dts.podtrac.com/redirect.mp3/dovetail.prxu.org/criminal/85cd4e4d-fa8b-4df2-8a8c-78ad0e800574/Episode_116_190504_audition_mix_neg18_part_1.mp3" looks like a URL. Beautiful Soup is not an HTTP client. You should probably use an HTTP client like requests to get the document behind the URL, and feed that document to Beautiful Soup.
  ' that document to Beautiful Soup.' % decoded_markup
Traceback (most recent call last):
  File "./test2.py", line 20, in <module>
    soup = BeautifulSoup(line['large'])
KeyError: 'large'

【Comments】:

    Tags: python json python-requests


    【Solution 1】:

    If you are only trying to get the data out of the JSON, this will work.

    import pandas as pd
    
    import requests
    import json
    from bs4 import BeautifulSoup
    
    JSONDATA = requests.request("GET", "https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1")
    JSONDATA = JSONDATA.json()
    
    # Flatten the list of posts into a DataFrame
    # (pandas 1.0 and later also expose this function as pd.json_normalize)
    df = pd.io.json.json_normalize(JSONDATA['posts'])
    df.to_csv('posts.csv')
    

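    The lxml warning goes away if you name the parser explicitly: BeautifulSoup(line['episodeNumber'], 'lxml'). BeautifulSoup needs an HTML parser to build a soup object, and when you do not name one it picks the best one installed and warns you about it. If you do not have lxml, run: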

    pip install lxml
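
As a minimal sketch of the parser fix, the helper below (`extract_text` is a hypothetical name, not part of the original code) names the parser on every call. It assumes Python 3 and uses "html.parser", which ships with Python; pass "lxml" instead once lxml is installed.

```python
from bs4 import BeautifulSoup


def extract_text(value, parser="html.parser"):
    """Strip any HTML markup from a JSON field and return its plain text."""
    # Naming the parser explicitly avoids the "No parser was explicitly
    # specified" warning. str() guards against numeric fields.
    soup = BeautifulSoup(str(value), parser)
    return soup.get_text()


print(extract_text("<p>Episode <b>116</b></p>"))
```

In the question's loop this would be called as extract_text(line['title']), one field at a time, instead of overwriting the same soup variable five times.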
    

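    The second warning appears because you are passing a URL to the soup constructor. That does nothing useful: as the warning says, Beautiful Soup is not an HTTP client and will not download what the link points to.

    Finally, your last error occurs because the JSON entry has no key named 'large'.

    You need a try/except block.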
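
A rough sketch of such a try/except block follows. The sample dictionary is hypothetical placeholder data, not real API output; it only mirrors the shape described in this thread, where a later comment notes that 'large' sits inside the 'image' dict. The helper name `large_image` is also an assumption made for illustration.

```python
# Hypothetical sample shaped like one entry of JSONDATA['posts'];
# the values are placeholders, not real API output.
sample_post = {
    "episodeNumber": "116",
    "title": "Example title",
    "audioSource": "https://example.com/episode.mp3",
    "image": {"large": "https://example.com/large.jpg"},
}


def large_image(post):
    """Return post['image']['large'], or None when it is not available."""
    try:
        return post["image"]["large"]
    except (KeyError, TypeError):
        # KeyError: one of the keys is missing.
        # TypeError: 'image' is present but is not a dict (for example None).
        return None


print(large_image(sample_post))
print(large_image({"title": "no image here"}))
```

Returning None instead of raising lets the loop carry on past entries that have no image.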

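    【Discussion】:

    • Thank you very much. That seems to have solved the LXML problem. I think the .mp3 link is fine, since I do not want Python to play or execute the media file, only to pass it on to the Kodi media center. As for the images, looking at an online JSON viewer, the image and description (i.e. large and long) are indented a few characters further than "title". How would I reference them in code?
    • large and long are nested inside the 'image' dict, so you would use line['image']['large'].
    • Just another thought on your suggestion to export to .CSV. I get the following error: File "pandas/_libs/writers.pyx", line 55, in pandas._libs.writers.write_csv_rows UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 30: ordinal not in range(128). Would this character cause problems when I parse the same web page this way, with the same dependencies as my other projects?
    • Try df.to_csv('posts.csv', encoding='utf-8'). Are you using Python 2? If the answer solved your problem, please mark it as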
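
To illustrate the encoding point in the last comment, here is a small standard-library sketch (the sample string is made up) showing why the 'ascii' codec rejects u'\xa0', a non-breaking space, while 'utf-8' accepts it. It assumes Python 3 for the print calls.

```python
text = u"Episode\xa0116"  # \xa0 is a non-breaking space

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Same failure as in the pandas traceback above.
    print("ascii failed:", exc.reason)

# UTF-8 can represent the character, as a two-byte sequence.
encoded = text.encode("utf-8")
print(encoded)
```

Passing encoding='utf-8' to to_csv asks pandas to write the file with this codec rather than ASCII.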