Crawling web data using Python: HTML error (answers)

【Question title】: crawling web data using python html error
【Posted】: 2016-11-02 09:02:06
【Question description】:

I want to crawl data using Python. I have tried repeatedly, but it does not work, and I cannot find the error in my code. This is what I wrote:

import requests
from bs4 import BeautifulSoup

url = 'http://news.naver.com/main/ranking/read.nhn?mid=etc&sid1=111&rankingType=popular_week&oid=277&aid=0003773756&date=20160622&type=1&rankingSectionId=102&rankingSeq=1'
# Download the page; this returns only the HTML the server sends
html = requests.get(url)
a = html.text
bs = BeautifulSoup(a, 'html.parser')
print(bs)
# Look for the first <span class="u_cbox_contents"> (the reply text)
print(bs.find('span', attrs={"class": "u_cbox_contents"}))

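I want to crawl the reply (comment) data from the news article.

As you can see, I tried to extract this element:

span, class="u_cbox_contents" in bs

But Python only prints "None":

None

So I checked the contents of the bs variable with print(bs).

But it contains no span with class="u_cbox_contents".

Why does this happen?

I really don't know why.

Please help me.

Thanks for reading.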

【Question discussion】:

Tags: python web beautifulsoup web-crawler

【Solution 1】:

requests fetches the content of the URL, but it does not execute any JavaScript.

I ran the same fetch with cURL and could not find any occurrence of u_cbox_contents in the HTML. Most likely it is injected by JavaScript, which explains why BeautifulSoup cannot find it.
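
The same check can be done from Python instead of cURL. The following is a minimal sketch, not part of the original answer: the helper name class_in_static_html is made up for illustration, and it uses only the standard library's html.parser so that it does not depend on how BeautifulSoup is configured. It reports whether any tag in the raw HTML carries a given class.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears on a tag in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def class_in_static_html(html_text, class_name):
    """Return True if some tag in html_text has class_name in its class list.

    Hypothetical helper for illustration. Text inside <script> blocks is
    not counted, because it is not markup yet.
    """
    collector = ClassCollector()
    collector.feed(html_text)
    return class_name in collector.classes
```

With the variables from the question, class_in_static_html(html.text, "u_cbox_contents") would be expected to return False if the span really is added later by JavaScript, which would match the cURL result above.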

If you need the page's markup as it would be rendered in a "normal" browser, you can try Selenium. See also this SO question.
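
A rough sketch of the Selenium route follows. It rests on several assumptions that the answer does not confirm: Selenium 4 with Chrome and a matching driver is installed, the comments end up in the main document (not in an iframe), and a span with class u_cbox_contents appears within the timeout. The function names fetch_rendered_html and extract_span_texts are invented for this example.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in a real browser and return the DOM after JavaScript has run."""
    # Imported lazily so the parsing helper below works without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        # Wait until the JavaScript-injected element exists (raises on timeout)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


class SpanTextCollector(HTMLParser):
    """Collect the text of every <span> that carries a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.texts = []
        self._depth = 0  # greater than 0 while inside a matching span
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1  # nested span inside a matching one
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.texts.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_span_texts(html_text, class_name):
    """Return the text of each span with class_name, in document order."""
    collector = SpanTextCollector(class_name)
    collector.feed(html_text)
    return collector.texts
```

Usage would look like extract_span_texts(fetch_rendered_html(url, "span.u_cbox_contents"), "u_cbox_contents"). The string returned by fetch_rendered_html can also be passed to BeautifulSoup exactly as in the question, so the original bs.find call could be kept if preferred.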

【Discussion】: