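【Posted】: 2020-07-22 19:20:39
【Question】:
I'm trying to scrape a specific piece of information from a webpage, but without success. The text I want is present in the page source, yet I still can't fetch it. This is the site address. The part I'm after, visible in the image, is Not Rated.
Relevant HTML: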
<div class="subtext">
Not Rated
<span class="ghost">|</span> <time datetime="PT188M">
3h 8min
</time>
<span class="ghost">|</span>
<a href="/search/title?genres=drama&explore=title_type,genres&ref_=tt_ov_inf">Drama</a>,
<a href="/search/title?genres=musical&explore=title_type,genres&ref_=tt_ov_inf">Musical</a>,
<a href="/search/title?genres=romance&explore=title_type,genres&ref_=tt_ov_inf">Romance</a>
<span class="ghost">|</span>
<a href="/title/tt0150992/releaseinfo?ref_=tt_ov_inf" title="See more release dates">18 June 1999 (India)
</a> </div>
I've tried:
import requests
from bs4 import BeautifulSoup

link = "https://www.imdb.com/title/tt0150992/?ref_=ttfc_fc_tt"

with requests.Session() as s:
    s.headers['User-Agent'] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
    r = s.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    rating = soup.select_one(".titleBar .subtext").next_element
    print(rating)
The script above gives me no result.
Expected output:
Not Rated
How can I get the rating from that webpage?
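For reference, here is a minimal sketch of pulling the first text entry out of the markup quoted above. It runs against a trimmed copy of that snippet only, not the live page, so it assumes the server actually returns the same structure to the script. The names `SAMPLE_HTML` and `first_subtext_entry` are hypothetical and introduced just for this illustration.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Trimmed copy of the markup quoted in the question (assumed to match
# what the server returns; the live page may differ).
SAMPLE_HTML = """
<div class="subtext">
Not Rated
<span class="ghost">|</span> <time datetime="PT188M">
3h 8min
</time>
</div>
"""


def first_subtext_entry(html: str) -> Optional[str]:
    """Return the first non-blank text inside div.subtext, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    subtext = soup.find("div", class_="subtext")
    if subtext is None:
        return None
    # stripped_strings skips whitespace-only nodes and strips the others.
    return next(subtext.stripped_strings, None)


print(first_subtext_entry(SAMPLE_HTML))
```

Returning None when the div is missing makes it easier to tell a selector mismatch apart from an empty or whitespace-only text node.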
【Discussion】:
Tags: python python-3.x web-scraping beautifulsoup python-requests