[Posted]: 2020-07-05 20:12:15
[Question]:
Hello, I am trying to scrape data from IMDb with Python and Beautiful Soup. Following the online documentation, I was able to retrieve all the data with this code:
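from requests import get
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/title/tt1405406/episodes?season=1&ref_=tt_eps_sn_1'
response = get(url)
# Parse the response body with Python's built-in HTML parser
html_soup = BeautifulSoup(response.text, 'html.parser')
type(html_soup)  # only displays a result in an interactive session
# Collect every <div class="image"> element on the page
movie_containers = html_soup.find_all('div', class_='image')
print(movie_containers)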
With the code above, I get a list of everything inside the divs whose class is "image", as shown below:
<div class="image">
<a href="/title/tt1486497/" itemprop="url" title="Pilot"> <div class="hover-over-image zero-z-index" data-const="tt1486497">
<img alt="Pilot" class="zero-z-index" height="126" src="https://m.media-amazon.com/images/M/MV5BNTExMDIwNTUyNF5BMl5BanBnXkFtZTcwNzU5MDg1Mg@@._V1_UX224_CR0,0,224,126_AL_.jpg" width="224"/>
<div>S1, Ep1</div>
</div>
</a> </div>
<div class="image">
<a href="/title/tt1485650/" itemprop="url" title="The Night of the Comet"> <div class="hover-over-image zero-z-index" data-const="tt1485650">
<img alt="The Night of the Comet" class="zero-z-index" height="126" src="https://m.media-amazon.com/images/M/MV5BMjIyNDczNDYzNV5BMl5BanBnXkFtZTcwNDk1MDQ4Mg@@._V1_UX224_CR0,0,224,126_AL_.jpg" width="224"/>
<div>S1, Ep2</div>
</div>
</a> </div>
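However, what I actually want from this result is the value of the data-const attribute. I only want to display the data-const values, not the entire HTML.
Expected result: tt1486497, tt1485650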
[Discussion]:
Tags: python beautifulsoup pycharm web-crawler
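One possible approach, sketched under the assumption that the page still serves the markup quoted in the question: select the inner divs that carry a data-const attribute and read that attribute from each tag. The sample HTML below is trimmed from the output shown in the question so the snippet runs without a network request; `SAMPLE_HTML` and `extract_data_const` are names made up for this illustration.

```python
from bs4 import BeautifulSoup

# Markup trimmed from the output quoted in the question (img tags omitted).
SAMPLE_HTML = """
<div class="image">
<a href="/title/tt1486497/" itemprop="url" title="Pilot"> <div class="hover-over-image zero-z-index" data-const="tt1486497">
<div>S1, Ep1</div>
</div>
</a> </div>
<div class="image">
<a href="/title/tt1485650/" itemprop="url" title="The Night of the Comet"> <div class="hover-over-image zero-z-index" data-const="tt1485650">
<div>S1, Ep2</div>
</div>
</a> </div>
"""


def extract_data_const(html):
    """Return the data-const value of each div nested inside a div.image container."""
    soup = BeautifulSoup(html, 'html.parser')
    # The attribute selector [data-const] skips divs that lack the attribute,
    # so indexing the tag with ['data-const'] cannot raise KeyError here.
    return [div['data-const'] for div in soup.select('div.image div[data-const]')]


episode_ids = extract_data_const(SAMPLE_HTML)
print(', '.join(episode_ids))  # prints: tt1486497, tt1485650
```

With the live page, passing response.text to the same function should behave the same way as long as the structure has not changed. An equivalent form without CSS selectors is soup.find_all('div', attrs={'data-const': True}), although that matches any div with the attribute, not only those inside a div with class "image".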