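【Posted】: 2021-04-04 12:17:06
【Question】:
I have this code, which scrapes the following data from IMDb: the top 250 movies, with title, year, and rating. I'm trying to figure out how to extract only the movies that Brad Pitt is in. I've searched through a lot of similar questions, but none of them really helped. Thanks for any contribution!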
import re
import requests
from bs4 import BeautifulSoup

# Fetch and parse the Top 250 chart page
url = 'http://www.imdb.com/chart/top'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect the per-movie cells and attributes
movies = soup.select('td.titleColumn')
links = [a.attrs.get('href') for a in soup.select('td.titleColumn a')]
crew = [a.attrs.get('title') for a in soup.select('td.titleColumn a')]
ratings = [b.attrs.get('data-value') for b in soup.select('td.posterColumn span[name=ir]')]
votes = [b.attrs.get('data-value') for b in soup.select('td.ratingColumn strong')]

# Build one dict per movie
imdb = []
for index in range(0, len(movies)):
    movie_string = movies[index].get_text()
    movie = (' '.join(movie_string.split()).replace('.', ''))
    movie_title = movie[len(str(index)) + 1:-7]
    # Raw string so the backslashes reach the regex engine unchanged
    year = re.search(r'\((.*?)\)', movie_string).group(1)
    place = movie[:len(str(index)) - (len(movie))]
    data = {"movie_title": movie_title,
            "year": year,
            "place": place,
            "star_cast": crew[index],
            "rating": ratings[index],
            "vote": votes[index],
            "link": links[index]}
    imdb.append(data)

# Print the results
for item in imdb:
    print(item['place'], '-', item['movie_title'], '(' + item['year'] + ') -', 'Starring:', item['star_cast'])
【Discussion】:
-
You should be able to solve this with a simple if statement... what have you tried?
-
Also, you could change the URI to see only Pitt's movies: imdb.com/name/nm0000093/videogallery?ref_=nm_phs_vi
-
Look for code examples, try debugging the code you have, check the result of each step, and compare it with what you expect to see.
-
@PatrickArtner I'm a beginner and I don't know how to do that. Thanks for helping me!!
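For readers wondering what the "simple if statement" from the first comment might look like, here is a minimal sketch. It assumes each entry is a dict with a "star_cast" string, as built in the question's loop. The helper name movies_with_actor and the sample entries are made up for illustration (the "place" values are placeholders, not real rankings). Note that "star_cast" holds only whatever names the link's title attribute lists, so an actor not named there would be missed.

```python
def movies_with_actor(movies, actor):
    """Return the entries whose star_cast text mentions the actor.

    The comparison is case-insensitive; entries with a missing or
    None star_cast are skipped.
    """
    needle = actor.casefold()
    return [m for m in movies
            if needle in (m.get("star_cast") or "").casefold()]


# Hypothetical sample entries shaped like the dicts in the question.
sample = [
    {"place": "1", "movie_title": "The Shawshank Redemption", "year": "1994",
     "star_cast": "Frank Darabont (dir.), Tim Robbins, Morgan Freeman"},
    {"place": "2", "movie_title": "Fight Club", "year": "1999",
     "star_cast": "David Fincher (dir.), Brad Pitt, Edward Norton"},
    {"place": "3", "movie_title": "Se7en", "year": "1995",
     "star_cast": "David Fincher (dir.), Morgan Freeman, Brad Pitt"},
]

pitt_movies = movies_with_actor(sample, "Brad Pitt")
for item in pitt_movies:
    print(item["place"], "-", item["movie_title"], "(" + item["year"] + ")")
```

With the list from the question, the call would be movies_with_actor(imdb, "Brad Pitt"), and the existing print loop could then iterate over the filtered result instead of imdb.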
Tags: python html web-scraping beautifulsoup request