【Posted】:2020-07-26 18:57:07
【Question】:
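I've written a script to parse the movie name, all cast, Produced by, and Casting By fields from this webpage. I can parse all of those fields from the page. What I can't do, once all four items are involved, is arrange and print them in a customized way. With only the movie name and cast included, the script I've written so far prints the items exactly the way I want. I'd like to also include Produced by and Casting By, as you can see in this image.
What I've tried so far: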
import requests
from bs4 import BeautifulSoup

link = 'https://www.imdb.com/title/tt0068646/fullcredits?ref_=tt_cl_sm#cast'

with requests.Session() as s:
    # Send a browser-like User-Agent with every request in this session
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    movie_name = soup.select_one("h3[itemprop='name'] > a").get_text(strip=True)
    # Skip rows that hold nothing but a .castlist_label cell (section labels)
    for item in soup.select("h4#cast + table.cast_list tr:has(:not(.castlist_label))"):
        performer = item.select_one("td:not(.primary_photo) > a[href^='/name/']").get_text(strip=True)
        character = ' '.join(item.select_one("td.character").text.split())
        print(movie_name, performer, character)
Output I'm getting (movie name and cast):
The Godfather Marlon Brando Don Vito Corleone
The Godfather Al Pacino Michael Corleone
The Godfather James Caan Sonny Corleone
The Godfather Richard S. Castellano Clemenza (as Richard Castellano)
The Godfather Robert Duvall Tom Hagen
The Godfather Sterling Hayden Capt. McCluskey
The Godfather John Marley Jack Woltz
and so on ...
I'd like to append the following results to the bottom of the output above (taken from the two fields you can see in the image, Produced by and Casting By):
The Godfather Gray Frederickson associate producer
The Godfather Al Ruddy producer (as Albert S. Ruddy) (produced by)
The Godfather Robert Evans studio executive (uncredited)
The Godfather Louis DiGiaimo (casting)
The Godfather Andrea Eastman (casting)
The Godfather Fred Roos (casting)
How can I make the script print the fields the way I've shown above?
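One possible direction, as a rough sketch rather than a confirmed solution: treat each crew section the same way as the cast table, selecting the table that directly follows the section heading and emitting one `(movie, name, credit)` row per entry. The heading ids `producer` and `casting_director`, the `td.name` and `td.credit` cell classes, the inline `SAMPLE_HTML`, the placeholder `/name/...` hrefs, and the helper name `crew_rows` are all assumptions made for illustration; they would need to be checked against the real page source. The sample is parsed with `html.parser` so that it runs without a network request or lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup that imitates the assumed page layout.
SAMPLE_HTML = """
<h3 itemprop="name"><a href="/title/tt0068646/">The Godfather</a></h3>
<h4 id="producer">Produced by</h4>
<table>
  <tr><td class="name"><a href="/name/nm0000001/">Gray Frederickson</a></td>
      <td>...</td><td class="credit">associate producer</td></tr>
  <tr><td class="name"><a href="/name/nm0000002/">Al Ruddy</a></td>
      <td>...</td><td class="credit">producer (as Albert S. Ruddy)
      (produced by)</td></tr>
</table>
<h4 id="casting_director">Casting By</h4>
<table>
  <tr><td class="name"><a href="/name/nm0000003/">Louis DiGiaimo</a></td>
      <td>...</td><td class="credit">(casting)</td></tr>
</table>
"""


def crew_rows(soup, section_ids=("producer", "casting_director")):
    """Return (movie, name, credit) tuples for each assumed crew section."""
    movie_name = soup.select_one("h3[itemprop='name'] > a").get_text(strip=True)
    rows = []
    for section_id in section_ids:
        # The table immediately following the section heading holds the entries.
        for tr in soup.select(f"h4#{section_id} + table tr"):
            name = tr.select_one("td.name")
            if name is None:
                continue  # not a credit row
            credit = tr.select_one("td.credit")
            # Collapse internal whitespace and newlines in the credit text.
            credit_text = " ".join(credit.get_text().split()) if credit else ""
            rows.append((movie_name, name.get_text(strip=True), credit_text))
    return rows


rows = crew_rows(BeautifulSoup(SAMPLE_HTML, "html.parser"))
for row in rows:
    print(*row)
```

Against the live page, the same helper could be called on the `soup` built from `r.text` right after the cast loop, so the crew rows print beneath the cast rows.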
【Comments】:
Tags: python python-3.x web-scraping beautifulsoup