Is there a way to retrieve an episode count & dates from an actor that played multiple roles?

【Question title】: Is there a way to retrieve an episode count & dates from an actor that played multiple roles?
【Posted】: 2021-05-06 18:59:16
【Question】:

To be clearer: I want to retrieve the number of episodes an actor appeared in on IMDb, along with the dates.

I'm using the Doctor Who page as an example

In this case, I would like to find out that Matt Smith appeared in 46 episodes from 2010 to 2020.

IMDbPY exposes exactly this on the cast entries, through currentRole and the notes attribute:

from imdb import IMDb

ia = IMDb()
movie = ia.get_movie('0436992') # id for Doctor Who
cast = movie['cast']
print("Actor name :", cast[0]['name'])
print("Role :", cast[0].currentRole)
print("Notes :", cast[0].notes)

This prints:

Actor name : Matt Smith
Role : The Doctor
Notes : (58 episodes, 2010-2020)

(Oddly, the episode count is wrong: the website says 46 episodes, and clicking through shows 54, but that is not my point.)

However, other actors played multiple roles in this series, and currentRole then returns a list. I changed my code to handle that case:


from imdb import IMDb

ia = IMDb()
movie = ia.get_movie('0436992')
cast = movie['cast']

for i in range(2):

    print("Actor name :", cast[i]['name'])

    if isinstance(cast[i].currentRole, list):
        print("Roles :")
        for role in cast[i].currentRole:
            print(" - ", role, " (Note :" + role.notes + ")")

    else:
        print("Role :", cast[i].currentRole)
    print("Notes :", cast[i].notes)
    print("")

But the result is:

Actor name : Matt Smith
Role : The Doctor
Notes : (58 episodes, 2010-2020)

Actor name : David Tennant
Roles :
 -  The Doctor  (Note :)
 -  ...  (Note :)
Notes :

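I can't retrieve the information I want here, and all the notes are empty. While debugging I dug through the Person and Character objects from imdbpy, but I couldn't find what I need.

This only seems to happen for actors who played multiple roles. Is there a way to retrieve this with imdbpy rather than with an external parser?

Thanks for any ideas.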

【Comments】:

Tags: python imdb imdbpy

【Solution 1】:

I ran into the same problem. Sadly, I couldn't solve it with IMDbPY either; I think it is a bug. Instead, I wrote my own parser with bs4:

import requests
from bs4 import BeautifulSoup

# fetch the full credits page and parse it with bs4
page = requests.get('https://www.imdb.com/title/tt0436992/fullcredits')
page.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(page.text, 'lxml')

# find the cast table
table = soup.find('table', {"class": "cast_list"})

cast = []

# iterate over the rows; each row describes one cast member
for row in table.find_all('tr'):
    cast_member = {}
    for column_marker, column in enumerate(row.find_all('td')):
        # name column
        if column_marker == 1:
            cast_member['name'] = column.get_text().strip()
        # combined role and episodes/years column
        elif column_marker == 3:
            # the role link is the one without a class attribute
            role_element = column.find('a', {'class': None})
            if role_element:
                cast_member['role'] = role_element.get_text().strip()
            # the "N episodes, YYYY-YYYY" text sits in the toggle-episodes link
            episodes_and_years_element = column.find('a', {'class': 'toggle-episodes'})
            if episodes_and_years_element:
                episodes_and_years = episodes_and_years_element.get_text().strip().split(', ')
                cast_member['episodes'] = episodes_and_years[0]
                if len(episodes_and_years) > 1:
                    cast_member['years'] = episodes_and_years[1]
    # skip separator rows that produced no data
    if cast_member:
        cast.append(cast_member)

print(cast[:5])

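This is definitely not the most elegant solution, but I believe it does what you need.

【Comments】:

Yes, it is bugged. A friend and I looked into the imdbpy repository; it is based on an old version of IMDb and the parsing is not done well. There is no better solution than parsing it yourself or fixing the repository with a PR.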
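
The parser above keeps the episode count and years as plain strings. If numbers are needed, a small helper along these lines might do. It is only a sketch: it assumes the text looks like the notes shown in the question, e.g. "(58 episodes, 2010-2020)", and the name parse_episode_notes and the pattern are made up for illustration; they are not part of IMDbPY or bs4.

```python
import re

# Hypothetical pattern, assuming text such as "(58 episodes, 2010-2020)",
# "1 episode, 2013" or just "46 episodes".
NOTES_PATTERN = re.compile(
    r"(?P<count>\d+)\s+episodes?"            # episode count
    r"(?:,\s*(?P<start>\d{4})"               # optional first year
    r"(?:\s*[-\u2013]\s*(?P<end>\d{4}))?)?"  # optional last year (hyphen or en dash)
)


def parse_episode_notes(notes):
    """Return (episode_count, first_year, last_year), or None if nothing matches."""
    match = NOTES_PATTERN.search(notes or "")
    if match is None:
        return None
    count = int(match.group("count"))
    start = match.group("start")
    # a single year means the first and last year are the same
    end = match.group("end") or start
    return (
        count,
        int(start) if start else None,
        int(end) if end else None,
    )


print(parse_episode_notes("(58 episodes, 2010-2020)"))
```

Since the pattern searches rather than matches the whole string, it should work both on the full notes text and on the separate 'episodes' value stored by the parser; an empty or unrecognized string gives None instead of raising.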