Python: getting specific data from websites (answers)

【Question title】: Python taking specific data from websites
【Posted】: 2020-11-15 21:31:37
【Question】:

I'm new to Python and I'm working on a GUI. I need to get the top 250 movies from the IMDb website.

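import requests
from bs4 import BeautifulSoup

# Method of the question's GUI window class (class definition not shown)
def clicked(self):
    movie = self.movie_name.text()  # title typed by the user

    url = "https://www.imdb.com/chart/top/"
    response = requests.get(url)
    html_content = response.content
    soup = BeautifulSoup(html_content, "html.parser")

    # One <td class="titleColumn"> per movie in the chart
    movie_name = soup.find_all("td", {"class": "titleColumn"})
    for i in movie_name:
        i = i.text               # whole cell text: rank + title + year
        i = i.strip()
        i = i.replace("\n", "")

        # i still contains the rank and the year, so a bare title typed
        # by the user does not match here
        if movie == i:
            self.yazialani.setText(i)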

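With this code the output looks like this:

6. Schindler's List(1993)
7. The Lord of the Rings: The Return of the King(2003)
8. Pulp Fiction(1994)

For my project, though, I only want the movie name, without the year and the rank. How should I change my code?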

【Question discussion】:

Tags: python html beautifulsoup request

【Solution 1】:

Only the movie name is inside the anchor (<a>) tag, so take the text of the anchor tag in each td.

import requests
from bs4 import BeautifulSoup

url = "https://www.imdb.com/chart/top/"
response = requests.get(url)
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")

# One <td class="titleColumn"> per movie in the chart
movie_name = soup.find_all("td", {"class": "titleColumn"})

for i in movie_name:
    # The rank and year sit outside the <a> tag; only the title is inside it
    print(i.find("a").get_text(strip=True))

Output:

The Shawshank Redemption
The Godfather
The Godfather: Part II
The Dark Knight
12 Angry Men
Schindler's List
The Lord of the Rings: The Return of the King
Pulp Fiction
Il buono, il brutto, il cattivo
The Lord of the Rings: The Fellowship of the Ring
Fight Club
Forrest Gump
Inception
Star Wars: Episode V - The Empire Strikes Back
The Lord of the Rings: The Two Towers
The Matrix
Goodfellas
One Flew Over the Cuckoo's Nest
Shichinin no samurai
Se7en
La vita è bella
Cidade de Deus
The Silence of the Lambs
Hamilton
It's a Wonderful Life
Star Wars
Saving Private Ryan
Sen to Chihiro no kamikakushi
Gisaengchung
The Green Mile
Interstellar
Léon
The Usual Suspects
Seppuku
The Lion King
Back to the Future
The Pianist
Terminator 2: Judgment Day
American History X
Modern Times
Psycho
Gladiator
City Lights
The Departed
The Intouchables
Whiplash
The Prestige
...
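Since IMDb's chart markup may have changed since this was posted, here is a minimal sketch of the same anchor-text approach run against a small inline HTML sample instead of the live page. The sample markup (modelled on the rank / <a> title / year layout described above) and the helpers extract_titles and find_movie are hypothetical; find_movie shows one way the question's clicked() could compare the user's input with the bare title.

```python
from typing import List, Optional

from bs4 import BeautifulSoup

# Hypothetical sample mimicking the 2020-era chart rows: rank, <a> title, <span> year.
# The live IMDb page may use different markup today.
SAMPLE_HTML = """
<table><tbody>
  <tr><td class="titleColumn">
    6.
    <a href="#">Schindler's List</a>
    <span class="secondaryInfo">(1993)</span>
  </td></tr>
  <tr><td class="titleColumn">
    7.
    <a href="#">The Lord of the Rings: The Return of the King</a>
    <span class="secondaryInfo">(2003)</span>
  </td></tr>
  <tr><td class="titleColumn">
    8.
    <a href="#">Pulp Fiction</a>
    <span class="secondaryInfo">(1994)</span>
  </td></tr>
</tbody></table>
"""


def extract_titles(html: str) -> List[str]:
    """Return only the movie titles (the <a> text) from td.titleColumn cells."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for cell in soup.find_all("td", {"class": "titleColumn"}):
        link = cell.find("a")  # rank and year are outside the <a> tag
        if link is not None:   # skip cells without a link instead of crashing
            titles.append(link.get_text(strip=True))
    return titles


def find_movie(titles: List[str], query: str) -> Optional[str]:
    """Hypothetical helper: case-insensitive exact match against bare titles."""
    wanted = query.strip().casefold()
    for title in titles:
        if title.casefold() == wanted:
            return title
    return None


titles = extract_titles(SAMPLE_HTML)
print(titles)
print(find_movie(titles, "pulp fiction"))  # Pulp Fiction
```

In the question's clicked(), the value returned by find_movie could then be passed to self.yazialani.setText(...) whenever it is not None.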

【Discussion】:

【Solution 2】:

A crude solution (assuming each string always has the form digits + ". " + movie_name + "(YEAR)") could be:

a = ["6. Schindler's List(1993)", "7. The Lord of the Rings: The Return of the King(2003)", "8. Pulp Fiction(1994)"]
just_names = []
for name in a:
    i = 0
    while True:
        if name[i] == '.':  # the first '.' ends the rank number
            # skip ". " before the title and drop the trailing "(YEAR)" (6 characters)
            just_names.append(name[i+2:-6])
            break
        i += 1

print(just_names)
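The same slicing can also be written as a regular expression, which additionally tolerates a space before the year and ranks of any length. This is a swapped-in technique, not the answerer's code, and strip_rank_and_year is a hypothetical name; it still assumes the "rank. title(YEAR)" shape shown in the question.

```python
import re

# Assumes entries look like "<rank>. <title>(<4-digit year>)", optionally with
# spaces around the parts; the lazy title group stops before the "(YEAR)".
ENTRY_RE = re.compile(r"^\s*\d+\.\s*(?P<title>.*?)\s*\(\d{4}\)\s*$")


def strip_rank_and_year(entry: str) -> str:
    """Return just the title, or the stripped entry if it has an unexpected shape."""
    match = ENTRY_RE.match(entry)
    if match is None:
        return entry.strip()  # leave unexpected formats untouched
    return match.group("title")


a = ["6. Schindler's List(1993)", "7. The Lord of the Rings: The Return of the King(2003)", "8. Pulp Fiction(1994)"]
just_names = [strip_rank_and_year(name) for name in a]
print(just_names)
```

Unlike the index-based loop, this version does not raise an IndexError when a string contains no '.', it simply returns the string unchanged.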

【Discussion】: