【Question title】: Python: taking specific data from websites
【Posted】: 2020-11-15 21:31:37
【Question description】:

I'm new to Python and I'm working on a GUI. I need to get the top 250 movies from the IMDb website.

def clicked(self):
    movie=self.movie_name.text()
    
    url="https://www.imdb.com/chart/top/"
    response=requests.get(url)
    html_content=response.content
    soup=BeautifulSoup(html_content,"html.parser")

    movie_name = soup.find_all("td",{"class":"titleColumn"})
    for i in movie_name:
        i=i.text

        i=i.strip()

        i=i.replace("\n","")

        if (movie == i):
            self.yazialani.setText(i) 

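With this code, the output looks like this: 6. Schindler's List(1993) 7. The Lord of the Rings: The Return of the King(2003) 8. Pulp Fiction(1994). For my project, though, I only want the movie name, without the year and the rank. How should I change my code?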

【Comments】:

    Tags: python html beautifulsoup request


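    【Solution 1】:

    Only the movie name is wrapped in the anchor tag; the rank and the year sit outside it in the same td. So for each td, select the text of its anchor tag: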

    import requests
    from bs4 import BeautifulSoup
    
    url="https://www.imdb.com/chart/top/"
    response=requests.get(url)
    html_content=response.content
    soup=BeautifulSoup(html_content,"html.parser")
    
    movie_name = soup.find_all("td",{"class":"titleColumn"})
    
    for i in movie_name:
        print(i.find("a").get_text(strip=True))
    

    Output:

    The Shawshank Redemption
    The Godfather
    The Godfather: Part II
    The Dark Knight
    12 Angry Men
    Schindler's List
    The Lord of the Rings: The Return of the King
    Pulp Fiction
    Il buono, il brutto, il cattivo
    The Lord of the Rings: The Fellowship of the Ring
    Fight Club
    Forrest Gump
    Inception
    Star Wars: Episode V - The Empire Strikes Back
    The Lord of the Rings: The Two Towers
    The Matrix
    Goodfellas
    One Flew Over the Cuckoo's Nest
    Shichinin no samurai
    Se7en
    La vita è bella
    Cidade de Deus
    The Silence of the Lambs
    Hamilton
    It's a Wonderful Life
    Star Wars
    Saving Private Ryan
    Sen to Chihiro no kamikakushi
    Gisaengchung
    The Green Mile
    Interstellar
    Léon
    The Usual Suspects
    Seppuku
    The Lion King
    Back to the Future
    The Pianist
    Terminator 2: Judgment Day
    American History X
    Modern Times
    Psycho
    Gladiator
    City Lights
    The Departed
    The Intouchables
    Whiplash
    The Prestige
    ...
    ...
    ..
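
Once the names are clean, the comparison from the question's `clicked` method can run against them directly. The following is only a sketch under assumptions: `find_title` is a hypothetical helper (not part of the original answer), and the short `titles` list stands in for the names collected from the anchor tags. It also ignores case and surrounding spaces, which the original `movie == i` check does not.

```python
def find_title(titles, query):
    """Return the first title equal to query, ignoring case and outer spaces."""
    wanted = query.strip().casefold()
    for title in titles:
        if title.strip().casefold() == wanted:
            return title
    return None


# Stand-in for the names collected from the anchor tags.
titles = ["The Shawshank Redemption", "The Godfather", "Pulp Fiction"]

print(find_title(titles, "  pulp fiction "))  # Pulp Fiction
print(find_title(titles, "Unknown Movie"))    # None
```

In the question's GUI code, a non-None result could then be passed to `self.yazialani.setText(...)`.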
    

    【Comments】:

      【Solution 2】:

      A crude solution, given that your strings have the form digits + ". " + name_of_movie + "(YEAR)", is simply to slice out the name:

      a=["6. Schindler's List(1993)", "7. The Lord of the Rings: The Return of the King(2003)", "8. Pulp Fiction(1994)"]
      just_names=[]
      for name in a:
          # Position of the first '.', which ends the rank prefix
          dot=name.index('.')
          # Skip the ". " after the rank and drop the 6-character "(YEAR)" suffix
          just_names.append(name[dot+2:-6])
      print(just_names)
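
The fixed offsets above rely on every string having both a rank and a four-digit year. A regular expression is a somewhat more tolerant alternative. This is only a sketch, not the answerer's method: `PATTERN` and `strip_rank_and_year` are hypothetical names, and the pattern assumes an optional "N. " prefix and an optional "(YYYY)" suffix.

```python
import re

# Optional "N. " rank prefix, then the title, then an optional "(YYYY)" suffix.
PATTERN = re.compile(r"^\s*(?:\d+\.\s*)?(.*?)\s*(?:\(\d{4}\))?\s*$")


def strip_rank_and_year(text):
    """Return text without a leading rank and a trailing year, if present."""
    match = PATTERN.match(text)
    # Fall back to the unchanged input if the pattern does not match.
    return match.group(1) if match else text


rows = [
    "6. Schindler's List(1993)",
    "7. The Lord of the Rings: The Return of the King(2003)",
    "8. Pulp Fiction(1994)",
]
just_names = [strip_rank_and_year(row) for row in rows]
print(just_names)
```

A string that has neither part, such as "12 Angry Men", comes back unchanged because both the prefix and the suffix are optional.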
      

      【Comments】:
