【Question title】: rich.table Returning Incorrect Number of Results
【Posted】: 2021-04-26 23:11:40
【Question】:

I'm scraping movie information by year. When I use a plain print statement, all 100 movies are printed, but when I print with rich.table, I only get the first movie.

import requests
from bs4 import BeautifulSoup
from rich.table import Table
from rich.console import Console

table = Table()

url = 'https://www.rottentomatoes.com/top/bestofrt/?year='

year = input('Top 100 Movies for Which Year? ')
response = requests.get(url + year)
html = response.text

soup = BeautifulSoup(html, 'lxml')
containers = soup.find_all('table', class_='table')

for container in containers:
    for row in container.find_all('tr')[1:]:
        movie_rank = row.find('td', class_='bold')
        movie_rank = movie_rank.text

        movie_name = row.find('a', class_='unstyled articleLink')
        movie_name = movie_name.text.strip()
        movie_name = movie_name.strip('(' + year + ')')

        movie_rating = row.find('span', class_='tMeterScore')
        movie_rating = movie_rating.text

        # print(f'{movie_rank} {movie_name.strip()} - rating:{movie_rating}')
        table.add_column('Rank')
        table.add_column('Movie')
        table.add_column('Rating')
        # problem is here     
        table.add_row(movie_rank, movie_name, movie_rating)
       
        console = Console()
        console.print(table)
        break

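【Comments】:

  • The problem isn't where you say it is. The last three lines should be moved two tab stops to the left so that they are no longer part of the loop. Right now, you process one row in one container, then print the result and exit the loop. Move those three lines so they align with the for container in containers: line.
  • There's a break after console.print(table), which ends the loop after one movie.
  • Thank you, but it still doesn't work. I did exactly as you said and aligned the last three lines with the for loop.
  • elliott, I tried that before, but I got the same result...
  • What did you try before? Remove the break. It terminates your loop immediately.

Tags: python web-scraping rich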


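【Solution 1】:

You are terminating the loop right after its first iteration, and you should print the table once, after it has been built. Also, you should add the columns once (not on every iteration). For example: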

import requests
from bs4 import BeautifulSoup
from rich.table import Table
from rich.console import Console

table = Table()
table.add_column('Rank')
table.add_column('Movie')
table.add_column('Rating')

url = 'https://www.rottentomatoes.com/top/bestofrt/?year='

year = input('Top 100 Movies for Which Year? ')
response = requests.get(url + year)
html = response.text

soup = BeautifulSoup(html, 'lxml')
containers = soup.find_all('table', class_='table')

for container in containers:
    for row in container.find_all('tr')[1:]:
        movie_rank = row.find('td', class_='bold')
        movie_rank = movie_rank.text

        movie_name = row.find('a', class_='unstyled articleLink')
        movie_name = movie_name.text.strip()
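        # str.strip() takes a set of characters, not a suffix, so remove
        # the trailing "(year)" explicitly instead.
        suffix = '(' + year + ')'
        if movie_name.endswith(suffix):
            movie_name = movie_name[:-len(suffix)].strip()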

        movie_rating = row.find('span', class_='tMeterScore')
        movie_rating = movie_rating.text

        table.add_row(movie_rank, movie_name, movie_rating)

console = Console()
console.print(table)
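
The same pattern can be tried without any network access. The following is a minimal sketch, not part of the original answer: the rows in SAMPLE_MOVIES are made up and stand in for the scraped values, and build_table is a hypothetical helper name. It declares the columns once, adds one row per item inside the loop, and prints a single time at the end.

```python
from rich.console import Console
from rich.table import Table

# Made-up sample rows standing in for the scraped (rank, name, rating) values.
SAMPLE_MOVIES = [
    ("1.", "Example Movie A", "99%"),
    ("2.", "Example Movie B", "98%"),
    ("3.", "Example Movie C", "97%"),
]


def build_table(movies):
    """Return a Table with three columns and one body row per movie."""
    table = Table()
    # Columns are declared once, before the loop.
    for heading in ("Rank", "Movie", "Rating"):
        table.add_column(heading)
    # Only add_row() runs per item, and the loop contains no break.
    for rank, name, rating in movies:
        table.add_row(rank, name, rating)
    return table


table = build_table(SAMPLE_MOVIES)
# Print once, after every row has been added.
Console().print(table)
```

Note that add_row() expects renderable values such as strings, which is why the sample ranks and ratings are written as text, matching the .text values taken from the scraped page.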

【Comments】:
