【Question title】: ValueError: 9 columns passed, passed data had 3 columns
【Posted】: 2021-05-14 03:13:37
【Question description】:

I have just started learning web scraping, and 30 minutes in I ran into a problem while scraping a table from a wiki page.

import requests
from bs4 import BeautifulSoup
import pandas as pd

start_url = 'https://en.wikipedia.org/wiki/The_Avengers_(2012_film)#Sequels'

downloaded_html = requests.get(start_url)

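# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(downloaded_html.text, 'html.parser')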

with open('downloaded.html', 'w', encoding="utf-8") as file:
    file.write(soup.prettify())

full_table = soup.select('table.wikitable tbody')[0]

table_head = full_table.select('tr th')

tabele_column = []
for element in table_head:
    colume_label = element.get_text(separator=" ", strip=True)
    colume_label = colume_label.replace(" ", "_")
    tabele_column.append(colume_label)

table_row = full_table.select('tr')
table_data = []
for index, element in enumerate(table_row):
    if index > 0:
        row_list = []
        values = element.select('td')
        for value in values:
            row_list.append(value.text.strip())
        table_data.append(row_list)
# print(table_data)

df = pd.DataFrame(table_data, columns=colume_label)
print(df)

I get the following error:

ValueError: 9 columns passed, passed data had 3 columns

【Question discussion】:

    Tags: python pandas web-scraping


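    【Solution 1】:

    I suspect you are building the DataFrame with colume_label instead of tabele_column. After the header loop finishes, colume_label holds only the last header as a single string (here most likely "Reference", which is 9 characters long), which would explain why pandas reports 9 columns against 3 columns of data. Pass the list of headers instead: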

    df = pd.DataFrame(table_data, columns=tabele_column)
    print(df)
    #                                          Record_title                          Record_detail   Reference
    # 0                        Opening weekend for any film                           $207,438,708       [212]
    # 1                           Opening week for any film                           $270,019,373       [213]
    # 2        Opening weekend, adjusted for ticket pricing                         $207.4 million       [214]
    # 3                      Theater average – wide release                                $47,698       [206]
    # 4                     3D gross during opening weekend                           $108 million  [198][203]
    # 5                   IMAX gross during opening weekend                          $15.3 million       [200]
    # 6                         Second weekend for any film                           $103,052,274       [215]
    # 7                  Monthly share of domestic earnings                          May 2012, 52%       [211]
    # 8                            Highest cumulative gross                            2 – 43 days       [216]
    # 9                   Days to reach $100*, $150 million                                2 days*       [217]
    # 10  Days to reach $200, $250, $300, $350, $400, $4...  3, 6, 9, 10, 14, 17 days respectively       [217]
    # 11                   Days to reach $500, $550 million                            23, 31 days  [208][217]
    # 12                                        May opening                           $207,438,708       [218]
    # 13               Opening weekend for a superhero film                           $207,438,708       [219]
    # 14                    Highest-grossing superhero film                           $623,357,910       [220]
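
The mismatch can be reproduced without any network access. The following is a minimal sketch, assuming two sample rows copied from the output above and a hand-written header list; the names `rows`, `headers`, `last_label`, `label_count`, and `error_message` are illustrative and do not appear in the original question. The exact exception type and message depend on the installed pandas version, so the sketch catches both `ValueError` and `TypeError` rather than relying on one.

```python
import pandas as pd

# Assumed sample data: two rows taken from the printed output above.
rows = [
    ["Opening weekend for any film", "$207,438,708", "[212]"],
    ["Opening week for any film", "$270,019,373", "[213]"],
]
headers = ["Record_title", "Record_detail", "Reference"]

# After the header loop in the question, colume_label holds only the
# last header, i.e. a single string rather than a list of labels.
last_label = headers[-1]
label_count = len(last_label)  # number of characters in "Reference"

# Passing the string as `columns` fails; the exception type and wording
# vary between pandas versions, so both are caught here.
try:
    pd.DataFrame(rows, columns=last_label)
    error_message = None
except (ValueError, TypeError) as exc:
    error_message = str(exc)

# Passing the full list of headers gives one label per data column.
df = pd.DataFrame(rows, columns=headers)
print(df)
```

A quick sanity check before building the frame is to compare `len(headers)` with the length of the first row; if `columns` is accidentally a string, its length is the character count, which rarely matches the row width.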
    

    【Discussion】:
