【Question title】: Cannot export to ".csv" file - pandas.DataFrame
【Posted】: 2021-10-01 02:49:24
【Question】:

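I would like to ask for help with my Google Colaboratory notebook. The error occurs in the fourth cell.

Background:
We are web scraping historical data for BTC.

Here is my code:

First cell (executed successfully)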

# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

Second cell (executed successfully)

#sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
#request the page
page = requests.get(url)
#creating a soup object and the parser
soup = BeautifulSoup(page.text, 'lxml')

#creating a table body to pass on the soup to find the table
table_body = soup.find('table')
#creating an empty list to store information
row_data = []

#creating a table 
for row in table_body.find_all('tr'):
  col = row.find_all('td')
  col = [ele.text.strip() for ele in col ] # stripping the whitespaces
  row_data.append(col) #append the column

# extracting all data on table entries
df = pd.DataFrame(row_data)
df

Third cell (executed successfully)

headers = []
for i in soup.find_all('th'):
  col_name = i.text.strip().lower().replace(" ", "_")
  headers.append(col_name)
headers

Fourth cell (execution failed)

df = pd.DataFrame(row_data, columns=headers)
df
#into a file 
df.to_csv('/content/file.csv')

Error! :(

AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    563     try:
--> 564         columns = _validate_or_indexify_columns(content, columns)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
AssertionError: 13 columns passed, passed data had 7 columns

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
    566     except AssertionError as e:
--> 567         raise ValueError(e) from e
    568     return result, columns
    569 

ValueError: 13 columns passed, passed data had 7 columns
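
The failure does not depend on the scraping itself. As a minimal sketch with made-up stand-in data (the row contents and the `col_0` to `col_12` names are assumptions, not values from the page), passing more column names than a row has cells raises the same kind of ValueError:

```python
import pandas as pd

# Hypothetical stand-in data: one row with 7 cells, but 13 column names.
rows = [["a", "b", "c", "d", "e", "f", "g"]]
names = [f"col_{i}" for i in range(13)]

message = None
try:
    pd.DataFrame(rows, columns=names)
except ValueError as exc:
    # pandas reports how many names were passed and how wide the data is.
    message = str(exc)
    print(message)
```

So the thing to check is why the `headers` list ends up longer than each scraped row.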

【Comments】:

    Tags: python excel pandas dataframe data-mining


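    【Answer 1】:

    To load the table, you can simply use pd.read_html(), which returns a list of DataFrames, one per table found on the page. For example: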

    import pandas as pd
    
    url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
    
    df = pd.read_html(url)[0]
    print(df)
    df.to_csv("data.csv")
    

    This creates data.csv (the original answer showed a LibreOffice screenshot of the file here).
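
One detail worth knowing when opening the file in a spreadsheet: `to_csv` writes the DataFrame index as an extra first column unless `index=False` is passed. The sketch below uses a small made-up frame (the column names and values are placeholders, not data from the site) and writes to in-memory buffers so the two header lines can be compared:

```python
import io

import pandas as pd

# Made-up two-row frame standing in for the scraped table.
df = pd.DataFrame({"date": ["2021-09-29", "2021-09-30"], "close": [1.0, 2.0]})

with_index = io.StringIO()
df.to_csv(with_index)                  # default: index written as an unnamed first column

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # only the data columns

print(with_index.getvalue().splitlines()[0])     # ,date,close
print(without_index.getvalue().splitlines()[0])  # date,close
```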


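    Your example, corrected. Two things change: rows are selected with "tr:has(td)", which skips rows that contain no <td> cells (such as the header row), and the headers are read from table_body instead of the whole soup, so <th> cells elsewhere on the page are not picked up (the likely source of the 13 names against 7 data columns):

    # importing libraries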
    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    
    # sample url
    url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
    # request the page
    page = requests.get(url)
    # creating a soup object and the parser
    soup = BeautifulSoup(page.text, "lxml")
    
    # creating a table body to pass on the soup to find the table
    table_body = soup.find("table")
    # creating an empty list to store information
    row_data = []
    
    # collect the cell text of every row that contains <td> cells
    for row in table_body.select("tr:has(td)"):
        col = row.find_all("td")
        col = [ele.text.strip() for ele in col]  # strip surrounding whitespace
        row_data.append(col)  # append the row
    
    # extracting all data on table entries
    df = pd.DataFrame(row_data)
    
    headers = []
    for i in table_body.select("th"):
        col_name = i.text.strip().lower().replace(" ", "_")
        headers.append(col_name)
    
    df = pd.DataFrame(row_data, columns=headers)
    print(df)
    df.to_csv("/content/file.csv")
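
If the page layout changes, the same mismatch can come back. As a rough sketch of a guard (the `build_frame` helper and the sample rows are hypothetical and not part of the answer above), the row widths can be checked against the header count before the DataFrame is built. Dropping empty rows here is an alternative to the "tr:has(td)" selector, not the same technique:

```python
import pandas as pd


def build_frame(row_data, headers):
    """Hypothetical helper: drop empty rows, then check widths before building."""
    # A <tr> without <td> cells produces an empty list; discard those rows.
    rows = [r for r in row_data if r]
    widths = {len(r) for r in rows}
    if widths != {len(headers)}:
        raise ValueError(
            f"{len(headers)} headers but row widths are {sorted(widths)}"
        )
    return pd.DataFrame(rows, columns=headers)


# Made-up rows: the first entry mimics a header row that yielded no <td> cells.
row_data = [[], ["2021-09-30", "1", "2"], ["2021-09-29", "3", "4"]]
headers = ["date", "open", "close"]

df = build_frame(row_data, headers)
print(df.shape)  # (2, 3)
```

Raising early with both counts in the message makes it easier to see whether the headers or the rows are the side that is wrong.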
    

    【Comments】:

      【Answer 2】:
      import pandas as pd
      
      
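      # Read the site's JSON endpoint; after transposing, the "series" column's
      # "data" entry holds the list of records.
      records = pd.read_json(
          'https://www.bitrates.com/api/node/v1/symbols/USDTUSD/bitrates/series?aggregate=3&period=lastMonth').T['series'].to_dict()['data']
      print(pd.DataFrame(records))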
      

      Output:

                              date      open     close  ...        supply  market_volume24  btc_ratio
      0   2021-04-11T06:00:00.000Z  0.999212  0.999114  ...  4.584629e+10     3.146109e+08   0.000016    
      1   2021-04-12T00:00:00.000Z  0.999114  0.999317  ...  4.584629e+10     2.100706e+09   0.000016    
      2   2021-06-04T18:00:00.000Z  0.999317  1.000613  ...  6.447629e+10     7.298208e+08   0.000025    
      3   2021-06-05T12:00:00.000Z  1.000613  1.000328  ...  0.000000e+00     6.502947e+09   0.000025    
      4   2021-06-06T06:00:00.000Z  1.000328  1.000499  ...  6.447629e+10     6.649574e+08   0.000025    
      5   2021-06-07T00:00:00.000Z  1.000499  1.000408  ...  6.447629e+10     8.272473e+09   0.000025    
      6   2021-06-07T18:00:00.000Z  1.000408  1.000338  ...  6.447629e+10     1.090599e+09   0.000025    
      7   2021-06-08T12:00:00.000Z  1.000338  1.000840  ...  6.447177e+10     2.196249e+09   0.000028    
      8   2021-06-09T06:00:00.000Z  1.000840  1.001088  ...  0.000000e+00     1.080053e+10   0.000028    
      9   2021-06-10T00:00:00.000Z  1.001088  1.000618  ...  6.447177e+10     4.158914e+09   0.000026    
      10  2021-06-10T18:00:00.000Z  1.000618  1.000436  ...  6.447177e+10     6.713012e+08   0.000026    
      11  2021-06-11T12:00:00.000Z  1.000436  1.000234  ...  6.447177e+10     4.093096e+09   0.000025    
      12  2021-06-12T06:00:00.000Z  1.000234  1.000385  ...  6.447177e+10     5.042653e+09   0.000026    
      13  2021-06-13T00:00:00.000Z  1.000385  1.000302  ...  0.000000e+00     5.502808e+09   0.000026    
      14  2021-06-13T18:00:00.000Z  1.000302  1.000110  ...  6.447177e+10     1.008952e+10   0.000024    
      15  2021-06-14T12:00:00.000Z  1.000110  1.000309  ...  6.447177e+10     7.405940e+09   0.000024    
      16  2021-06-15T06:00:00.000Z  1.000309  1.000205  ...  6.447177e+10     4.256491e+09   0.000023    
      17  2021-06-16T00:00:00.000Z  1.000205  1.000104  ...  0.000000e+00     1.495518e+09   0.000023    
      18  2021-06-16T18:00:00.000Z  1.000104  0.999833  ...  0.000000e+00     3.033091e+09   0.000024    
      19  2021-06-17T12:00:00.000Z  0.999833  1.000016  ...  6.447177e+10     1.449031e+08   0.000024    
      20  2021-07-10T00:00:00.000Z  1.000016  1.000100  ...  6.446977e+10     7.586923e+08   0.000025    
      21  2021-07-10T18:00:00.000Z  1.000100  1.000199  ...  6.446977e+10     2.312489e+09   0.000025    
      22  2021-07-11T12:00:00.000Z  1.000199  1.000134  ...  6.446977e+10     2.236517e+09   0.000024    
      23  2021-07-12T06:00:00.000Z  1.000134  1.000192  ...  6.446977e+10     8.140557e+09   0.000024    
      24  2021-07-13T00:00:00.000Z  1.000192  1.000290  ...  6.446977e+10     3.846952e+09   0.000026    
      25  2021-07-13T18:00:00.000Z  1.000290  1.000411  ...  6.446977e+10     1.278604e+09   0.000026    
      26  2021-07-14T12:00:00.000Z  1.000411  1.000315  ...  6.446977e+10     3.279535e+09   0.000026    
      27  2021-07-15T06:00:00.000Z  1.000315  1.000142  ...  6.446977e+10     8.086642e+08   0.000026    
      28  2021-07-16T00:00:00.000Z  1.000142  1.000295  ...  6.446977e+10     1.187211e+09   0.000027    
      29  2021-07-16T18:00:00.000Z  1.000295  1.000610  ...  6.446977e+10     7.721854e+08   0.000027    
      30  2021-07-17T12:00:00.000Z  1.000610  1.000535  ...  6.446977e+10     4.535049e+09   0.000027    
      31  2021-07-18T06:00:00.000Z  1.000535  1.000610  ...  6.446977e+10     2.345491e+09   0.000026    
      32  2021-07-19T00:00:00.000Z  1.000610  1.000386  ...  6.446977e+10     4.725531e+09   0.000027    
      33  2021-07-19T18:00:00.000Z  1.000386  1.000215  ...  6.446977e+10     3.314499e+09   0.000028    
      34  2021-07-20T12:00:00.000Z  1.000215  1.000324  ...  6.446977e+10     5.315525e+09   0.000030    
      35  2021-07-21T06:00:00.000Z  1.000324  1.000277  ...  6.446977e+10     7.141479e+09   0.000028    
      36  2021-07-22T00:00:00.000Z  1.000277  1.000255  ...  6.446977e+10     2.533840e+09   0.000028    
      37  2021-07-22T18:00:00.000Z  1.000255  1.000325  ...  6.446977e+10     2.699050e+09   0.000027    
      38  2021-07-23T12:00:00.000Z  1.000325  1.000363  ...  6.446977e+10     2.681340e+09   0.000026    
      39  2021-07-24T06:00:00.000Z  1.000363  1.000644  ...  6.446974e+10     6.241232e+08   0.000026    
      
      [40 rows x 10 columns]
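
The chained `.T['series'].to_dict()['data']` lookup suggests the response nests the records as `{"data": {"series": [...]}}`. That shape is inferred from the code, not confirmed against the API, and the payload below is made up. Under that assumption, a sketch of the same extraction on an already-decoded payload, followed by the CSV export, might look like this:

```python
import pandas as pd

# Made-up payload; the {"data": {"series": [...]}} nesting is an assumption
# inferred from the lookup chain in the answer, and the values are placeholders.
payload = {
    "data": {
        "series": [
            {"date": "2021-07-23T12:00:00.000Z", "open": 1.0, "close": 1.5},
            {"date": "2021-07-24T06:00:00.000Z", "open": 1.5, "close": 2.0},
        ]
    }
}

records = payload["data"]["series"]       # list of dicts, one per row
df = pd.DataFrame(records)
df["date"] = pd.to_datetime(df["date"])   # parse the ISO 8601 timestamps

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With a live response, `payload` would come from something like `requests.get(url).json()`, provided the endpoint still returns that structure.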
      

      【Comments】:
