【Question title】: Scrape Historical Bitcoin Data from Coinmarketcap with BeautifulSoup
【Posted】: 2022-01-02 19:55:14
【Question】:

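I am trying to scrape historical Bitcoin data from coinmarketcap.com to get the close, volume, date, high, and low values from the start of the year through September 30, 2021. After hours of going through threads and videos, and being new to scraping with Python, I can't figure out what my mistake is (or is there something about the website that I'm not detecting?). Here is my code: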

from bs4 import BeautifulSoup
import requests
import pandas as pd


closeList = []
volumeList = []
dateList = []
highList = []
lowList = []

website = 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'

r = requests.get(website)

r = requests.get(website)
soup = BeautifulSoup(r.text, 'lxml')

tr = soup.find_all('tr')
FullData = []
for item in tr:
    closeList.append(item.find_all('td')[4].text)
    volumeList.append(item.find_all('td')[5].text)
    dateList.append(item.find('td',{'style':'text-align: left;'}).text)
    highList.append(item.find_all('td')[2].text)
    lowList.append(item.find_all('td')[3].text)
    FullData.append([closeList,volumeList,dateList,highList,lowList])

df_columns = ["close", "volume", "date", "high", "low"]

df = pd.DataFrame(FullData, columns = df_columns)
print(df)

As a result, all I get is:

Empty DataFrame
Columns: [close, volume, date, high, low]
Index: []

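The assignment requires me to scrape with BeautifulSoup and then export to CSV (which is obviously just df.to_csv). Could someone help me? Any help would be greatly appreciated.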

【Comments】:

    Tags: python pandas web-scraping beautifulsoup


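    【Solution 1】:

    The data is actually loaded dynamically by JavaScript from an API call that returns a JSON response, so you can scrape it easily by requesting that API directly:

    Code: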

    import requests
    import pandas as pd

    # JSON endpoint that the page's JavaScript calls; timeStart/timeEnd are Unix timestamps in seconds
    api_url = 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?id=1&convertId=2781&timeStart=1632441600&timeEnd=1637712000'
    r = requests.get(api_url)
    r.raise_for_status()  # stop early on an HTTP error

    # Collect one row per daily quote
    data = []
    for item in r.json()['data']['quotes']:
        quote = item['quote']
        data.append([quote['close'], quote['volume'], quote['timestamp'], quote['high'], quote['low']])

    cols = ["close", "volume", "date", "high", "low"]

    df = pd.DataFrame(data, columns=cols)
    print(df)
    # df.to_csv('info.csv', index=False)
    

    Output:

               close        volume                      date          high           low
    0   42839.751696  4.283935e+10  2021-09-24T23:59:59.999Z  45080.491063  40936.557169
    1   42716.593147  3.160472e+10  2021-09-25T23:59:59.999Z  42996.259704  41759.920425
    2   43208.539105  3.066122e+10  2021-09-26T23:59:59.999Z  43919.300970  40848.461660
    3   42235.731847  3.098003e+10  2021-09-27T23:59:59.999Z  44313.245882  42190.632576
    4   41034.544665  3.021494e+10  2021-09-28T23:59:59.999Z  42775.146142  40931.662500
    ..           ...           ...                       ...           ...           ...
    56  58119.576194  3.870241e+10  2021-11-19T23:59:59.999Z  58351.113266  55705.180685
    57  59697.197134  3.062426e+10  2021-11-20T23:59:59.999Z  59859.880442  57469.725661
    58  58730.476639  2.612345e+10  2021-11-21T23:59:59.999Z  60004.426383  58618.931432
    59  56289.287323  3.503612e+10  2021-11-22T23:59:59.999Z  59266.358468  55679.840404
    60  57569.074876  3.748580e+10  2021-11-23T23:59:59.999Z  57875.516397  55632.759912
    
    [61 rows x 5 columns]
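
    The URL above hard-codes its time window. The following is a minimal sketch of how the same URL could be built for other dates, assuming that timeStart and timeEnd are midnight-UTC Unix timestamps in seconds (the values in the answer match 2021-09-24 and 2021-11-24 under that reading). The helper names to_epoch and build_url are hypothetical, and whether the endpoint accepts a longer window, such as the full range in the question, is not confirmed here.

```python
from datetime import date, datetime, timezone
from urllib.parse import urlencode

# Endpoint and the id/convertId values are copied from the URL in the answer.
BASE_URL = "https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical"


def to_epoch(day: date) -> int:
    """Return the Unix timestamp (seconds) for midnight UTC on the given day."""
    midnight = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(midnight.timestamp())


def build_url(start: date, end: date, coin_id: int = 1, convert_id: int = 2781) -> str:
    """Build a request URL with the same four query parameters as the answer."""
    params = {
        "id": coin_id,
        "convertId": convert_id,
        "timeStart": to_epoch(start),
        "timeEnd": to_epoch(end),
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Reproduces the window used in the answer (2021-09-24 to 2021-11-24).
print(build_url(date(2021, 9, 24), date(2021, 11, 24)))
```

    With these two dates the printed string is identical to api_url in the answer. In the output above, the last row is dated one day before timeEnd, so a window meant to include September 30 would presumably end on date(2021, 10, 1); that boundary behavior is inferred from the sample output only.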
    

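    【Discussion】:

    • This is really impressive. Thanks a lot, @Fazlul! One follow-up question: could this also be done with BeautifulSoup or Selenium?
    • @Jeffrey Sachs, it can't be done with BeautifulSoup alone, because the page content is loaded dynamically. If you disable JavaScript in your browser, you'll notice that the data disappears from the page. Yes, it's possible with Selenium, but it's a bit more complicated. The solution above is the best, easiest, and fastest way to extract the data. Thanks.
    • Thank you so much for your help. I really appreciate it!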
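
    The point about JavaScript can be illustrated offline. The sketch below parses a made-up HTML snippet (a hypothetical stand-in, not the real page source) in which the table rows have not been rendered yet. Because find_all('tr') finds nothing, the loop body never runs, which is consistent with the question's code producing an empty DataFrame rather than raising an error.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives:
# the container exists, but JavaScript has not inserted any rows.
static_html = """
<html><body>
  <div id="history"><!-- rows are inserted here by JavaScript --></div>
</body></html>
"""

soup = BeautifulSoup(static_html, "html.parser")
rows = soup.find_all("tr")

full_data = []
for row in rows:  # never executes, because rows is empty
    cells = row.find_all("td")
    full_data.append([cell.text for cell in cells])

print(len(rows), full_data)
```

    BeautifulSoup only parses the markup it is given and does not execute scripts, so rows added later by the browser are not visible to it.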