【Question title】: Web Scraping Table with class from CoinMarketCap
【Posted】: 2021-08-10 03:51:05
【Question description】:

I am trying to scrape the entire data table from: https://coinmarketcap.com/currencies/ethereum/historical-data/

I am trying to extract this table:

<table class="h7vnx2-2 jNaLNi cmc-table ">

table = soup.find('table', {"class":"h7nvx2-2 jNaLNi cmc-table"})

It returns: None

Here is the full code:

import requests
from bs4 import BeautifulSoup

def main():
    URL = "https://coinmarketcap.com/currencies/ethereum/historical-data/"
    page = requests.get(URL)

    soup = BeautifulSoup(page.content, "html.parser")

    table = soup.find('table', {"class":"h7nvx2-2 jNaLNi cmc-table"})
    print(table)


if __name__ == "__main__":
    main()

【Question comments】:

  • Have you checked whether Ethereum's historical data is generated dynamically?
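
One way to test the point raised in this comment is to check whether the raw HTML returned by requests contains any table element at all. The sketch below uses only the standard library. The `TableCounter` class, the `count_tables` helper, and both sample strings are hypothetical; in practice the input would be `page.text` from the question's code.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


def count_tables(html):
    # Hypothetical helper: how many <table> tags does the raw HTML hold?
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Illustrative inputs, not real responses from the site.
static_page = '<html><body><table class="cmc-table"><tr><td>1</td></tr></table></body></html>'
script_only_page = '<html><body><div id="__next"></div><script>render()</script></body></html>'

print(count_tables(static_page))
print(count_tables(script_only_page))
```

If the count is 0 for the real response, the table is most likely built in the browser by JavaScript, in which case `soup.find` has nothing to match regardless of the class string used.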

Tags: beautifulsoup


【Solution 1】:
import pandas as pd
from urllib.parse import urlencode


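def main(url):
    # Query parameters for the historical-data endpoint.
    params = {
        'id': '1027',
        'convertId': '2781',
        'timeStart': '1623283200',
        'timeEnd': '1628553600'
    }

    # read_json fetches the URL; ['data']['quotes'] is the list of records.
    df = pd.DataFrame(pd.read_json(
        url + urlencode(params))['data']['quotes'])
    # Expand each record's nested 'quote' dict into its own columns.
    df = pd.DataFrame.from_records(df['quote'])
    print(df)


main('https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical?')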
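
The timeStart and timeEnd values above look like Unix timestamps in seconds: 1623283200 and 1628553600 correspond to midnight UTC on 2021-06-10 and 2021-08-10. Assuming that reading is correct, a minimal sketch for building the same parameters from calendar dates might look like this. The `to_unix_seconds` helper is hypothetical and not part of the answer.

```python
from datetime import datetime, timezone


def to_unix_seconds(year, month, day):
    # Midnight UTC on the given date, as whole seconds since the epoch.
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Same keys as the answer; the time bounds are computed instead of hard-coded.
params = {
    'id': '1027',
    'convertId': '2781',
    'timeStart': str(to_unix_seconds(2021, 6, 10)),
    'timeEnd': str(to_unix_seconds(2021, 8, 10)),
}
print(params['timeStart'], params['timeEnd'])  # 1623283200 1628553600
```

Passing an explicit UTC timezone keeps the result independent of the machine's local time zone.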

【Comments】:

    【Solution 2】:

    As noted above, the data is rendered dynamically. Fetch it directly from the API:

    import requests
    import pandas as pd
    
    url = 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/historical'
    payload = {
        'id': '1027',
        'convertId': '2781',
        'timeStart': '1623283200',
        'timeEnd': '1628553600',
    }
    
    # requests encodes payload into the URL's query string.
    jsonData = requests.get(url, params=payload).json()
    # One row per entry under data -> quotes.
    df = pd.json_normalize(jsonData, record_path=['data', 'quotes'])
    
    # Alternative: collect the nested 'quote' dicts by hand.
    #rows = [i['quote'] for i in jsonData['data']['quotes']]
    #df = pd.DataFrame(rows)
    

    Output:

    print(df)
    
               open         high  ...     marketCap                 timestamp
    0   2611.142652  2619.957793  ...  2.872870e+11  2021-06-10T23:59:59.999Z
    1   2472.858836  2495.414705  ...  2.736314e+11  2021-06-11T23:59:59.999Z
    2   2354.752218  2447.227868  ...  2.758389e+11  2021-06-12T23:59:59.999Z
    3   2372.690096  2547.367910  ...  2.916739e+11  2021-06-13T23:59:59.999Z
    4   2508.770462  2606.432929  ...  2.951126e+11  2021-06-14T23:59:59.999Z
    ..          ...          ...  ...           ...                       ...
    56  2725.669632  2840.430748  ...  3.307572e+11  2021-08-05T23:59:59.999Z
    57  2827.503486  2944.903352  ...  3.382378e+11  2021-08-06T23:59:59.999Z
    58  2891.707469  3170.229727  ...  3.694372e+11  2021-08-07T23:59:59.999Z
    59  3161.232779  3184.603971  ...  3.526859e+11  2021-08-08T23:59:59.999Z
    60  3012.885711  3185.701187  ...  3.707654e+11  2021-08-09T23:59:59.999Z
    
    [61 rows x 7 columns]
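
The timestamp column in the output holds ISO 8601 strings rather than datetime objects. As a rough sketch, the records can be flattened and their timestamps parsed with the standard library alone. The sample payload below is hand-built in the shape the answers index into (data, then quotes, then quote), uses values copied from the printed output, and includes only a few of the seven columns. That timestamp sits inside each quote dict is an assumption, and `parse_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone

# Hand-built sample, not a real API response.
json_data = {
    "data": {
        "quotes": [
            {"quote": {"open": 2611.142652, "high": 2619.957793,
                       "timestamp": "2021-06-10T23:59:59.999Z"}},
            {"quote": {"open": 2472.858836, "high": 2495.414705,
                       "timestamp": "2021-06-11T23:59:59.999Z"}},
        ]
    }
}


def parse_timestamp(text):
    # The trailing "Z" is matched literally by strptime, so the UTC
    # timezone is attached explicitly afterwards.
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


# Pull out each nested 'quote' dict, then convert its timestamp in place.
rows = [item["quote"] for item in json_data["data"]["quotes"]]
for row in rows:
    row["timestamp"] = parse_timestamp(row["timestamp"])

print(rows[0]["timestamp"].date(), rows[0]["open"])
```

With pandas already loaded, `pd.to_datetime` on the timestamp column would be the more usual route; the version here only shows what the conversion involves.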
    

    【Comments】:

    • Building a DataFrame by looping over rows is best avoided for performance reasons.
    • Use .json_normalize() instead.