【Question title】: How to import data from an HTML table on a website to Excel?
【Posted】: 2020-11-13 06:20:12
【Question】:

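I want to use Python to do some statistical analysis of Crazy Time, a live casino game by Evolution Gaming. A website has the data needed for this: https://tracksino.com/crazytime. I would like to import the data from the bottom table, "Spin History", into Excel, but I don't know how to do that yet. Can anyone tell me where to start?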

Thanks in advance!

【Question comments】:

  • If you are using Python, the simplest way is to scrape the data with requests and BeautifulSoup and store it in a local file so you can analyze it later.
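The comment's approach parses the table straight out of the page's HTML. As a dependency-free sketch of that idea, the snippet below swaps BeautifulSoup for Python's built-in `html.parser` and turns every table row into CSV text, which Excel can open directly. The `TableParser` class, the `table_to_csv` helper and the sample markup are hypothetical illustrations. Note that if the site fills the table in with JavaScript, the HTML returned by `requests.get` may not contain the rows at all; in that case the JSON API used in the answer below is the more reliable source.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical helper: collect <th>/<td> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell; ignore whitespace between tags
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Return every table row found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(parser.rows)
    return buffer.getvalue()


# Hypothetical sample markup standing in for a downloaded page
sample = """
<table>
  <tr><th>Occurred At</th><th>Spin Result</th></tr>
  <tr><td>13-November-2020 @ 06:20:12</td><td>Coin Flip</td></tr>
  <tr><td>13-November-2020 @ 06:19:40</td><td>2</td></tr>
</table>
"""
print(table_to_csv(sample))
```

Writing the result to a `.csv` file (instead of printing it) gives a file that Excel opens as a spreadsheet.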

Tags: web-scraping html-table


【Solution 1】:

Try the code below:

    import csv
    import datetime
    import os

    import requests
    from urllib3.exceptions import InsecureRequestWarning

    # SSL verification is disabled below (verify=False); silence the resulting warning
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

    API_URL = 'https://api.tracksino.com/crazytime_history'
    CSV_HEADERS = ['Occurred At', 'Slot Result', 'Spin Result', 'Total Winners', 'Total Payout']


    def scrape_history(file_path='', file_name='spin_history.csv'):
        """Download the Crazy Time spin history page by page and save it as CSV.

        file_path: folder on your system where the file should be saved.
        """
        page_number = 1
        # Open the file once in write mode so re-running the script does not duplicate rows
        with open(os.path.join(file_path, file_name), 'w', newline='') as history:
            csvwriter = csv.DictWriter(history, fieldnames=CSV_HEADERS)
            csvwriter.writeheader()

            while True:
                # Dynamic URL: fetch the data in chunks of 100 rows per page
                url = (f'{API_URL}?filter=&sort_by=&sort_desc=false'
                       f'&page_num={page_number}&per_page=100&period=24hours')
                print('URL created:', url)
                response = requests.get(url, verify=False)
                response.raise_for_status()
                history_data = response.json()['data']  # parse the JSON body

                # An empty page means all rows have been fetched
                if not history_data:
                    break

                # Write the extracted rows to the CSV file one by one
                for item in history_data:
                    value = datetime.datetime.fromtimestamp(item['when'])
                    csvwriter.writerow({
                        'Occurred At': f'{value:%d-%B-%Y @ %H:%M:%S}',
                        'Slot Result': item['slot_result'],
                        'Spin Result': item['result'],
                        'Total Winners': item['total_winners'],
                        'Total Payout': item['total_payout'],
                    })
                page_number += 1


    if __name__ == '__main__':
        scrape_history()

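Explanation: I implemented the script above with the Python requests library. The API URL https://api.tracksino.com/crazytime_history?filter=&sort_by=&sort_desc=false&page_num=1&per_page=50&period=24hours was taken from the website itself (see screenshot). The script builds a dynamic URL in which the page number changes on every iteration: first page_num = 1, then page_num = 2, and so on, until all the data has been extracted. The resulting spin_history.csv file can be opened directly in Excel.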
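The page-by-page loop described above can be separated from the network code, which makes it easy to test and reuse. A minimal sketch, assuming a hypothetical `fetch_page(page_num)` callable that returns the list found under the API's `data` key (an empty list once past the last page); in the real script that callable would wrap the `requests.get` call against the tracksino URL. `iter_pages`, `fake_fetch` and `FAKE_ROWS` are illustrative names, not part of any library.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int], List[dict]], start: int = 1) -> Iterator[dict]:
    """Yield rows from consecutive pages until a page comes back empty."""
    page_num = start
    while True:
        rows = fetch_page(page_num)
        if not rows:  # an empty page means everything has been fetched
            return
        yield from rows
        page_num += 1


# Hypothetical stand-in for the API: 250 rows served 100 per page
FAKE_ROWS = [{'when': 1605248412 + i, 'result': str(i % 5)} for i in range(250)]


def fake_fetch(page_num: int, per_page: int = 100) -> List[dict]:
    start = (page_num - 1) * per_page
    return FAKE_ROWS[start:start + per_page]


rows = list(iter_pages(fake_fetch))
print(len(rows))  # 250
```

Because `iter_pages` is a generator, the caller decides what to do with each row (write it to CSV, collect it for analysis) without the loop knowing about files or HTTP.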

【Comments】:
