Error querying a database from imported module: answers

【Question title】: Error querying a database from imported module
【Posted】: 2021-08-22 00:52:10
【Question description】:

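I need to pull 5000 results from an imported module, but if I try to return 1000 I get an error. The most I can return is 500 results (num_players=500). Ideally I would pull 5000 random results, but I guess the first 5000 will have to do. I only need sample data to run an analysis in Excel. The code below is taken from an example in the documentation found here: https://pyett.readthedocs.io/en/latest/cohort.html

Does anyone have suggestions on how I can get this to run correctly? Why is the connection to the database being lost?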

from pyETT import ett
import pandas as pd

lb_cohort = ett.Cohort(ett.ETT().get_leaderboard(num_players=1000))
lb_cohort.size
df = lb_cohort.players_dataframe()
print(df)

file_name = 'export_file.xlsx'
df.to_excel(file_name)
print('DataFrame is written to Excel File successfully.')

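This is the exception I get:

Message=Cannot connect to host www.elevenvr.club:443 ssl:default [The semaphore timeout period has expired]

Source=C:\Users\Apache Paint\source\repos\Patrick_Kimble\Patrick_Kimble.py

Stack trace:

File "C:\Users\Apache Paint\source\repos\Patrick_Kimble\Patrick_Kimble.py", line 7, in (Current frame)
lb_cohort = ett.Cohort(ett.ETT().get_leaderboard(num_players=1000))
HTTPSConnectionPool(host='www.elevenvr.club', port=443): Max retries exceeded with url: /accounts/search/bensnow/ (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))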

Edit:

I created a loop to try to pull all of the data from the database.

for i in range(1, 500000):
    try:
        # Progress marker every 100 IDs
        if i % 100 == 0:
            print('Loop is at:', i)
        user_id = i
        line = ett.ett_parser.get_user(user_id)
        temp_df = pd.DataFrame(line, index=[i])
        # DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
        self.df_master = pd.concat([self.df_master, temp_df], ignore_index=True)
    except Exception:
        print("Error:", i)
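Growing a DataFrame one row at a time copies the whole frame on every iteration, which gets slow over hundreds of thousands of IDs. A minimal sketch of the usual alternative, collecting rows in a list and building the frame once at the end, might look like the following. `fetch_user` and `collect_users` are hypothetical names: `fetch_user` stands in for the real lookup (such as the `ett.ett_parser.get_user` call in the loop above) and is assumed to return one flat dict per user.

```python
import pandas as pd


def fetch_user(user_id):
    """Hypothetical stand-in for the real per-user lookup; returns a flat dict."""
    if user_id % 4 == 0:
        # Simulate an ID that does not exist or a request that fails.
        raise LookupError(f"user {user_id} not found")
    return {"id": user_id, "name": f"player{user_id}", "elo": 1500.0}


def collect_users(user_ids, fetch=fetch_user):
    """Fetch each user, skip failures, and build one DataFrame at the end."""
    rows, failed = [], []
    for user_id in user_ids:
        try:
            rows.append(fetch(user_id))
        except Exception:
            # Remember the failed ID so it can be retried or inspected later.
            failed.append(user_id)
    return pd.DataFrame(rows), failed


users_df, failed_ids = collect_users(range(1, 11))
print(users_df.shape, failed_ids)
```

Keeping the failed IDs in a list, rather than only printing them, makes it possible to retry just those IDs afterwards.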

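MDR's answer does return random data. However, I need to use the describe() function to pull back additional details, and it only accepts the Cohort type.

Example:

from pyETT import ett
import pandas as pd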

lb_cohort = ett.Cohort(ett.ETT().get_leaderboard(num_players=10))
lb_cohort.size
lb_cohort.describe()

It should return something similar to the image below.

【Question discussion】:

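The website may be deliberately throttling large requests to limit bandwidth usage.
Is there no workaround?
See my answer below. By searching on random usernames, I managed to get 11,541 unique username rows from the loop. If you like it, please upvote it and mark it as the answer :o).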
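If the timeouts really are caused by throttling, as the first comment suggests, one common mitigation is to retry a failed request after a growing delay. The sketch below is a generic helper, not part of pyETT: `call_with_retries` is a hypothetical name, and the flaky function only simulates a request so the example runs without network access. Whether retries are enough for this particular site is an assumption that would need testing.

```python
import time


def call_with_retries(func, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a fake request that fails twice before succeeding.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated timeout")
    return "ok"


waits = []
# Record the waits instead of actually sleeping, to keep the demo instant.
result = call_with_retries(flaky_request, sleep=waits.append)
print(result, waits)
```

In real use the wrapped call could be something like `lambda: ett.ETT().get_leaderboard(num_players=500)`, leaving `sleep` at its default so the delays are actually observed.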

Tags: python sql pandas dataframe python-import

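【Solution 1】:

Since .user_search_dataframe() takes a string and returns a frame based on partial matches, you can put together a long list of usernames, loop over it, and then concatenate the resulting frames.

Example: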

from pyETT import ett
import pandas as pd
from time import sleep
from datetime import datetime

eleven = ett.ETT()

# test short list
l = ['happy', 'honey', 'mad']

# what makes a good username?  'Neo' sure, but what else?
# l = ['League', 'Knight', 'happy', 'honey', 'mad', 'crazy', 'Super', 'one', 'neo', 'duke', 'wizard', 'two', 'jon', 'bob', 'Dog']

dfs = []

for name in l:
    df = eleven.user_search_dataframe(name)
    #print(df.shape)
    
    # hangs a bit so slow it down/avoid timeouts
    sleep(10)
    dfs.append(df)
    
df = pd.concat(dfs, ignore_index=True)

print('Size before dropping duplicate names: ', df.shape)
df = df.drop_duplicates(subset=['name']).reset_index(drop=True)

# should users who have never won or lost a game be removed?
# maybe they never played a game and are just in the system?
# if so uncomment this line...
# df = df.loc[(df[['wins', 'losses']] != 0).any(axis=1)]

df['last_online'] = pd.to_datetime(df['last_online'], format='%Y-%m-%dT%H:%M:%S.%fZ')

df = df.sort_values('rank')

print('Final size of frame: ', df.shape)

print('Random sample of results:', '\n')
# random sample from throughout the frame
print(df.sample(n=20))

# timestamp in file name helps if you have the file open when the script is running and it cannot overwrite
df.to_excel('export_file_' + datetime.now().strftime("%H_%M_%S") + '.xlsx', index=False)

Output (based on the long list in the code):

Size before dropping duplicate names:  (11755, 7)
Final size of frame:  (11541, 7)
Random sample of results:

          id                 name     elo    rank  wins  losses  \
2767  548341          shani_ahmad  1499.0  482261     1       1   
2795  575725            MadisonLu  1500.0  466685     0       0   
2087  343031             Jomadi97  1797.5    5151   126      76   
4640  159384           TwoHungLow  1500.0   48816     0       0   
530   193084           Happybloke  1500.0  165971     1       1   
3952  538362              Neo2442  1471.0  546859     0       2   
783   555710  HappySanguineGaming  1485.0  477280     2       4   
9435   73557           NateDoggLi  1500.0  104922    22      66   
1     268668         IvyLeague412  1489.0  349980     2       2   
2202  387282             Madlog31  1500.0  370736     0       0   
1319   20604              Madssr1  1516.0   33429     1       1   
739   407953            Happy0321  1500.0  343960     0       0   
2165  379302              SamAdam  1500.0  324270     0       0   
1693  222456               Hamada  1485.0  504640     0       2   
778   451963             happylyu  1500.0  380432     0       0   
6740  192120            JonRose32  1500.0  315481     0       0   
796   562459          UiJun_Happy  1526.0   39078    19      10   
7292  319677               Dapbob  1500.0  220062     0       0   
4991  590248      natwon.brooks.3  1500.0  477078     0       0   
9859  248163             postdog4  1500.0  107996     0       0   

                 last_online  
2767 2021-08-03 18:05:10.423  
2795 2021-07-19 23:21:06.618  
2087 2021-05-25 14:10:35.903  
4640 2020-11-28 14:38:30.703  
530  2020-12-31 17:25:02.802  
3952 2021-08-18 19:29:39.149  
783  2021-07-06 01:35:36.241  
9435 2020-10-20 13:16:21.542  
1    2021-01-31 01:23:54.627  
2202 2021-06-02 19:45:27.265  
1319 2020-03-27 13:17:35.754  
739  2021-03-25 23:49:28.654  
2165 2021-03-07 23:41:26.949  
1693 2021-03-11 03:17:06.368  
778  2021-04-29 16:51:31.216  
6740 2021-07-08 04:08:08.927  
796  2021-08-08 11:57:48.181  
7292 2021-02-14 14:08:20.299  
4991 2021-07-30 13:02:12.894  
9859 2021-01-03 08:35:27.054
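Since the question asks for about 5000 random rows plus summary details, a short sketch of both steps on a frame like the one above follows. The four rows are copied from the sample output and only stand in for the full frame. Note that pandas' own `DataFrame.describe()` is used here; it is not the same as pyETT's `Cohort.describe()` and may report different statistics.

```python
import pandas as pd

# A few rows copied from the output above stand in for the full frame.
df = pd.DataFrame(
    {
        "name": ["Jomadi97", "NateDoggLi", "UiJun_Happy", "Neo2442"],
        "elo": [1797.5, 1500.0, 1526.0, 1471.0],
        "wins": [126, 22, 19, 0],
        "losses": [76, 66, 10, 2],
    }
)

target = 5000
# Never ask for more rows than exist; random_state makes the draw repeatable.
sample = df.sample(n=min(target, len(df)), random_state=42)

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column.
summary = sample[["elo", "wins", "losses"]].describe()
print(summary)
```

Capping `n` at `len(df)` avoids the ValueError that `sample` raises when asked for more rows than the frame holds without replacement.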

【Discussion】:

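I can't test the script at the moment, but it looks useful. The only problem I would run into is with details(), since it only accepts a Cohort. Please see my edit.
Sorry, I was working from the line "I only need sample data to run an analysis in Excel" and assumed the "sample" meant a sample of players. The code is available at https://github.com/souzatharsis/pyETT, and if you dig into it you can see what it does. You can find links to the API, for example https://elevenvr.club/accounts/15583/elo-history, where 15583 is the player id. Perhaps write something custom to get what you need.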
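As a starting point for the "something custom" mentioned in the last comment, the URL pattern quoted there can be wrapped in a small helper. This is only a sketch: `elo_history_url` and `fetch_elo_history` are hypothetical names, the response is assumed to be JSON (not verified here), and the endpoint may have changed since the thread was written.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://elevenvr.club"


def elo_history_url(player_id):
    """Build the elo-history URL for a player id, following the quoted pattern."""
    return f"{BASE_URL}/accounts/{int(player_id)}/elo-history"


def fetch_elo_history(player_id, timeout=30):
    """Download and decode the response; assumes the body is JSON."""
    with urlopen(elo_history_url(player_id), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only the URL is built here; fetch_elo_history() needs network access.
print(elo_history_url(15583))
```

Calling `fetch_elo_history` for many players in a row would likely hit the same timeouts described in the question, so pausing between requests is advisable.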