How can I get the following python code to output worldmaps.info (it seems this question was answered but does not work for me)

【Question title】: How can I get the following python code to output worldmaps.info (it seems this question was answered but does not work for me)
【Posted】: 2020-10-15 08:41:27
【Question description】:

I am trying to fetch values from worldometers.info (similar to the post "Python: No tables found matching pattern '.+'"). The code I am using is as follows:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.worldometers.info/coronavirus/#countries'
header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9","X-Requested-With": "XMLHttpRequest"}

r = requests.get(url, headers=header)

# fix HTML multiple tbody
soup = BeautifulSoup(r.text, "html.parser")
for body in soup("tbody"):
    body.unwrap()

print(soup)

df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
df = df.replace(regex=[r'\+', r'\,'], value='')

df = df.fillna('0')
df = df.to_json(orient='index')

print(df)

The output is the HTML of the page, followed by an error when pandas processes it:

Traceback (most recent call last):
  File "./covid19_status.py", line 37, in <module>
    df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
  File "/usr/local/lib64/python3.6/site-packages/pandas/util/_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 1101, in read_html
    displayed_only=displayed_only,
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 917, in _parse
    raise retained
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 898, in _parse
    tables = p.parse_tables()
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 217, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 563, in _parse_tables
    raise ValueError(f"No tables found matching pattern {repr(match.pattern)}")
ValueError: No tables found matching pattern '.+'

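Can anyone tell me how to fix this? I tried the regular expression from a similar post but could not get it to work, so it is not included in this code (I am still quite new to Python).

Thanks in advance!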

【Question comments】:

Tags: python pandas beautifulsoup

【Solution 1】:

You can follow the code provided in the answer to this question. The full code is as follows:

import pandas as pd
import requests
import re

url = 'https://www.worldometers.info/coronavirus/#countries'
header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9","X-Requested-With": "XMLHttpRequest"}

r = requests.get(url, headers=header).text

r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), r)

dfs = pd.read_html(r)

dfs[0].to_csv('D:\\Worldometer.csv',index = False)
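To see what the `re.sub` line does in isolation, here is a minimal sketch on a made-up HTML fragment. The `sample` string and the `upper_tags` helper are illustrative names, not part of the original answer. The call uppercases everything between `<` and `>`, which includes attribute names and values as well as tag names, so any `id`, `class`, or `href` in the page changes case too. Text between tags is left alone.

```python
import re

def upper_tags(html: str) -> str:
    """Uppercase every `<...>` span, as the re.sub call in the answer does."""
    # `<.*?>` is non-greedy: it matches from each `<` to the nearest `>`
    # on the same line (`.` does not match newlines by default).
    return re.sub(r'<.*?>', lambda g: g.group(0).upper(), html)

# Hypothetical fragment, only for illustration.
sample = '<table id="main"><tr><td class="num">1,234</td></tr></table>'
converted = upper_tags(sample)
print(converted)
```

If later code needs to find a table by its `id` or `class`, it would have to look for the uppercased value after this step.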

Screenshot of the CSV file:

【Comments】:

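Thank you very much! Yes, this was helpful. I actually had this working earlier, but I am only looking to output columns 3 and 4 so I can add them as metrics. Would that be very difficult? ++10
As metrics? What do you mean? Can you be clearer? By the way, if my answer helped you, please click the check mark below the vote buttons to mark it as the accepted answer.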
Using the same code I still get an error: Traceback (most recent call last): File "./covid19_status.py", line 31, in <module> r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), r) File "/usr/lib64/python3.6/re.py", line 191, in sub return _compile(pattern, flags).sub(repl, string, count) TypeError: expected string or bytes-like object. Also, I would like to use BeautifulSoup and pandas to format the data for another application.
I did click the up arrow, but the number did not increase???
For the error you are getting: change r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), r) to r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), str(r))
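The `TypeError` reported in the comments is what `re.sub` raises when its last argument is not a string. One way this can happen is when `.text` is left off the `requests.get(...)` call and `r` is still a response object. The sketch below uses a hypothetical `FakeResponse` stand-in so that it runs without network access. It is an assumption that this matches the commenter's situation. Note that `str()` on such an object yields its short repr rather than the page body, so reading `.text` is probably the safer fix if that is the cause.

```python
import re

class FakeResponse:
    """Hypothetical stand-in for a requests response (assumption: the real
    object exposes the body as `.text` and has a short repr)."""
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return '<Response [200]>'

def upper_tags(html):
    # Same substitution as in the answer.
    return re.sub(r'<.*?>', lambda g: g.group(0).upper(), html)

r = FakeResponse('<table><tr><td>1</td></tr></table>')

try:
    upper_tags(r)              # passing the object itself, not a string
    raised = False
except TypeError:
    raised = True

from_str = upper_tags(str(r))   # no error, but only the repr is processed
from_text = upper_tags(r.text)  # operates on the actual HTML body

print(raised, from_str, from_text)
```

With the `str(r)` variant the table markup never reaches `pd.read_html`, so a "No tables found" error would be the likely next symptom.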