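【Posted】: 2022-01-03 13:54:19
【Question】:
I am trying to scrape a table from the markets.ft website that, unfortunately, consists largely of icons (table: 'Lipper Leader Scorecard' - https://markets.ft.com/data/funds/tearsheet/ratings?s=LU0526609390:EUR).
With BeautifulSoup I can retrieve the table, but every value comes out as NaN.
Is there a way to scrape the icons in the table and convert them to numbers?
My code is: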
import requests
import pandas as pd
from bs4 import BeautifulSoup

id_list = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR', 'LU1076093779:EUR', 'LU1116896363:EUR', 'LU1116896876:EUR']
urls = ['https://markets.ft.com/data/funds/tearsheet/ratings?s=' + x for x in id_list]
dfs = []

for url in urls:
    r = requests.get(url).content
    soup = BeautifulSoup(r, 'html.parser')
    # Some funds in the list do not have any data.
    try:
        # Indexing raises IndexError when the page contains no table.
        table = soup.find_all('table')[0]
        print(table)
    except IndexError:
        continue
    # The rating cells hold only icons, so every parsed value is NaN.
    df = pd.read_html(str(table), index_col=0)[0]
    dfs.append(df)

print(dfs)
Desired output for the fund (LU0526609390):
                Total return  Consistent return  Preservation  Expense
Overall rating             3                  3             5        5
3 year rating              3                  3             5        5
5 year rating              2                  3             5        5
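
For reference, the NaN values are what one would expect if each rating cell contains only an icon element and no text, since pd.read_html reads cell text only. A minimal sketch of one possible workaround follows, assuming the rating is encoded in the icon's CSS class. The markup in SAMPLE_HTML and the class pattern "rating-N" are hypothetical and chosen for illustration; the real page would need to be inspected to find how it actually encodes each score. The helper names icon_to_number and table_to_frame are also made up for this sketch.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: the real FT page may encode ratings differently.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th></th><th>Total return</th><th>Consistent return</th>
        <th>Preservation</th><th>Expense</th></tr>
  </thead>
  <tbody>
    <tr><td>Overall rating</td>
        <td><i class="icon rating-3"></i></td><td><i class="icon rating-3"></i></td>
        <td><i class="icon rating-5"></i></td><td><i class="icon rating-5"></i></td></tr>
    <tr><td>3 year rating</td>
        <td><i class="icon rating-3"></i></td><td><i class="icon rating-3"></i></td>
        <td><i class="icon rating-5"></i></td><td><i class="icon rating-5"></i></td></tr>
    <tr><td>5 year rating</td>
        <td><i class="icon rating-2"></i></td><td><i class="icon rating-3"></i></td>
        <td><i class="icon rating-5"></i></td><td><i class="icon rating-5"></i></td></tr>
  </tbody>
</table>
"""

# Assumed class-name pattern; adjust after inspecting the real page.
RATING_CLASS = re.compile(r"rating-(\d+)$")


def icon_to_number(cell):
    """Return the rating encoded in the cell's icon class, or None if absent."""
    icon = cell.find("i")
    if icon is None:
        return None
    # BeautifulSoup returns the class attribute as a list of class names.
    for css_class in icon.get("class", []):
        match = RATING_CLASS.search(css_class)
        if match:
            return int(match.group(1))
    return None


def table_to_frame(table):
    """Build a DataFrame from a table whose data cells contain rating icons."""
    # Skip the first (empty) header cell, which sits above the row labels.
    header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")][1:]
    rows = {}
    for tr in table.find("tbody").find_all("tr"):
        cells = tr.find_all("td")
        label = cells[0].get_text(strip=True)
        rows[label] = [icon_to_number(td) for td in cells[1:]]
    return pd.DataFrame.from_dict(rows, orient="index", columns=header)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
df = table_to_frame(soup.find("table"))
print(df)
```

In the loop from the question, table_to_frame(table) would take the place of the pd.read_html call, provided the class pattern is adapted to whatever the page really uses. If the score turns out to be stored elsewhere (for example in an attribute or an image file name), only icon_to_number would need to change.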
【Comments】:
Tags: python pandas web-scraping beautifulsoup icons