【Posted】:2022-01-24 01:32:19
【Question】:
I have been trying to strip the HTML surrounding the prices, but nothing I have tried has worked.
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
url = "https://www.coingecko.com/en"
result = requests.get(url)
data = BeautifulSoup(result.text, "html.parser")
prices = data.find_all(attrs={"class":"no-wrap", "class" : "td-price price text-right pl-0"})
a = np.asarray(prices[0:10])
df = pd.DataFrame(a)
print(df.text)
The data comes out like this:
\n [$36,015.04] \n
1 \n [$2,499.62] \n
2 \n [$1.00] \n
3 \n [$378.24] \n
4 \n [$1.00] \n
5 \n [$1.11] \n
6 \n [$0.620196] \n
7 \n [$93.38] \n
8 \n [$68.86] \n
9 \n [$18.50] \n
I am trying to get rid of the \n, the brackets, and the dollar signs.
The error message is:
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a
list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is
deprecated. If you meant to do this, you must specify 'dtype=object' when creating the
ndarray.
I am very new to Python, so I apologize if this is an obvious question.
Thanks
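A minimal sketch of the clean-up being asked for, assuming each cell's text looks like the strings in the output above (newlines, brackets, a dollar sign, and comma thousands separators). The helper name `clean_price` and the sample strings are hypothetical and are not taken from the live page.

```python
def clean_price(raw: str) -> float:
    """Turn text such as '\n[$36,015.04]\n' into the float 36015.04."""
    # strip() removes surrounding whitespace, including the newlines
    text = raw.strip()
    # Drop the brackets, the dollar sign, and the thousands separators
    for ch in "[]$,":
        text = text.replace(ch, "")
    return float(text)


# Hypothetical sample strings modeled on the output shown above
samples = ["\n[$36,015.04]\n", "\n[$2,499.62]\n", "\n[$0.620196]\n"]
cleaned = [clean_price(s) for s in samples]
print(cleaned)  # [36015.04, 2499.62, 0.620196]
```

In the scraping code, the raw string for each cell would presumably come from calling `get_text()` on each element returned by `find_all`, rather than from passing the tag objects through `np.asarray`.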
【Comments】:
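- Hi. First, your code {"class":"no-wrap", "class" : "td-price price text-right pl-0"} creates a Python dict, but a Python dict cannot hold multiple key-value pairs with the same key, so the second entry silently overwrites the first and you end up with {"class" : "td-price price text-right pl-0"}. Also, you have not said what you are doing to transform the output, nor which line of code produces the error message.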
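The point about duplicate keys can be checked directly with a short sketch that uses the same dict literal as the question; the variable name `attrs` is only illustrative.

```python
# Same literal as in the question: the key "class" appears twice
attrs = {"class": "no-wrap", "class": "td-price price text-right pl-0"}

# Python keeps only the last value given for a repeated key
print(attrs)       # {'class': 'td-price price text-right pl-0'}
print(len(attrs))  # 1
```

So `find_all` only ever receives the second class string; the "no-wrap" entry never reaches it.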
Tags: python pandas web-scraping beautifulsoup