Trouble formatting data with bs4 and pandas (answers)

【Question title】: Trouble formatting data with bs4 and pandas
【Posted】: 2022-01-24 01:32:19
【Question description】:

I have been trying to get rid of the HTML surrounding the prices, but nothing I have tried has worked.

from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests

url = "https://www.coingecko.com/en"
result = requests.get(url)
data = BeautifulSoup(result.text, "html.parser")
prices = data.find_all(attrs={"class":"no-wrap", "class" : "td-price price text-right pl-0"})

a = np.asarray(prices[0:10])
df = pd.DataFrame(a)
print(df.text)

The data comes out like this:

\n  [$36,015.04]  \n
1  \n   [$2,499.62]  \n
2  \n       [$1.00]  \n
3  \n     [$378.24]  \n
4  \n       [$1.00]  \n
5  \n       [$1.11]  \n
6  \n   [$0.620196]  \n
7  \n      [$93.38]  \n
8  \n      [$68.86]  \n
9  \n      [$18.50]  \n

I am trying to get rid of the \n, the brackets, and the dollar signs.

The error message is:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a 
list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is 
deprecated. If you meant to do this, you must specify 'dtype=object' when creating the 
ndarray.

I am very new to Python, so I apologize if this is an obvious question.

Thanks

【Question comments】:

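Hi. First, your code {"class":"no-wrap", "class" : "td-price price text-right pl-0"} creates a Python dict, but a Python dict cannot hold more than one key-value pair with the same key, so the second entry silently overwrites the first and you end up with {"class" : "td-price price text-right pl-0"}. Second, you have not said what you are doing to transform the output, nor which line of code produces the error message.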
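The overwrite described in the comment above can be checked in isolation. This is a minimal sketch that uses only the dict literal from the question; the variable name attrs is introduced here for illustration and is not from the original post.

```python
# A dict literal that repeats a key keeps only the last value for that key.
attrs = {"class": "no-wrap", "class": "td-price price text-right pl-0"}

print(attrs)       # {'class': 'td-price price text-right pl-0'}
print(len(attrs))  # 1
```

Python raises no error here, which is why the first "class" value disappears without any visible sign.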

Tags: python pandas web-scraping beautifulsoup

【Solution 1】:

The code is as follows:

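# The duplicated "class" key in the question collapses to this single entry
prices = data.find_all(attrs={"class": "td-price price text-right pl-0"})
# Take each tag's text, drop newlines, and trim surrounding whitespace
prices = [price.text.replace("\n", "").strip() for price in prices]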

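The list comprehension iterates over the prices list and removes every \n (newline character) from each item's text; .strip() then removes any whitespace at the start or end of each item.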
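The answer above leaves the dollar sign in place, and the question also asked to remove it. The brackets in the question's output probably come from storing the tag objects themselves rather than their text, so working with plain strings should avoid them. Below is a rough sketch of one way to finish the cleanup. It assumes each price string looks like the samples in the question ("$" prefix, "," as thousands separator). The helper name clean_price and the sample list are hypothetical, introduced only for illustration, and the sample values are copied from the question's output rather than fetched live.

```python
def clean_price(raw: str) -> float:
    """Convert text such as '\n$36,015.04\n' into the float 36015.04."""
    # strip() removes surrounding whitespace, including newlines
    text = raw.strip()
    # Drop the currency symbol and the thousands separators
    text = text.replace("$", "").replace(",", "")
    return float(text)


# Sample strings modeled on the output shown in the question (not live data)
sample = ["\n$36,015.04\n", "\n$2,499.62\n", "\n$0.620196\n"]
values = [clean_price(s) for s in sample]
print(values)  # [36015.04, 2499.62, 0.620196]
```

A list of floats like this can be passed straight to pd.DataFrame(values, columns=["price"]). Building the DataFrame from plain numbers, rather than calling np.asarray on the tag objects, should also avoid the VisibleDeprecationWarning quoted in the question.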

【Comments】: