Strip HTML Tags: find_all + Beautiful Soup

【Question Title】: Strip Html Tags Findall + Beautiful Soup
【Posted】: 2018-01-17 19:14:27
【Question Description】:

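OK, I've been searching for probably 2 hours now, and I think my brain may just be fried. Today is my first day with BeautifulSoup (so please be gentle). The source of the site I'm scraping is formatted like this: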

<a href="/listing/view" class="price">$100</a>

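I feel stupid, because I'm getting the entire a tag when I write to the file, and I have a sneaking suspicion there's a really simple solution, but I can't seem to find it.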

Currently I'm using the following:

soup = BeautifulSoup(page.content, 'html.parser')
prices = soup.find_all(class_="price")
passed.append(prices)

How do I get only the content between the tags that have a matching class?

【Question Discussion】:

【Solution 1】:

prices = soup.find_all(class_="price")

for a in prices:
    # Take only the tag's text, strip surrounding whitespace, drop the "$", convert to int
    passed.append(int(a.text.strip().replace('$', '')))

This should help.
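For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample markup in `html` is made up for illustration (modeled on the snippet in the question), and it uses `get_text(strip=True)` as an alternative to `.text.strip()`; both should give the same result for a tag with a single text node.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the snippet in the question.
html = """
<a href="/listing/view" class="price">
    $465
</a>
<a href="/listing/view" class="price">$515</a>
"""

soup = BeautifulSoup(html, "html.parser")

passed = []
for tag in soup.find_all(class_="price"):
    # get_text(strip=True) returns only the text inside the tag,
    # with leading and trailing whitespace removed.
    text = tag.get_text(strip=True)
    passed.append(int(text.replace("$", "")))

print(passed)
```

The key point is to append each tag's text rather than the whole result of find_all, which is a list of tag objects.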

【Discussion】:

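Even so, I'm still getting the following result: ['\n\t\t\t\t\t\t$465\n\t\t\t\t\t\t', '\n\t\t\t\t\t\t$515\n\t\t\t\t\t\t', and all I want is the integer values
strip() removes the surrounding whitespace, and replace() takes care of the $
\n is a newline and \t is a tab. strip() removes all such whitespace from the start and end of the string. If you run print("\tHI") and print("\nHI") it will become very clear :)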
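To make that concrete, here is a small sketch that needs no scraping at all. It starts from strings shaped like the ones in the comment above; `raw` and the helper `parse_price` are hypothetical names used only for this illustration.

```python
# Strings shaped like the output shown in the comment above.
raw = ['\n\t\t\t\t\t\t$465\n\t\t\t\t\t\t', '\n\t\t\t\t\t\t$515\n\t\t\t\t\t\t']


def parse_price(price_text):
    """Turn a string such as '\\n\\t$465\\n\\t' into the integer 465."""
    # strip() with no arguments removes leading and trailing whitespace,
    # which includes newlines (\n) and tabs (\t).
    cleaned = price_text.strip()
    # replace() drops the dollar sign so int() can parse the digits.
    return int(cleaned.replace('$', ''))


prices = [parse_price(s) for s in raw]
print(prices)
```

Note that strip() only touches the ends of the string, so whitespace in the middle of a value would have to be handled separately.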