Python-HTML: How to strip out content in between tags using BeautifulSoup

【Question title】: Python-HTML-How to strip out content in between tags using BeautifulSoup
【Posted】: 2013-07-19 02:19:22
【Question description】:

What I am doing: I am writing a web scraper to collect weather data. This is what I have so far:

import urllib.request
from bs4 import BeautifulSoup

# open the webpage and assign the content to a new variable
base = urllib.request.urlopen('http://www.weather.com/weather/today/Beijing+CHXX0008:1:CH')
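# read the raw bytes; str() on the readlines() list would only give the list's repr
f = base.read()

# name the parser explicitly instead of relying on BeautifulSoup's default choice
soup = BeautifulSoup(f, 'html.parser')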

rn_base = soup.find_all(itemprop="temperature-fahrenheit")

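If you print the variable rn_base, you get: [<span class="wx-value" itemprop="temperature-fahrenheit">75</span>], which I believe is a list with a single element. The number 75 is what I am after.

Problem: I tried several ways to get the number, but all of them failed: 1) using str.join() to convert rn_base to a string, which failed because rn_base is a ResultSet object; 2) using index slicing, which failed because it is not a string; 3) using get_text() as described in the BeautifulSoup documentation, which raised AttributeError: 'ResultSet' object has no attribute 'get_text'.

Any help is much appreciated!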

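【Question comments】:

Tags: python html parsing beautifulsoup

【Solution 1】:

rn_base is a ResultSet object, so even when there is only one result, it is treated as a collection that may hold many. You therefore need to iterate over it: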

for rn in rn_base:
    print(rn.string)

This for loop extracts the text of each match in the result set (in case "temperature-fahrenheit" appears more than once).
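As a minimal sketch of the loop above, the snippet below parses an inline HTML fragment instead of the live page. The fragment is copied from the printed output in the question, and the `html.parser` backend is an assumption, since the question does not name a parser. It shows that each element of the ResultSet is a Tag whose text can be read with `.string` or `get_text()`.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page: the span shown in the question.
html = '<span class="wx-value" itemprop="temperature-fahrenheit">75</span>'

# "html.parser" is an assumed choice; any installed parser should work here.
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, which behaves like a list of Tag objects.
rn_base = soup.find_all(itemprop="temperature-fahrenheit")

# Collect the text of every match by iterating over the ResultSet.
temperatures = [rn.string for rn in rn_base]

# A single match can also be reached by index and read with get_text().
first_temperature = rn_base[0].get_text()

print(temperatures)
print(first_temperature)
```

Calling get_text() on the ResultSet itself is what raises the AttributeError from the question; the method belongs to the individual Tag objects inside it.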

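Since you say you are trying to get weather data, I think find() is a better fit than find_all(): it returns only the first match, as if find_all() were called with limit=1, and gives you the tag directly rather than a ResultSet.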
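To illustrate this suggestion, here is a small sketch that uses find() on an inline fragment copied from the question's output. The fragment, the `html.parser` backend, and the conversion to int are assumptions made to keep the example self-contained. find() returns the first matching Tag, or None when nothing matches, so no loop or index is needed.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page: the span shown in the question.
html = '<span class="wx-value" itemprop="temperature-fahrenheit">75</span>'

# "html.parser" is an assumed choice of parser.
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag rather than a ResultSet.
tag = soup.find(itemprop="temperature-fahrenheit")

# Guard against a missing element before reading its text.
temperature = int(tag.get_text()) if tag is not None else None

# A property that is absent from the page yields None.
missing = soup.find(itemprop="no-such-property")

print(temperature)
```

Checking for None before calling get_text() is worthwhile when scraping a live page, where the markup may change without notice.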

【Comments】:

Thanks! I was really confused by the ResultSet class.