BeautifulSoup Scraping Returning Duplicate Results: Answers

【Question title】: BeautifulSoup Scraping Returning Duplicate Results
【Posted】: 2017-02-18 19:14:48
【Question】:

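I'm trying to extract equipment names from here using the code below, but my output returns each item about 26 times. I've looked for a proper solution and come up empty-handed. Any ideas on how to get this working would be much appreciated.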

tables = soup.find_all('table')

for table in tables:
    if table.find_parent("table") is not None:
        for tr in table.find_all('tr'):
            for td in table.find_all('td'):
                for a in td.find_all('a'):
                    f2.write(a['title'] + '\n')

【Comments on the question】:

Tags: python-2.7 web-scraping beautifulsoup

【Solution 1】:

Let's take this in parts. First, how to get the names of all the heroes in the table:

# Hero names sit in <span style="white-space:nowrap"> elements
heroes = soup.find_all('span', {'style': 'white-space:nowrap'})
for hero in heroes:
    print(hero.get_text())

To print all the equipment:

# Each equipment entry is a <div> with exactly this inline style string
eqs = soup.find_all('div', {'style': 'margin:7px 5px 0px;vertical-align:top;text-align:center;display:inline-block;line-height:normal;width:120px;'})

for equipment in eqs:
    print(equipment.get_text())
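
As for why the loop in the question repeats every item: the likely cause is that the `td` search runs on `table` instead of on the current `tr`, so every link in the table is written once per row (a table with about 26 rows would give about 26 copies). Here is a minimal sketch on made-up markup. The HTML and the helper names `titles_original` and `titles_fixed` are assumptions for illustration, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: an outer table wrapping an inner one.
html = """
<table><tr><td>
  <table>
    <tr><td><a title="Sword">Sword</a></td></tr>
    <tr><td><a title="Shield">Shield</a></td></tr>
    <tr><td><a title="Helmet">Helmet</a></td></tr>
  </table>
</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")


def titles_original(soup):
    """Same loop structure as the question, collecting titles in a list."""
    out = []
    for table in soup.find_all("table"):
        if table.find_parent("table") is not None:
            for tr in table.find_all("tr"):
                # Searches the whole table again for every row
                for td in table.find_all("td"):
                    for a in td.find_all("a"):
                        out.append(a["title"])
    return out


def titles_fixed(soup):
    """Same loop, but the cell search is limited to the current row."""
    out = []
    for table in soup.find_all("table"):
        if table.find_parent("table") is not None:
            for tr in table.find_all("tr"):
                for td in tr.find_all("td"):
                    for a in td.find_all("a"):
                        out.append(a["title"])
    return out


print(len(titles_original(soup)))  # 3 rows x 3 links
print(titles_fixed(soup))
```

On this sample the original loop collects nine titles and the row-scoped version collects three. Since `find_all` searches all descendants by default, calling `table.find_all('a')` once per table would probably be enough as well.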

【Comments】:

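Thanks for the working code. I'm trying to figure out why it works. From what I can see in the source, the text actually appears before the div style your code extracts from. Can you help me understand?
@Timmay, it's really quite simple. If you look at the source HTML of the site you're trying to scrape, you'll see that each equipment entry starts with a <div style ...>, so with BeautifulSoup you only need to find all of those tags and print their text, which in this case is the equipment name.
OK. So, looking at the code as a tree, the text does come after the line your code matches. I guess that's why prettifying the code comes in handy.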
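
To make the point in the comments concrete, here is a small sketch of how matching on the `style` attribute picks up the text nested inside each `div`. The markup, the shortened style string, and the item names are assumptions for illustration; the real page uses the longer style string from the answer.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: two equipment entries and one unrelated div.
html = """
<div style="width:120px;display:inline-block;"><a href="#" title="Sword">Sword</a></div>
<div style="width:120px;display:inline-block;"><a href="#" title="Shield">Shield</a></div>
<div style="float:left;">Unrelated</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match: the attribute value must equal this string character for character.
exact = soup.find_all("div", {"style": "width:120px;display:inline-block;"})
names_exact = [div.get_text(strip=True) for div in exact]

# Looser match: any div whose style contains "width:120px".
loose = soup.find_all("div", style=re.compile(r"width:120px"))
names_loose = [div.get_text(strip=True) for div in loose]

print(names_exact)
print(names_loose)

# prettify() prints the parsed tree with indentation, which shows
# that the name is a descendant of the matched div.
print(exact[0].prettify())
```

The exact-string match breaks as soon as the site changes spacing or property order in the inline style, so the regular-expression variant may be the more forgiving choice.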