AttributeError: 'ResultSet' object has no attribute 'find_all' (Beautifulsoup): answers

【Question title】: AttributeError: 'ResultSet' object has no attribute 'find_all' Beautifulsoup
【Posted】: 2015-09-09 08:40:44
【Question description】:

I don't understand why I am getting this error.

I have a fairly simple function:

def scrape_a(url):
  r = requests.get(url)
  soup = BeautifulSoup(r.content)
  news =  soup.find_all("div", attrs={"class": "news"})
  for links in news:
    link = news.find_all("href")
    return link

This is the structure of the web page I am trying to scrape:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
heading
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>

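【Question discussion】:

Why are you iterating over news and then calling news.find_all()? Presumably you meant to use links.find_all instead?
Also, href is an attribute of a tag, not a tag name.
Also, did you mean to return only the first result?
Does this answer your question? Beautiful Soup: 'ResultSet' object has no attribute 'find_all'?

Tags: python web-scraping beautifulsoup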

【Solution 1】:

You are doing two things wrong:

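You are calling find_all on the news result set; presumably you meant to call it on the links object, which is a single element of that result set.
There are no <href ...> tags in your document, so searching with find_all('href') will not find anything. You only have tags with an href attribute.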
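
Both points can be checked without touching the network. The following is a minimal sketch, assuming an inline HTML string modelled on the markup in the question (the second div and the www.other.com link are made up for illustration) and the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the structure shown in the question.
HTML = """
<div class="news"><a href="www.link.com"><h2 class="heading">heading</h2></a></div>
<div class="news"><a href="www.other.com"><p>text</p></a></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
news = soup.find_all("div", attrs={"class": "news"})

# find_all() returns a ResultSet (a list subclass); its items are Tag objects.
result_type = type(news).__name__
item_type = type(news[0]).__name__

# Calling find_all() on the ResultSet itself fails, as in the question.
try:
    news.find_all("a")
    raised = False
except AttributeError:
    raised = True

# "href" used as a tag name matches nothing;
# href=True matches any tag that carries an href attribute.
by_tag_name = news[0].find_all("href")
by_attribute = news[0].find_all(href=True)
```

Here news behaves like a list, so find_all has to be called on each element (or on the soup), never on the list itself.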

You could correct your code to:

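import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(r.content, "html.parser")
    news = soup.find_all("div", attrs={"class": "news"})
    for links in news:
        # links is a single Tag; href=True matches any tag with an href attribute
        link = links.find_all(href=True)
        # Returns during the first iteration, as the original code did
        return link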

to do what I think you were trying to do.

I would use a CSS selector instead:

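import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # Any element with an href attribute nested inside <div class="news">
    news_links = soup.select("div.news [href]")
    if news_links:
        return news_links[0]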

If you want to return the value of the href attribute (the link itself), you of course need to extract that too:

return news_links[0]['href']

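If you need all the link objects rather than just the first, simply return news_links for the link objects, or use a list comprehension to extract the URLs: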

return [link['href'] for link in news_links]
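
Put together, the "all URLs" variant might look like the sketch below. The function name extract_news_links and the sample markup are assumptions made for illustration; the function takes an HTML string rather than a URL so that it can be tried without a network request.

```python
from bs4 import BeautifulSoup

def extract_news_links(html):
    """Return the href value of every element with an href inside div.news."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a list of matching tags in document order
    return [link["href"] for link in soup.select("div.news [href]")]

# Hypothetical sample markup: two news blocks and one unrelated link.
sample = """
<div class="news"><a href="www.link.com"><h2 class="heading">heading</h2></a></div>
<div class="footer"><a href="www.ignored.com">footer</a></div>
<div class="news"><a href="www.other.com"><p>text</p></a></div>
"""

links = extract_news_links(sample)
no_links = extract_news_links("<p>no news here</p>")
```

When nothing matches, select() gives an empty list, so the function returns an empty list rather than None.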

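【Discussion】:

@Martin Pieters: Thanks. This now gives me just one link from the first div, whereas I want all the links (hrefs) on the page.
@Imo: Your code only returned the first match, which is why my answer did the same. I have added two options for returning all the link objects, or just their href values.
@Imo: In future, it would help if you added the actual and expected results to your question, a so-called minimal reproducible example. That would have made it obvious, for example, that you wanted a list of URLs.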