BeautifulSoup scraper can't find text? AttributeError: ResultSet object has no attribute 'find_all' (answers)

【Question title】: BeautifulSoup scraper can't find text? AttributeError: ResultSet object has no attribute 'find_all'
【Posted】: 2020-09-26 21:56:10
【Question description】:

I'm very new to programming, so apologies for any bad practices:

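I'm trying to build a web scraper that searches Indeed.com for job listings in my field. I followed a few online articles about it and thought I understood them, but now I think I have a misunderstanding.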

I'm trying to scrape the job location, which I found in the HTML as shown here: html code

To scrape the location, I was told to do the following:

# grabbing location name
c = div.find_all(name="span", attrs={"class": "location"})
for span in c:
    print(span.text)
    job_post.append(span.text)

However, I noticed that the page sometimes loads the location in a div rather than a span, so I edited the code as follows:

def find_location_for_job(self,div,job_post,city):
    div2 = div.find_all(name="div",attrs={"class":"sjcl"})
    print(div2)
    try:
        div3 = div2.find_all(name="div",attrs={"class":"location accessible-contrast-color-location"})
        job_post.append(div3.text)
    except:
        span = div2.find_all(name="span",attrs={"class":"location accessible-contrast-color-location"})
        job_post.append(span.text)

    print(job_post)

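However, about half the time it still says it can't find the text in the div/span, even when I look up the posting and see that it is tagged as one or the other.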

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

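Note that I kept the code I had found, because it doesn't capture the result when a div is used instead of a span. So my next troubleshooting step was to combine my idea with theirs, like this: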

def find_location_for_job(self,div,job_post,city):
    div2 = div.find_all(name="div",attrs={"class":"sjcl"})
    try:
        div3 = div2.find_all(name="div",attrs={"class":"location accessible-contrast-color-location"})
        for span in div3:
            job_post.append(span.text)
    except:
        div4 = div.findAll("span",attrs={"class":"location accessible-contrast-color-location"})
        for span in div4:
            job_post.append(span.text)

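However, this approach dumps the entire list of locations into every entry it scrapes (it scrapes 10 postings per city, so this approach puts all 10 locations into each of the 10 posting entries).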

Can anyone tell me where I'm going wrong?

Edit: full code on pastebin: https://pastebin.com/0LLb9ZcU

【Question comments】:

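Excellent first Q! You're doing great for a beginner programmer. Keep up the good work and keep posting well-organized Qs. You're a credit to the class!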

Tags: python html web-scraping beautifulsoup screen-scraping

【Solution 1】:

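div2 is a ResultSet, because that is what BeautifulSoup's find_all method returns. A ResultSet is a list of elements and has no find_all of its own, so you need to iterate over it and search inside each element, like this: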

def find_location_for_job(self, div, job_post, city):
    # find_all returns a ResultSet (a list of Tags), not a single Tag
    div2 = div.find_all(name="div", attrs={"class": "sjcl"})
    for sjcl_div in div2:
        # Search inside each individual Tag; calling find_all on the
        # ResultSet itself (div2) raises the AttributeError
        div3 = sjcl_div.find_all(name="div", attrs={"class": "location accessible-contrast-color-location"})
        div4 = sjcl_div.find_all("span", attrs={"class": "location accessible-contrast-color-location"})
        if div3:
            for span in div3:
                job_post.append(span.text)
        elif div4:
            for span in div4:
                job_post.append(span.text)
        else:
            print("Uh-oh, couldn't find the tags!")
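
As the error message itself hints, another option is to call find() when only one element per posting is expected. Here is a minimal, self-contained sketch of that idea. The sample HTML, the job-card class name, and the find_location helper are made up for illustration and are not taken from Indeed's real markup or from the asker's pastebin.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for two job cards; the real page may differ.
SAMPLE_HTML = """
<div class="job-card">
  <div class="sjcl">
    <div class="location accessible-contrast-color-location">Austin, TX</div>
  </div>
</div>
<div class="job-card">
  <div class="sjcl">
    <span class="location accessible-contrast-color-location">Remote</span>
  </div>
</div>
"""


def find_location(card):
    """Return the location text for one job card, or None if no tag matches."""
    # find() returns a single Tag (or None), so .text is safe after the None check.
    # Leaving out the tag name matches both <div> and <span> with class "location".
    tag = card.find(class_="location")
    return tag.text.strip() if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# find_all() gives a ResultSet; iterate over it and query each Tag individually.
cards = soup.find_all("div", class_="job-card")
locations = [find_location(card) for card in cards]
print(locations)  # ['Austin, TX', 'Remote']
```

Because each card is searched separately, every posting gets exactly one location rather than the whole page's list.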

【Comments】:

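This all makes sense to me, except for one minor syntax point: div3 and div4 are assigned, and then you run if statements on them. That's where I'm confused. Does the if check whether it found text and assigned it? If so, how does that work behind the scenes? Is it checking truthiness? Thanks for all your help!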
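Great follow-up question! Here's how it works. find_all always returns a list (a ResultSet, which is a list subclass), and in Python an empty list is a falsy value while a list containing anything is truthy. To see this in action, open Python IDLE, type test = [], and then type print(bool(test)). So, in the if/elif/else, I check for the two scenarios you expect and then, as a best practice, account for any unexpected or previously unseen case.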
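
The truthiness rule can be tried outside BeautifulSoup as well. The sketch below uses plain lists as stand-ins for the two find_all results, on the assumption that a ResultSet behaves like any other list in a boolean context. The pick_matches helper is hypothetical and only mirrors the if/elif/else from the answer.

```python
def pick_matches(div_matches, span_matches):
    """Prefer div matches, fall back to span matches, else return an empty list."""
    if div_matches:        # a non-empty list is truthy
        return div_matches
    elif span_matches:     # only reached when div_matches is empty
        return span_matches
    else:                  # both lists are empty, so nothing was found
        return []


print(bool([]))                      # False
print(bool(["Austin, TX"]))          # True
print(pick_matches([], ["Remote"]))  # ['Remote']
```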