[Title]: BeautifulSoup find_all function returns an empty list []
[Posted]: 2021-12-22 13:15:29
[Question]:

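I have tried many different answers, but nothing worked.

I'm trying to scrape all the reviews on the Play Store website. I found that class_="d15Mdf bAhLNe" is the container I want, but I get an empty list.

I get the same result when I try combinations such as soup.find_all({class : d15Mdf bAhLNe}).

The thing is, when I print soup, I do get the HTML. What am I missing?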

from bs4 import BeautifulSoup
import requests

# Download the raw HTML of the app's Play Store page
html_text = requests.get('https://play.google.com/store/apps/details?id=com.google.android.googlequicksearchbox&hl=en').text
soup = BeautifulSoup(html_text, 'lxml')
# Look for the review containers; this is the call that returns []
reviews = soup.find_all('div', class_="d15Mdf bAhLNe")
print(reviews)
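For reference, the filter can be written either with the `class_` keyword or with an `attrs` dictionary (the `'class'` key must be quoted, since `class` is a Python keyword). A minimal sketch on a hypothetical snippet that does contain the container; it uses the built-in `html.parser` instead of `lxml` only to avoid an extra dependency. Both spellings match, so when the element really is in the HTML, the filter syntax is not what produces the empty list:

```python
from bs4 import BeautifulSoup

# Hypothetical sample HTML in which the review container IS present
sample = '<div class="d15Mdf bAhLNe"><span jsname="bN97Pc">Great app</span></div>'
soup = BeautifulSoup(sample, 'html.parser')

# Two equivalent spellings of the same class filter
by_keyword = soup.find_all('div', class_="d15Mdf bAhLNe")
by_attrs = soup.find_all('div', attrs={'class': 'd15Mdf bAhLNe'})

print(len(by_keyword), len(by_attrs))  # 1 1
print(by_keyword == by_attrs)          # True
```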

[Discussion]:

    Tags: python html class web-scraping beautifulsoup


    [Answer 1]:

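    If you print soup instead of reviews, you will see that the HTML you get is different from the HTML of the live site. Because you are not a browser, the scripts that create the content dynamically never run. See a more detailed answer here: How can a scraped HTML be different from the source code?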
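A minimal offline sketch of that point, using a hypothetical page where the review container is inserted by JavaScript. BeautifulSoup only parses the text it is given, so the markup exists solely as a string inside the `<script>` element and `find_all` comes back empty:

```python
from bs4 import BeautifulSoup

# Hypothetical page: the review container only exists after the script runs
page = """
<html><body>
  <div id="reviews"></div>
  <script>
    document.getElementById('reviews').innerHTML =
      '<div class="d15Mdf bAhLNe">A review</div>';
  </script>
</body></html>
"""

soup = BeautifulSoup(page, 'html.parser')

# BeautifulSoup never executes <script>, so no such div is in the tree
print(soup.find_all('div', class_="d15Mdf bAhLNe"))  # []

# The class name is present only as text inside the script element
print('d15Mdf' in soup.script.string)  # True
```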

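    [Discussion]:

      [Answer 2]:

      Looking at the web page and taking a quick look with Postman, I can tell that the exact content you are looking for is probably generated by JavaScript.

      In fact, the closest divs before your target that are still present are W4P4ne and Njo8s: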

      reviews = soup.find_all('div', class_="W4P4ne")
      reviews2 = soup.find_all('div', class_="Njo8s")
      print(reviews)
      print(reviews2)
      

      After those, you can see the content gets strange:

      <c-data id="i23" jsdata=" OKeYaf;_;9"></c-data>
      

      Postman shows the same behavior: the two target classes cannot be found.
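The Postman check can be reproduced in code by searching the raw response text for each class name before any parsing. `classes_present` is a hypothetical helper, not part of BeautifulSoup or requests; in practice you would pass it `requests.get(url).text`, while the usage here runs on a made-up snippet modeled on the markup quoted above:

```python
import re


def classes_present(html, class_names):
    """Return {class_name: bool} telling whether each name occurs as a whole word
    in the raw HTML, i.e. in what the server sent before any JavaScript ran."""
    return {
        name: re.search(r'\b' + re.escape(name) + r'\b', html) is not None
        for name in class_names
    }


# Hypothetical raw response: the outer divs exist, the review containers do not
snippet = ('<div class="W4P4ne"><div class="Njo8s">'
           '<c-data id="i23" jsdata=" OKeYaf;_;9"></c-data></div></div>')
print(classes_present(snippet, ["W4P4ne", "Njo8s", "d15Mdf", "bAhLNe"]))
# {'W4P4ne': True, 'Njo8s': True, 'd15Mdf': False, 'bAhLNe': False}
```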

      I suggest you take a look at this answer: Here

      A quick example using Selenium:

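      from selenium import webdriver
      from selenium.webdriver.common.by import By
      from selenium.webdriver.firefox.options import Options
      from selenium.webdriver.firefox.service import Service
      
      # Config: change these paths depending on your setup
      options = Options()
      options.binary_location = r"binary_path"  # path to the Firefox binary
      service = Service(executable_path="driver_path")  # path to geckodriver
      browser = webdriver.Firefox(service=service, options=options)
      
      # Load the page; lookups below wait up to 10 s for elements to appear
      url = 'https://play.google.com/store/apps/details?id=com.google.android.googlequicksearchbox&hl=en'
      browser.get(url)
      browser.implicitly_wait(10)
      res = browser.find_elements(By.XPATH, '//div[@class="d15Mdf bAhLNe"]')
      for element in res:
          print(element.text)  # visible text of each review container
      browser.quit()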
      

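      [Discussion]:

        [Answer 3]:

        The content at that URL is loaded dynamically by JavaScript, which is why bs4 alone cannot scrape it. So I used Selenium together with bs4. Here is a minimal working solution:

        Code: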

        import time
        
        from bs4 import BeautifulSoup
        from selenium import webdriver
        from selenium.webdriver.chrome.service import Service
        
        # Point the Service at your chromedriver executable
        driver = webdriver.Chrome(service=Service('chromedriver.exe'))
        driver.maximize_window()
        
        url = 'https://play.google.com/store/apps/details?id=com.google.android.googlequicksearchbox&hl=en'
        driver.get(url)
        time.sleep(10)  # crude wait for the JavaScript-rendered reviews to appear
        
        # Hand the rendered HTML to BeautifulSoup
        soup = BeautifulSoup(driver.page_source, 'lxml')
        reviews = soup.find_all('div', class_="d15Mdf bAhLNe")
        for review in reviews:
            body = review.select_one('span[jsname="bN97Pc"]')
            if body is not None:  # skip containers without a review body
                print('all_reviews:' + body.get_text(), end="\n\n")
        
        driver.quit()
        

        Output:

        all_reviews:Since the latest update, on my Pixel2, Google search no longer works. It only shows my history - it no longer shows suggestions as I type, and when I'm finished typing and hit search, there's just a blank white screen - no error notification, no progress bar, nothing. I don't want to have to go to C...Full Review
        
        all_reviews:Endlessly frustrating when I get alerts for news articles I'm interested in, but opening them from the alerts doesn't take me right to the article, and usually it's either nowhere to be found, or buried several pages into the newsfeed. I get it, you want me to endlessly scroll through everything. St...Full Review
        
        all_reviews:It's hard enough using a cell phone when you're my age, 66 years old. Take that and then try to learn apps and that is an obstacle in itself. Now I'm suffering from a problem with Google app which I've always depended on. Recently when I go to search, the screen goes back to the home screen and a gr...Full Review
        
        all_reviews:Bug in the recent update: I can no longer make Google searches unless I use the voice search. No error message or anything, the page just doesn't load at all. I would be happy to provide screenshots and further information!
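One more pitfall with this selector: passing a space-separated string to `class_` matches the attribute value as a whole, so it stops matching if the page reorders the classes or adds another one. `soup.select` with a CSS selector requires both classes in any order. A sketch on a hypothetical snippet, again using `html.parser` only to avoid the `lxml` dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same two classes, in a different order, plus an extra one
html = '<div class="bAhLNe d15Mdf xyz"><span>review text</span></div>'
soup = BeautifulSoup(html, 'html.parser')

# class_ with a space-separated string compares against the whole attribute value
exact = soup.find_all('div', class_="d15Mdf bAhLNe")
# The CSS selector only requires both classes to be present
css = soup.select('div.d15Mdf.bAhLNe')

print(len(exact), len(css))  # 0 1
```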
        

        [Discussion]:
