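【Question title】: Not able to iterate through multiple pages while scraping data
【Posted】: 2021-07-03 08:14:52
【Question】:

I need to scrape the reviews and ratings for this product on Flipkart, at least 30-40 of them. The first page only shows 10 reviews, so I have to move on to the next page. Below is the code I used to check whether my code is able to click the Next button.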

'''
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome(r"chromedriver.exe")

driver.get('https://www.flipkart.com/hp-15s-ryzen-3-dual-core-3250u-8-gb-1-tb-hdd-256-gb-ssd-windows-10-home-15s-gr0012au-laptop/product-reviews/itm9e1f8deeed35f?pid=COMFZHFWBE7APPH2&lid=LSTCOMFZHFWBE7APPH2AR705G&marketplace=FLIPKART&page=2')

for page in range(4):
    try:
        # find_element_* returns only the first <span> inside the pagination <nav>
        next_butt = driver.find_element_by_xpath("//nav[@class='yFHi8N']/a/span")

        if next_butt.text == 'NEXT':
            next_butt.click()
    except NoSuchElementException:
        continue
time.sleep(1)
'''

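When I run this code, I can see that it clicks the Next button, but after the first iteration it clicks the Previous button instead, so I never move forward.

Please help.

【Question comments】:

    Tags: selenium loops for-loop web-scraping iteration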


    【Solution 1】:

    Look at the URL you shared:

    https://www.flipkart.com/hp-15s-ryzen-3-dual-core-3250u-8-gb-1-tb-hdd-256-gb-ssd-windows-10-home-15s-gr0012au-laptop/product-reviews/itm9e1f8deeed35f?pid=COMFZHFWBE7APPH2&lid=LSTCOMFZHFWBE7APPH2AR705G&marketplace=FLIPKART&page=2
    

    At the end you can see page=2. If I change that to page=3, I get the third page of reviews without the Selenium bot having to click the Next button at all.
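To make the observation about the page parameter concrete, here is a minimal sketch that rewrites only the page value of the URL and leaves the other query parameters alone. The helper name with_page and the constant REVIEWS_URL are mine, not part of Flipkart or Selenium; it uses only the standard library.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The review URL from the question (it ends in page=2).
REVIEWS_URL = (
    "https://www.flipkart.com/hp-15s-ryzen-3-dual-core-3250u-8-gb-1-tb-hdd-"
    "256-gb-ssd-windows-10-home-15s-gr0012au-laptop/product-reviews/"
    "itm9e1f8deeed35f?pid=COMFZHFWBE7APPH2&lid=LSTCOMFZHFWBE7APPH2AR705G"
    "&marketplace=FLIPKART&page=2"
)


def with_page(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page`.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Keep every other query parameter (pid, lid, marketplace) unchanged.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    for number in range(1, 5):
        print(with_page(REVIEWS_URL, number))
```

Each printed URL could then be passed to driver.get() in place of clicking through the pagination.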

    So what I would do here is put an int page_number variable into the URL, like this:

    Sample code:

    from time import sleep

    # Assumes `driver` is the Selenium WebDriver created in the question.
    driver.maximize_window()
    # Load pages 1 to 4 directly by putting the page number in the URL.
    for page_number in range(1, 5):
        driver.get("https://www.flipkart.com/hp-15s-ryzen-3-dual-core-3250u-8-gb-1-tb-hdd-256-gb-ssd-windows-10-home-15s-gr0012au-laptop/product-reviews/itm9e1f8deeed35f?pid=COMFZHFWBE7APPH2&lid=LSTCOMFZHFWBE7APPH2AR705G&marketplace=FLIPKART&page=%s" % page_number)
        # scrape anything you want here
        sleep(5)  # pause before requesting the next page
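Since the question asks for 30-40 reviews at 10 per page, the loop could also stop as soon as enough reviews have been collected. The sketch below is one way to structure that, not a definitive implementation. The names pages_needed, collect_reviews and fetch_page are hypothetical; fetch_page stands in for whatever code loads a page with driver.get() and extracts the review texts, which depends on Flipkart's markup and is not shown in this thread.

```python
import math
from typing import Callable, List


def pages_needed(target: int, per_page: int = 10) -> int:
    """Number of pages to load to reach `target` reviews at `per_page` each."""
    return math.ceil(target / per_page)


def collect_reviews(
    fetch_page: Callable[[int], List[str]],
    target: int,
    max_pages: int = 10,
) -> List[str]:
    """Call `fetch_page(1)`, `fetch_page(2)`, ... until `target` reviews are gathered.

    `fetch_page` is a hypothetical callable supplied by the caller; it
    should return the review texts found on the given page number.
    """
    reviews: List[str] = []
    for page_number in range(1, max_pages + 1):
        batch = fetch_page(page_number)
        if not batch:
            # An empty page is treated as the end of the reviews.
            break
        reviews.extend(batch)
        if len(reviews) >= target:
            break
    return reviews[:target]


if __name__ == "__main__":
    # Stand-in for a real scraper: every page yields 10 fake reviews.
    def fake_page(page_number: int) -> List[str]:
        return ["review %d-%d" % (page_number, i) for i in range(10)]

    print(pages_needed(40))
    print(len(collect_reviews(fake_page, 35)))
```

Passing the fetch step in as a function keeps the stopping logic testable without opening a browser.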
    

    【Comments】:
