How to scrape multiple pages with a static URL, request method GET

【Question title】: How to scrape multiple pages with a static URL, request method GET
【Posted】: 2020-07-21 15:08:30
【Question description】:

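First, sorry for my English; second, I have only been using Python for two weeks.

I am currently using Python with the selenium module and chromedriver. The page I want to scrape is "http://lpse.maroskab.go.id/eproc4/lelang", and this is the code I am using: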

from time import sleep

from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome headless, without extensions or GPU acceleration.
chrome_options = Options()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--headless")

path = r'F:\python latian\webdriver\chromedriver.exe'

# executable_path is the Selenium 3 form; Selenium 4 takes service=Service(path) instead.
driver = webdriver.Chrome(options=chrome_options, executable_path=path)
driver.get('http://lpse.maroskab.go.id/eproc4/lelang')
sleep(5)  # crude fixed wait for the table to finish loading

# Parse the rendered page and print the first column of each row.
page = bs(driver.page_source, "html.parser")
code = page.find_all(class_="sorting_1")
for xx in code:
    kode = xx.contents[0]
    print(kode)
driver.quit()

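With this code I can only get data from the first page, but what I want is to scrape the other pages as well. I came across [this thread][1], but the answers there are for a page whose request method is POST, while mine is GET. I also read a suggestion there to use "urllib.request", but as far as I know that approach only works if I know the URL. Thank you.

[1]: https://stackoverflow.com/questions/48985758/how-to-scrape-multiple-pages-with-an-unchang-url-python-3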

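【Comments】:

Please let me know how you get on.
I found a workaround. I wanted to use urllib, because the URL I found under XHR has some parameters, and if I change their values I can get the data. Unfortunately, for some reason I get a 403 error (maybe they block urllib outright).
You can use urllib. Post it as a separate question and I will be happy to help.
@AzyCrw4282, I posted another question here: stackoverflow.com/questions/61179774/…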
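
For readers following the urllib idea from the comments, here is a minimal sketch of paging through an XHR endpoint by changing its query parameters. The endpoint URL and the parameter names `start` and `length` are assumptions for illustration (they follow the DataTables server-side convention, which the page may or may not use); the real values have to be copied from the browser's Network tab. Sending a browser-like User-Agent is one common thing to try when a server answers 403 to urllib's default `Python-urllib` agent, but it is not confirmed to be the cause here.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint: replace with the XHR URL shown in the browser's Network tab.
XHR_URL = "http://example.com/data"


def build_page_request(page, page_size=25):
    """Build a GET request for one page of rows (no network access happens here)."""
    params = {
        "start": page * page_size,  # offset of the first row (assumed parameter name)
        "length": page_size,        # rows per page (assumed parameter name)
    }
    # Browser-like header in place of urllib's default "Python-urllib/x.y".
    headers = {"User-Agent": "Mozilla/5.0"}
    return Request(XHR_URL + "?" + urlencode(params), headers=headers)


def fetch_page(page):
    """Send the request and return the response body as text."""
    with urlopen(build_page_request(page), timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_page(0)`, `fetch_page(1)`, and so on would then return one page of data per call, assuming the server accepts these parameters.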

Tags: javascript python selenium screen-scraping

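【Solution 1】:

There are many ways to solve this, and iterating over multiple pages is not trivial; your code needs a lot of improvement. Since you are new, I will list what you need to include and point to an example that you can merge into your code.

You definitely need to use Explicit Waits to wait until the "loading" indicator is no longer visible.

You also need an infinite loop that exits only when the "next page" link is disabled (no more pages available).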
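
The two points above can be sketched roughly as follows. This is not the code from the linked answer, only a minimal illustration: `collect_pages` is a hypothetical helper holding the loop, and the CSS selectors in `scrape` (`.dataTables_processing`, `.paginate_button.next`) are assumed DataTables defaults that have not been checked against the actual page.

```python
def collect_pages(read_page, next_is_disabled, go_to_next, max_pages=1000):
    """Read pages until the "next" control is disabled; return all rows."""
    rows = []
    for _ in range(max_pages):  # safety cap instead of a bare `while True`
        rows.extend(read_page())
        if next_is_disabled():
            break
        go_to_next()
    return rows


def scrape(url, timeout=10):
    """Wire the loop to Selenium. All selectors below are assumptions."""
    # Imported here so collect_pages can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, timeout)

    def read_page():
        # Explicit wait: block until the "processing" indicator is hidden.
        # If the indicator appears late, this may return before the new rows load.
        wait.until(EC.invisibility_of_element_located(
            (By.CSS_SELECTOR, ".dataTables_processing")))
        cells = driver.find_elements(By.CSS_SELECTOR, "td.sorting_1")
        return [cell.text for cell in cells]

    def next_is_disabled():
        # An empty list means no disabled "next" button was found.
        return bool(driver.find_elements(
            By.CSS_SELECTOR, ".paginate_button.next.disabled"))

    def go_to_next():
        driver.find_element(By.CSS_SELECTOR, ".paginate_button.next").click()

    try:
        driver.get(url)
        return collect_pages(read_page, next_is_disabled, go_to_next)
    finally:
        driver.quit()
```

Keeping the loop separate from the Selenium calls means the stopping logic can be checked with plain functions before a browser is involved; `scrape('http://lpse.maroskab.go.id/eproc4/lelang')` would then run it against the real page, provided the selectors match.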

This is a good example, and it uses @alecxe's answer.

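【Comments】:

OK, I am still confused; this is beyond my current knowledge and I am still quite clueless. I will study more for now and delete this post. Thank you very much, sir/miss.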