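[Posted]: 2020-04-02 16:37:42
[Question]:
I need to paginate through a site and save the HTML of each page in a list.
The HTML looks like this. On the first page, the first element, with class="sc-4j28w0-1 fDeSdf", is the arrow '>':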
<li disabled="" class="sc-4j28w0-1 fDeSdf"></li>
<li data-testid="current-page-item" class="sc-4j28w0-1 sc-4j28w0-2 jDlZyl">1</li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">2</span></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">3</span></li>
<li class="sc-4j28w0-1 lhEbhI"></li>
For the second and any further pages (except the last):
<li class="sc-4j28w0-1 lhEbhI"></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">1</span></li>
<li data-testid="current-page-item" class="sc-4j28w0-1 sc-4j28w0-2 jDlZyl">2</li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">3</span></li>
<li class="sc-4j28w0-1 lhEbhI"></li>
On the last page, the last element, with class="sc-4j28w0-1 fDeSdf", is the arrow '
<li class="sc-4j28w0-1 lhEbhI"></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">1</span></li>
<li class="sc-4j28w0-1 lhEbhI"><span class="sc-4j28w0-3 jAKnhT">2</span></li>
<li data-testid="current-page-item" class="sc-4j28w0-1 sc-4j28w0-2 jDlZyl">3</li>
<li disabled="" class="sc-4j28w0-1 fDeSdf"></li>
So on the first page the first item, and on the last page the last item, has the class 'sc-4j28w0-1 fDeSdf' and a disabled attribute.
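The first/last-page condition described above can also be read directly from a saved page. Below is a minimal sketch using only the standard library's `html.parser`; `PagerParser` and `pager_state` are hypothetical names. It assumes, as the XPath `.../ol/li[5]` suggests, that the pager items sit inside an `<ol>`, and it identifies the pager as the `<ol>` containing `data-testid="current-page-item"`. It checks the `disabled` attribute rather than the `fDeSdf` class, on the assumption that generated class names like `fDeSdf` may change between deployments.

```python
from html.parser import HTMLParser


class PagerParser(HTMLParser):
    """Collect the <li> items of every <ol>, with their attributes and text."""

    def __init__(self):
        super().__init__()
        self.lists = []    # finished <ol>s, each a list of {"attrs", "text"} dicts
        self._open = []    # <ol>s still being parsed
        self._li = None    # the <li> currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        if tag == "ol":
            self._open.append([])
        elif tag == "li" and self._open:
            self._li = {"attrs": dict(attrs), "text": ""}
            self._open[-1].append(self._li)

    def handle_endtag(self, tag):
        if tag == "li":
            self._li = None
        elif tag == "ol" and self._open:
            self.lists.append(self._open.pop())

    def handle_data(self, data):
        # Text inside a nested <span> still belongs to the enclosing <li>.
        if self._li is not None:
            self._li["text"] += data.strip()


def pager_state(html):
    """Find the <ol> holding the current-page item and report the position."""
    parser = PagerParser()
    parser.feed(html)
    parser.close()
    for items in parser.lists:
        labels = [i["text"] for i in items
                  if i["attrs"].get("data-testid") == "current-page-item"]
        if labels:
            return {
                "current": int(labels[0]),
                # disabled="" is parsed as ("disabled", ""), so test for the key.
                "is_first": "disabled" in items[0]["attrs"],
                "is_last": "disabled" in items[-1]["attrs"],
            }
    raise ValueError("no pager found in the HTML")
```

With Selenium, one would call it as `pager_state(driver.page_source)["is_last"]` and stop once it is `True`.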
I tried to paginate with a while loop:
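import time

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# list for html pages
news_list = []
while True:
    wait = WebDriverWait(driver, 10)
    # click the last element of the pagination (the '>' arrow)
    search = wait.until(EC.presence_of_element_located((By.XPATH, '/html/body/div/div/div[2]/div[2]/div/ol/li[5]')))
    # if it is active, click it
    if search.is_enabled():
        search.click()
        time.sleep(5)
        html = driver.page_source
        soup_news = BeautifulSoup(html)
        news_list.append(soup_news)
    else:
        pass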
But the loop never stops: it keeps saving the last page over and over.
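A likely reason the loop above never ends: `else: pass` has no exit, and even `else: break` would not help if the condition never turns false. Under the W3C WebDriver "Is Element Enabled" rules only disabled form controls are reported as disabled, so `is_enabled()` on an `<li>` presumably keeps returning `True` even on the last page. The sketch below shows a loop shape that always terminates, with the browser calls passed in as functions so the control flow can be checked without a browser. The names `collect_pages`, `get_html`, `next_disabled` and `click_next` are hypothetical. Unlike the loop above, it saves page 1 before the first click and adds a hard `max_pages` cap.

```python
from typing import Callable, List


def collect_pages(
    get_html: Callable[[], str],
    next_disabled: Callable[[], bool],
    click_next: Callable[[], None],
    max_pages: int = 50,
) -> List[str]:
    """Save every page, stopping when "next" is disabled or max_pages is hit."""
    pages = [get_html()]              # save page 1 before any click
    while len(pages) < max_pages:     # hard cap: a wrong stop test cannot hang the script
        if next_disabled():           # re-evaluated on every iteration
            break                     # the exit that `else: pass` never provided
        click_next()
        pages.append(get_html())
    return pages
```

With Selenium, `get_html` could be `lambda: driver.page_source`, and `next_disabled` should test the arrow's `disabled` attribute rather than `is_enabled()`.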
I also tried this:
wait = WebDriverWait(driver, 10)
search = wait.until(EC.element_to_be_clickable((By.XPATH, '/html/body/div/div/div[2]/div[2]/div/ol/li[5]')))
while search.get_property('disabled') is False:
    search.click()
    time.sleep(5)
    html = driver.page_source
    soup_news = BeautifulSoup(html)
    news_list.append(soup_news)
Then I get this error:
---------------------------------------------------------------------------
StaleElementReferenceException Traceback (most recent call last)
<ipython-input-51-49e862d6475f> in <module>
34
35
---> 36 while search.is_enabled():
37 try:
38 search.click()
~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in is_enabled(self)
157 def is_enabled(self):
158 """Returns whether the element is enabled."""
--> 159 return self._execute(Command.IS_ELEMENT_ENABLED)['value']
160
161 def find_element_by_id(self, id_):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in _execute(self, command, params)
631 params = {}
632 params['id'] = self._id
--> 633 return self._parent.execute(command, params)
634
635 def find_element(self, by=By.ID, value=None):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
319 response = self.command_executor.execute(driver_command, params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value', None))
~\AppData\Local\Continuum\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message, screen, stacktrace, alert_text)
--> 242 raise exception_class(message, screen, stacktrace)
243
244 def _value_or_default(self, obj, key, default):
StaleElementReferenceException: Message: The element reference of <li class="sc-4j28w0-1 lhEbhI"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
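The `StaleElementReferenceException` comes from reusing `search` after `click()`: the pager is re-rendered, so the old `<li>` reference points at a node that is no longer attached to the DOM, as the message says. One way around it, sketched here under assumptions, is to look the arrow up again on every step and wait for the highlighted page number to change. `go_to_next_page` and `current_page_label` are hypothetical helpers. The strings `"xpath"` and `"css selector"` are the values of Selenium's `By.XPATH` and `By.CSS_SELECTOR`, and `wait` is expected to be a `WebDriverWait` created with `ignored_exceptions=(StaleElementReferenceException,)`. The XPath is copied from the question and assumes the "next" arrow is always the fifth `<li>`.

```python
# Locators taken from the question's markup (assumed stable).
NEXT_XPATH = "/html/body/div/div/div[2]/div[2]/div/ol/li[5]"
CURRENT_CSS = '[data-testid="current-page-item"]'


def current_page_label(driver):
    """Text of the highlighted page item, looked up fresh on every call."""
    # "css selector" is the value of selenium's By.CSS_SELECTOR
    return driver.find_element("css selector", CURRENT_CSS).text


def go_to_next_page(driver, wait):
    """Click the "next" arrow and wait for the pager to move on.

    Returns False, without clicking, when the arrow has a disabled attribute.
    `wait` is assumed to be a selenium WebDriverWait that ignores
    StaleElementReferenceException while the pager re-renders.
    """
    arrow = driver.find_element("xpath", NEXT_XPATH)  # fresh lookup, never reused
    # get_attribute returns None when the attribute is absent
    if arrow.get_attribute("disabled") is not None:
        return False
    before = current_page_label(driver)
    arrow.click()
    # WebDriverWait.until calls the lambda with the driver until it is truthy
    wait.until(lambda d: current_page_label(d) != before)
    return True


# Usage sketch (assumes selenium and bs4 are installed and `driver` shows page 1):
#   from selenium.common.exceptions import StaleElementReferenceException
#   from selenium.webdriver.support.ui import WebDriverWait
#   from bs4 import BeautifulSoup
#   wait = WebDriverWait(driver, 10,
#                        ignored_exceptions=(StaleElementReferenceException,))
#   news_list = [BeautifulSoup(driver.page_source, "html.parser")]
#   while go_to_next_page(driver, wait):
#       news_list.append(BeautifulSoup(driver.page_source, "html.parser"))
```

Waiting on the page label instead of a fixed `time.sleep(5)` means each step continues as soon as the new page is rendered.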
Any help is appreciated.
[Comments]:
-
Did you mean else: break instead of else: pass? -
I tried both; neither works.
Tags: python selenium pagination