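[Posted]: 2020-09-03 07:16:36
[Question]:
I am learning how to use Selenium with Python, and I want to scrape the news titles and news dates from this website.
I have run into a problem that I do not know how to solve.
Here is my code: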
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import time
import json

driver = webdriver.Chrome("./chromedriver")
driver.implicitly_wait(10)
driver.get("https://www.thestandnews.com/search/?q=%E6%96%B0%E5%86%A0%E8%82%BA%E7%82%8E")
soup = BeautifulSoup(driver.page_source, "lxml")
pages_remaining = True
page_num = 1
My_array = []
while pages_remaining:
    print("Page Number:", page_num)
    # Re-parse the page source after each click
    soup = BeautifulSoup(driver.page_source, "lxml")
    """ # not finished yet
    tags_lis = soup.find_all("li")
    for tag in tags_lis:
        tag_a = tag.find("a")
        tag_span = tag.find("span")
        title = tag_a.text
        date = tag_span.text
        temp = {"title": title , "date": date}
        print(temp)
        My_array.append(temp)
    """
    try:
        # Click the button for the next page
        #next_link =driver.find_element_by_xpath()
        nextPg = '//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]/div/div/div[2]/div/div[%d]' % (page_num + 1)
        print(nextPg)
        next_link = driver.find_element_by_xpath(nextPg)
        next_link.click()
        time.sleep(5)
        if page_num < 10:
            page_num = page_num + 1
        else:
            pages_remaining = False
    except Exception:
        # Any error (element not found, click failed) silently ends the loop
        pages_remaining = False
driver.close()
This is the output I get. Can anyone give me a hint? Thanks!
DevTools listening on ws://127.0.0.1:49952/devtools/browser/749fcb19-d13a-4f38-9d7c-3da58726e10a
[13744:13732:0517/214816.873:ERROR:browser_switcher_service.cc(238)] XXX Init()
Page Number: 1
//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]/div/div/div[2]/div/div[2]
[13744:13732:0517/214824.321:ERROR:device_event_log_impl.cc(162)] [21:48:24.321] Bluetooth: bluetooth_adapter_winrt.cc:1055 Getting Default Adapter failed.
Page Number: 2
//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]/div/div/div[2]/div/div[3]
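The output above stops after page 2 without a Python traceback, because the `except Exception` branch ends the loop without reporting what went wrong. The lines tagged `ERROR:` come from Chrome's own console log, so they may well be unrelated to why the loop stops. A minimal sketch of a loop that reports its reason for stopping is given here. The names `next_page_xpath`, `walk_pages`, and `fake_click` are hypothetical, and the Selenium calls are replaced by a callback so the logic can be tried without a browser:

```python
def next_page_xpath(page_num):
    """XPath of the pager cell for page_num + 1 (pattern copied from the question)."""
    return ('//*[@id="___gcse_1"]/div/div/div/div[5]/div[2]'
            '/div/div/div[2]/div/div[%d]' % (page_num + 1))


def walk_pages(click_xpath, max_pages=10):
    """Click forward until max_pages is reached or a click fails.

    Returns (last_page_reached, reason_for_stopping).
    """
    page_num = 1
    while page_num < max_pages:
        xpath = next_page_xpath(page_num)
        try:
            click_xpath(xpath)
        except Exception as exc:
            # Keep the exception type and message instead of discarding them.
            return page_num, "%s: %s" % (type(exc).__name__, exc)
        page_num += 1
    return page_num, "reached max_pages"


def fake_click(xpath):
    """Stand-in for the browser: the third pager cell cannot be clicked."""
    if xpath.endswith("div[3]"):
        raise RuntimeError("element not clickable")


print(walk_pages(fake_click))
```

With the real browser, the callback would be something like `lambda xp: driver.find_element_by_xpath(xp).click()`, and the returned reason shows which Selenium exception ended the loop.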
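[Comments]:
-
I have a similar problem. Did you find a solution? -- My problem --- Bluetooth: bluetooth_adapter_winrt.cc:713 GetBluetoothAdapterStaticsActivationFactory failed: Class not registered (0x80040154)
-
I am using Firefox.
-
Have you tried another browser? Or the webdriver-manager package, which installs chromedriver automatically?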
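
The webdriver-manager suggestion can be combined with a Chrome option that is commonly reported to hide the `Bluetooth:` and `DevTools listening` console messages on Windows. The sketch below is not a confirmed fix for this question. It assumes Selenium 4 and the `webdriver-manager` package are installed, and `quiet_options` and `build_driver` are hypothetical helper names:

```python
def quiet_options(options):
    """Ask Chrome not to print its own log lines to the console."""
    # 'excludeSwitches' removes a switch that chromedriver passes by default;
    # dropping 'enable-logging' is commonly used to hide Chrome's console noise.
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    return options


def build_driver():
    """Start Chrome with a chromedriver downloaded by webdriver-manager."""
    # Imported here so quiet_options can be used without these packages.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = quiet_options(webdriver.ChromeOptions())
    # install() downloads a matching chromedriver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `driver = build_driver()` would then replace `webdriver.Chrome("./chromedriver")` in the question's script. Note that Selenium 4 also expects `driver.find_element(By.XPATH, ...)` instead of `find_element_by_xpath`.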