[Posted]: 2022-01-12 18:13:51
[Question]:
I am trying to scrape data from a website with Selenium. The application runs on Flask + uwsgi + nginx on an AWS EC2 instance.
Code:
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.service import Service

service = Service("/opt/chromedriver")

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}

options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3")
options.add_argument("--headless")
options.add_argument("--ignore-certificate-errors")
options.add_argument("--enable-javascript")
options.add_argument("--incognito")
options.add_argument("--disable-dev-shm-usage")


def scrape_data(URL):
    # Initialise both names so the except/finally paths never hit an unbound variable.
    driver = None
    html_content = None
    try:
        driver = webdriver.Chrome(service=service, options=options)
        driver.get(URL)
        driver.implicitly_wait(2)
        html_content = driver.page_source
    except WebDriverException:
        print("Failed URL -->", URL)
    finally:
        # Quit only if the browser actually started.
        if driver is not None:
            driver.quit()
    return html_content


url_x = input("Enter url : ")
raw_text = scrape_data(url_x)
Running this through nginx + uwsgi produces the error below, but running the same code from the CLI raises no error:
Traceback (most recent call last):
File "/home/ubuntu/ml_eval/./main.py", line 98, in scrape_data
uwsgi[45145]: driver = webdriver.Chrome(service=service, options=options)
uwsgi[45145]: File "/home/ubuntu/ml_eval/prjenv/lib/python3.8/site-packages/selenium/webdriver/chrome/webdriver.py", line 70, in __init__
uwsgi[45145]: super(WebDriver, self).__init__(DesiredCapabilities.CHROME['browserName'], "goog",
uwsgi[45145]: File "/home/ubuntu/ml_eval/prjenv/lib/python3.8/site-packages/selenium/webdriver/chromium/webdriver.py", line 93, in __init__
uwsgi[45145]: RemoteWebDriver.__init__(
uwsgi[45145]: File "/home/ubuntu/ml_eval/prjenv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 268, in __init__
uwsgi[45145]: self.start_session(capabilities, browser_profile)
uwsgi[45145]: File "/home/ubuntu/ml_eval/prjenv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 359, in start_session
uwsgi[45145]: response = self.execute(Command.NEW_SESSION, parameters)
uwsgi[45145]: File "/home/ubuntu/ml_eval/prjenv/lib/python3.8/site-packages/selenium/webdriver/remote/webdriver.py", line 424, in execute
uwsgi[45145]: self.error_handler.check_response(response)
uwsgi[45145]: File "/home/ubuntu/ml_eval/prjenv/lib/python3.8/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
uwsgi[45145]: raise exception_class(message, screen, stacktrace)
uwsgi[45145]: selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
uwsgi[45145]: (unknown error: DevToolsActivePort file doesn't exist)
uwsgi[45145]: (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
uwsgi[45145]: Stacktrace:
uwsgi[45145]: #0 0x55765c077ee3 <unknown>
How can I fix this?
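Because the same code works from the CLI but fails under uwsgi, one way to narrow the problem down is to compare the process environment in both contexts. The sketch below is only a diagnostic aid, not a confirmed fix. `describe_runtime_env` and `chrome_profile_argument` are hypothetical helper names introduced here, and the idea that a missing or unwritable HOME keeps Chrome from starting is an assumption that the traceback does not prove. `--user-data-dir` is a standard Chrome switch.

```python
import getpass
import os
import tempfile


def describe_runtime_env():
    """Collect process details that often differ between a shell and uwsgi."""
    home = os.environ.get("HOME")
    try:
        user = getpass.getuser()
    except Exception:  # no login name available in a minimal service environment
        user = None
    return {
        "user": user,
        "uid": os.getuid(),
        "home": home,
        # False when HOME is unset, empty, or not writable by this process.
        "home_writable": bool(home) and os.access(home, os.W_OK),
        "path": os.environ.get("PATH"),
        "tmp_writable": os.access(tempfile.gettempdir(), os.W_OK),
    }


def chrome_profile_argument(base_dir=None):
    """Create a fresh writable directory and return it as a --user-data-dir flag."""
    profile_dir = tempfile.mkdtemp(prefix="chrome-profile-", dir=base_dir)
    return f"--user-data-dir={profile_dir}"


if __name__ == "__main__":
    for key, value in describe_runtime_env().items():
        print(f"{key}={value!r}")
```

Calling `describe_runtime_env()` once from a shell and once inside the Flask view, and logging both results, shows whether the user, HOME, or PATH differ between the two. If they do, adding `options.add_argument(chrome_profile_argument())` before creating the driver is one thing to try.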
**OS: Ubuntu 20.04
Python 3.8.10
ChromeDriver 96.0.4664.45 (76e4c1bb2ab4671b8beba3444e61c0f17584b2fc-refs/branch-heads/4664@{#947})
Google Chrome 96.0.4664.93
Google Chrome location: /usr/bin/google-chrome**
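A mismatch between the ChromeDriver and Chrome major versions is a common reason for Chrome failing to start, so it helps to rule that out first. The minimal sketch below parses the two version strings listed above. `major_version` is a hypothetical helper name, and the hard-coded strings stand in for whatever the `--version` output of each binary would give on the server.

```python
import re


def major_version(version_output):
    """Return the major number of the first dotted version, e.g. 96 for '96.0.4664.45'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


# Version strings copied from the environment details in this question.
driver_line = "ChromeDriver 96.0.4664.45 (76e4c1bb2ab4671b8beba3444e61c0f17584b2fc-refs/branch-heads/4664@{#947})"
browser_line = "Google Chrome 96.0.4664.93"

versions_match = major_version(driver_line) == major_version(browser_line)
print("major versions match:", versions_match)  # prints "major versions match: True"
```

Both report major version 96, so a driver/browser mismatch does not explain the failure here, which points back to a difference between the CLI and uwsgi environments.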
[Discussion]:
Tags: python selenium google-chrome selenium-webdriver selenium-chromedriver