【Question Title】: Why am I receiving an Invalid URL error when using Python Flask, ChromeDriver and Chrome?
【Posted】: 2022-01-27 08:16:15
【Question】:

I have been following this tutorial: https://kb.objectrocket.com/postgresql/scrape-a-website-to-postgres-with-python-938

My app.py file looks like this (taken from the tutorial above):

from flask import Flask  # needed for flask-dependent libraries below
from flask import render_template  # to render the error page
from selenium import webdriver  # to grab source from URL
from bs4 import BeautifulSoup  # for searching through HTML
import psycopg2  # for database access

# set up Postgres database connection and cursor.
t_host = "localhost" # either "localhost", a domain name, or an IP address.
t_port = "5432" # default postgres port
t_dbname = "scrape"
t_user = "postgres"
t_pw = "********"
db_conn = psycopg2.connect(host=t_host, port=t_port, dbname=t_dbname, user=t_user, password=t_pw)
db_cursor = db_conn.cursor()

app = Flask(__name__)


@app.route("/")
@app.route('/import_temp')
def import_temp():
    # set up your webdriver to use Chrome web browser
    my_web_driver = webdriver.Chrome("/usr/local/bin/chromedriver")

    # designate the URL we want to scrape
    #   NOTE: the long string of characters at the end of this URL below is a clue that
    #   maybe this page is so dynamic, like maybe refers to a specific web session and/or day/time,
    #   that we can't necessarily count on it to be the same more than one time.
    #   Which means... we may want to find another source for our data; one that is more
    #   dependable. That said, whatever URL you use, the methodology in this lesson stands.
    t_url = "https://weather.com/weather/today/l/7ebb344012f0c5ff88820d763da89ed94306a86c770fda50c983bf01a0f55c0d"
    # initiate scrape of website page data
    my_web_driver.get("<a href='" + t_url + "'>" + t_url + "</a>")
    # return entire page into "t_content"
    t_content = my_web_driver.page_source
    # use soup to make page content easily searchable
    soup_in_bowl = BeautifulSoup(t_content)
    # search for the UNIQUE span and class for the data we are looking for:
    o_temp = soup_in_bowl.find('span', attrs={'class': 'deg-feels'})
    # from the resulting object, "o_temp", get the text parameter and assign it to "n_temp"
    n_temp = o_temp.text

    # Build SQL for purpose of:
    #    saving the temperature data to a new row
    s = ""
    s += "INSERT INTO tbl_temperatures"
    s += "("
    s += "n_temp"
    s += ") VALUES ("
    s += "(%n_temp)"
    s += ")"

    # Trap errors for opening the file
    try:
        db_cursor.execute(s, [n_temp, n_temp])
        db_conn.commit()
    except psycopg2.Error as e:
        t_msg = "Database error: " + e + "/n open() SQL: " + s
        return render_template("error_page.html", t_msg = t_msg)

    # Success!
    # Show a message to user.
    t_msg = "Successful scrape!"
    return render_template("progress.html", t_msg = t_msg)

    # Clean up the cursor and connection objects
    db_cursor.close()
    db_conn.close()

Judging by the error log from the Python shell, the URL is invalid:

FLASK_APP = app.py
FLASK_ENV = development
FLASK_DEBUG = 0
In folder /home/lloyd/PycharmProjects/flaskProject
/home/lloyd/PycharmProjects/flaskProject/venv/bin/python -m flask run
 * Serving Flask app 'app.py' (lazy loading)
 * Environment: development
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
[2021-12-28 18:13:05,988] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/lloyd/PycharmProjects/flaskProject/app.py", line 33, in import_temp
    my_web_driver.get("<a href='" + t_url + "'>" + t_url + "</a>")
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"Cannot navigate to invalid URL"}
  (Session info: chrome=96.0.4664.110)
  (Driver info: chromedriver=2.35.528139 (47ead77cb35ad2a9a83248b292151462a66cd881),platform=Linux 4.18.0-259.el8.x86_64 x86_64)

127.0.0.1 - - [28/Dec/2021 18:13:05] "GET / HTTP/1.1" 500 -

However, I can reach the URL when I enter the address manually.

When I run the application, a browser window appears with this error message:

Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

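A second browser window then appears, showing the text data:, in the address bar.

Any insight into why this is happening would be greatly appreciated. I previously posted a similar question here about a 404 error: Why am I receiving ERROR 404 - when attempting to use Python Flask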

【Question Comments】:

    Tags: python selenium selenium-chromedriver


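    【Solution 1】:

    All of these errors:

    • [2021-12-28 18:13:05,988] ERROR in app: Exception on / [GET]
    • selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"Cannot navigate to invalid URL"}
    • The browser window showing the text data:,

    are due to an incompatibility between the versions of the binaries you are using:

    • You are using chrome=96.0.4664.110.
    • The release notes for ChromeDriver v96.0 explicitly state:

    Supports Chrome version 96

    • You are using chromedriver=2.35.528139.
    • The release notes for ChromeDriver v2.35 explicitly state:

    Supports Chrome v62-64

    So there is a clear mismatch between chromedriver=2.35 and chrome=96.0.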
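The two versions can be read straight from the tail of the exception message. The following is a minimal sketch, not part of the original answer: the helper names extract_versions and majors_match are hypothetical, and comparing major versions assumes the convention that newer ChromeDriver releases carry the same major number as the Chrome release they support. For the older 2.x drivers the supported range comes from the release notes instead, so treat a mismatch here only as a prompt to check them.

```python
import re

# Tail of the WebDriverException message, copied from the traceback in the question.
ERROR_TAIL = """
  (Session info: chrome=96.0.4664.110)
  (Driver info: chromedriver=2.35.528139 (47ead77cb35ad2a9a83248b292151462a66cd881),platform=Linux 4.18.0-259.el8.x86_64 x86_64)
"""


def extract_versions(message):
    """Return (chrome_version, chromedriver_version); None for any that is absent."""
    chrome = re.search(r"\bchrome=([\d.]+)", message)
    driver = re.search(r"\bchromedriver=([\d.]+)", message)
    return (
        chrome.group(1) if chrome else None,
        driver.group(1) if driver else None,
    )


def majors_match(chrome_version, driver_version):
    """Compare only the leading (major) component of each version string.

    Assumption: a simplification that holds for drivers whose major
    version tracks Chrome's; it is not an official compatibility check.
    """
    return chrome_version.split(".")[0] == driver_version.split(".")[0]


chrome_version, driver_version = extract_versions(ERROR_TAIL)
print(chrome_version, driver_version, majors_match(chrome_version, driver_version))
# prints: 96.0.4664.110 2.35.528139 False
```

Running this against the message from the question reports Chrome 96 next to a 2.35 driver, which is the mismatch described above.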

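    Solution

    Ensure that ChromeDriver is updated to a release that supports the installed Chrome version (Chrome 96 in this case).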
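One further point is worth checking alongside the driver update: the failing line in the traceback passes an HTML anchor string ("<a href='...'>...</a>") to my_web_driver.get(), and that string is not a URL, which is consistent with the "Cannot navigate to invalid URL" message. Below is a minimal sketch, assuming the intent was simply to load t_url. The helpers is_navigable_url and load_page are hypothetical names, and the driver argument is assumed to be any object exposing get() and page_source, as Selenium's WebDriver does.

```python
from urllib.parse import urlparse


def is_navigable_url(candidate):
    """Return True if candidate looks like an absolute http(s) URL."""
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def load_page(driver, url):
    """Navigate driver to url and return the resulting page source.

    Assumption: driver exposes get() and page_source like Selenium's
    WebDriver; url must be the bare address, not HTML markup.
    """
    if not is_navigable_url(url):
        raise ValueError("not a navigable URL: " + repr(url))
    driver.get(url)
    return driver.page_source


t_url = "https://weather.com/weather/today/l/7ebb344012f0c5ff88820d763da89ed94306a86c770fda50c983bf01a0f55c0d"
# The string the question's code actually passes to get():
wrapped = "<a href='" + t_url + "'>" + t_url + "</a>"

print(is_navigable_url(t_url))    # True
print(is_navigable_url(wrapped))  # False
```

With a real driver, the call in import_temp() would reduce to my_web_driver.get(t_url), with t_content then read from my_web_driver.page_source as before.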

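    【Discussion】:

    • Thanks for the reply, @DebanjanB. If possible, could you tell me how to update ChromeDriver? I'm not having much luck.
    • @LloydThomas Check the embedded links. Everything is at your fingertips :)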