【Posted】: 2022-01-27 08:16:15
【Question】:
I have been following this tutorial: https://kb.objectrocket.com/postgresql/scrape-a-website-to-postgres-with-python-938
My app.py file looks like this (taken from the tutorial above):
from flask import Flask # needed for flask-dependent libraries below
from flask import render_template # to render the error page
from selenium import webdriver # to grab source from URL
from bs4 import BeautifulSoup # for searching through HTML
import psycopg2 # for database access

# set up Postgres database connection and cursor.
t_host = "localhost" # either "localhost", a domain name, or an IP address.
t_port = "5432" # default postgres port
t_dbname = "scrape"
t_user = "postgres"
t_pw = "********"
db_conn = psycopg2.connect(host=t_host, port=t_port, dbname=t_dbname, user=t_user, password=t_pw)
db_cursor = db_conn.cursor()

app = Flask(__name__)

@app.route("/")
@app.route('/import_temp')
def import_temp():
    # set up your webdriver to use Chrome web browser
    my_web_driver = webdriver.Chrome("/usr/local/bin/chromedriver")

    # designate the URL we want to scrape
    # NOTE: the long string of characters at the end of this URL below is a clue that
    # maybe this page is so dynamic, like maybe refers to a specific web session and/or day/time,
    # that we can't necessarily count on it to be the same more than one time.
    # Which means... we may want to find another source for our data; one that is more
    # dependable. That said, whatever URL you use, the methodology in this lesson stands.
    t_url = "https://weather.com/weather/today/l/7ebb344012f0c5ff88820d763da89ed94306a86c770fda50c983bf01a0f55c0d"

    # initiate scrape of website page data
    my_web_driver.get("<a href='" + t_url + "'>" + t_url + "</a>")
    # return entire page into "t_content"
    t_content = my_web_driver.page_source
    # use soup to make page content easily searchable
    soup_in_bowl = BeautifulSoup(t_content)
    # search for the UNIQUE span and class for the data we are looking for:
    o_temp = soup_in_bowl.find('span', attrs={'class': 'deg-feels'})
    # from the resulting object, "o_temp", get the text parameter and assign it to "n_temp"
    n_temp = o_temp.text

    # Build SQL for purpose of:
    # saving the temperature data to a new row
    s = ""
    s += "INSERT INTO tbl_temperatures"
    s += "("
    s += "n_temp"
    s += ") VALUES ("
    s += "(%n_temp)"
    s += ")"

    # Trap errors for opening the file
    try:
        db_cursor.execute(s, [n_temp, n_temp])
        db_conn.commit()
    except psycopg2.Error as e:
        t_msg = "Database error: " + e + "/n open() SQL: " + s
        return render_template("error_page.html", t_msg = t_msg)

    # Success!
    # Show a message to user.
    t_msg = "Successful scrape!"
    return render_template("progress.html", t_msg = t_msg)

    # Clean up the cursor and connection objects
    db_cursor.close()
    db_conn.close()
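A side note on the INSERT statement in the listing above, independent of the error asked about: psycopg2 expects %s (or %(name)s) placeholders, so "(%n_temp)" combined with a two-item parameter list would probably fail once the scrape itself works. Below is a minimal sketch of a parameterized version, assuming the single-column tbl_temperatures table from the tutorial. The helper build_temperature_insert and the sample value are hypothetical, introduced only for illustration.

```python
def build_temperature_insert(n_temp):
    """Return (sql, params) for a parameterized INSERT into tbl_temperatures.

    Hypothetical helper: the table and column names come from the listing,
    everything else is an assumption made for this sketch.
    """
    # One %s placeholder for the one value being inserted.
    sql = "INSERT INTO tbl_temperatures (n_temp) VALUES (%s)"
    # Parameters are passed separately as a one-item tuple.
    params = (n_temp,)
    return sql, params


sql, params = build_temperature_insert("72°")  # sample value, not real data
print(sql)
print(params)

# Assumed usage with the cursor and connection from the listing (not run here):
# db_cursor.execute(sql, params)
# db_conn.commit()
```

In the except branch of the listing, the exception would also need converting with str(e) before being concatenated into the message.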
Judging from the Python shell error log, the URL appears to be invalid:
FLASK_APP = app.py
FLASK_ENV = development
FLASK_DEBUG = 0
In folder /home/lloyd/PycharmProjects/flaskProject
/home/lloyd/PycharmProjects/flaskProject/venv/bin/python -m flask run
* Serving Flask app 'app.py' (lazy loading)
* Environment: development
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
[2021-12-28 18:13:05,988] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/lloyd/PycharmProjects/flaskProject/app.py", line 33, in import_temp
    my_web_driver.get("<a href='" + t_url + "'>" + t_url + "</a>")
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"Cannot navigate to invalid URL"}
  (Session info: chrome=96.0.4664.110)
  (Driver info: chromedriver=2.35.528139 (47ead77cb35ad2a9a83248b292151462a66cd881),platform=Linux 4.18.0-259.el8.x86_64 x86_64)
127.0.0.1 - - [28/Dec/2021 18:13:05] "GET / HTTP/1.1" 500 -
However, when I enter the address manually, I can reach the URL.
When I run the application, a web console appears with this error message:
Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
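Then a second web console appears, showing the text data:, in the browser's address bar.
Any insight into why this is happening would be greatly appreciated. I did post a similar question here earlier, about a 404 error: Why am I receiving ERROR 404 - when attempting to use Python Flask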
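One possible reading of the "Cannot navigate to invalid URL" message: the traceback points at the my_web_driver.get(...) call, and the string passed there is an HTML anchor tag wrapped around the address rather than the address itself. The sketch below uses only the standard library to compare the two strings. The helper looks_like_http_url is a hypothetical name introduced for illustration, and the suggested fix in the final comment is an assumption that has not been run against Selenium here.

```python
from urllib.parse import urlparse


def looks_like_http_url(candidate: str) -> bool:
    """Return True when the string parses as an absolute http(s) URL."""
    parts = urlparse(candidate)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


t_url = "https://weather.com/weather/today/l/7ebb344012f0c5ff88820d763da89ed94306a86c770fda50c983bf01a0f55c0d"

# The string the listing hands to my_web_driver.get():
wrapped = "<a href='" + t_url + "'>" + t_url + "</a>"

print(looks_like_http_url(wrapped))  # anchor markup around the address
print(looks_like_http_url(t_url))    # the bare address

# Assumed fix (needs Selenium and a chromedriver matching the installed Chrome):
# my_web_driver.get(t_url)
```

If this reading is right, the data:, shown in the address bar would simply be the blank page chromedriver starts on before any navigation succeeds.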
【Discussion】:
Tags: python selenium selenium-chromedriver