PYTHON: ValueError: unknown url type: 'comments_42.html'

【Question title】: PYTHON: ValueError: unknown url type: 'comments_42.html'
【Posted】: 2020-10-29 18:50:27
【Question description】:

OK, I'm taking a Python course, and the assignment asks us to retrieve data from an HTML document. This is what I came up with:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl


ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

intlist = list()
tot = 0
count = 0
url = input('Enter - ')
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, "html.parser")
tags = soup('span')
for tag in tags:
    n = tag.contents[0]
    n = int(n)
    count += 1
    tot = tot + n
print("Count:", n)
print("Total:", tot)

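This is what happens when I try to access the file (note: the file I'm trying to retrieve is stored locally):

What is the cause of this error? Thanks to anyone who can help.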

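【Question comments】:

Is your HTML stored on the local file system?
So, what about the protocol? http, https, ftp, ftps, to name just a few? Otherwise simply open the file with with open(my_file) as file:
This might help: stackoverflow.com/a/20558624/11213106
@bigbounty That is what the assignment asks you to do
@Jan Tried it, it didn't work

Tags: python urllib urlopen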

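【Solution 1】:

You should read the HTML file yourself and pass its contents to BeautifulSoup. urlopen cannot easily open local files: it expects a URL with a scheme such as http://, so a bare filename like 'comments_42.html' raises this ValueError.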

from bs4 import BeautifulSoup

...

with open('filename.html', 'r') as htmlfile:
    html = htmlfile.read()
soup = BeautifulSoup(html, 'html.parser')

The document is now loaded and ready to parse. Don't forget to change filename.html to your actual file path.
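
If the assignment really does require urlopen, one possible workaround is to hand it a file:// URL instead of a bare filename. The following is only a minimal sketch: the helper name read_local_html is hypothetical, and it assumes the file exists on the local disk.

```python
from pathlib import Path
from urllib.request import urlopen


def read_local_html(path):
    """Read a local file through urlopen by converting its path to a file:// URL."""
    # urlopen needs a URL with a scheme; a bare name such as
    # 'comments_42.html' raises "ValueError: unknown url type".
    url = Path(path).resolve().as_uri()  # absolute path -> file:// URL
    with urlopen(url) as response:
        return response.read()  # bytes, just like an http(s) response
```

The returned bytes can be passed to BeautifulSoup(html, 'html.parser') exactly as in the original script. For a file that is known to be local, plain open() as shown above remains the simpler choice.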

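Edit: your code has other problems as well. soup('span') is shorthand for soup.find_all('span'), so it does return the span elements, but the explicit form is easier to read. Also, the final print statement reports n where count is intended. Please refer to the docs to get at least a basic understanding.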
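
As a rough illustration of the counting loop with those points addressed, here is a sketch that uses the explicit find_all call. The function name sum_spans and the sample markup are assumptions made for this example; they are not taken from the assignment.

```python
from bs4 import BeautifulSoup


def sum_spans(html):
    """Return (count, total) for the integers stored in <span> elements."""
    soup = BeautifulSoup(html, "html.parser")
    count = 0
    total = 0
    for tag in soup.find_all("span"):  # same result as soup("span")
        count += 1
        total += int(tag.get_text())  # int() tolerates surrounding whitespace
    return count, total


# Hypothetical sample markup standing in for the course file
sample = '<span class="comments">97</span><span class="comments">90</span>'
count, total = sum_spans(sample)
print("Count:", count)
print("Total:", total)
```

Returning both values from one function keeps the parsing separate from the printing, so the same loop works whether the HTML came from open() or from urlopen.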

【Comments】: