Find the number of reference links present on a webpage: answers

【Question title】: Find the number of reference links present on a webpage
【Posted】: 2019-06-29 12:53:57
【Question description】:

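I have the following exercise to answer. I am following all of these steps, but the answer I get is 1568 or 1572, and apparently neither is correct. Can someone help me understand what I am doing wrong here?

Read the HTML content from the link "https://en.wikipedia.org/wiki/Python_(programming_language)". Store the content in the variable html_content.

Create a BeautifulSoup object from html_content using html.parser. Store the result in the variable soup.

Find the number of reference links present in the soup object. Store the result in the variable n_links.

Hint: make use of the find_all method and a tags.

Print n_links.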

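【Discussion】:

What have you tried? What is not working? Can you be more specific?
Also, what is the expected number?
It is frustrating to be told repeatedly that the answer is wrong without being told what the expected correct answer is. Is there a reason you are not sharing that information with us?

Tags: python-3.x web-scraping nlp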

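【Solution 1】:

This may come down to semantics. I can't be sure, because you haven't specified the actual target number. If the links you want are the ones in the References section, then you need to restrict the search to the part of the HTML under that section's parent class. In that case, I would use a CSS selector applied via select. This gives 391.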

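from bs4 import BeautifulSoup as bs
import requests

# The "#References" fragment is never sent to the server, so it is omitted here.
url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
html_content = requests.get(url).content
soup = bs(html_content, 'html.parser')
# Collect the href of every link inside the references list (class "reflist").
links = [item['href'] for item in soup.select('.reflist a[href]')]
n_links = len(links)
print(n_links)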
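
To illustrate why different readings of "reference links" give different counts, here is a minimal offline sketch. The HTML fragment and the variable names (sample_html, all_anchors, with_href, in_reflist, external_refs) are made up for illustration; they are not Wikipedia's actual markup, and the exercise's intended reading is still unknown.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for a real page; not Wikipedia's markup.
sample_html = """
<div id="content">
  <p>See <a href="/wiki/Guido_van_Rossum">Guido</a> and <a name="top">an anchor without href</a>.</p>
  <div class="reflist">
    <ol>
      <li><a href="#cite_ref-1">^</a> <a href="https://example.org/a">Source A</a></li>
      <li><a href="#cite_ref-2">^</a> <a href="http://example.org/b">Source B</a></li>
    </ol>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Four plausible ways to count "links", from broadest to narrowest.
all_anchors = len(soup.find_all("a"))                          # every <a> tag
with_href = len(soup.find_all("a", href=True))                 # only <a> tags that carry an href
in_reflist = len(soup.select(".reflist a[href]"))              # links inside the references block
external_refs = len(soup.select('.reflist a[href^="http"]'))   # external links in that block

print(all_anchors, with_href, in_reflist, external_refs)
```

On this fragment the four counts all differ, which is one possible explanation for results such as 1568 versus 1572 on the real page.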

【Discussion】:

Thanks for offering a different perspective, but this answer did not work.
What is the expected number for the answer?

【Solution 2】:

import re
from urllib import request

from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Python_(programming_language)"
html_content = request.urlopen(url).read()
soup = BeautifulSoup(html_content, 'html.parser')

# Keep only anchors whose href starts with "http://"; "https://" links are not matched.
links = [link.get('href') for link in soup.find_all('a', attrs={'href': re.compile("^http://")})]

# The exercise asks for the count, not the list of URLs.
n_links = len(links)
print(n_links)
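
When a compiled regular expression is passed as an attribute filter, Beautiful Soup tests each attribute value with the pattern's search() method, so the pattern decides which links are counted. The sketch below uses only the standard library and a made-up list of href values (hrefs, http_only and http_or_https are illustrative names) to show what the pattern "^http://" keeps compared with a pattern that also accepts "https://".

```python
import re

# Hypothetical href values of the kinds found on a typical page.
hrefs = [
    "http://example.org/a",
    "https://example.org/b",
    "/wiki/Python",
    "#cite_note-1",
]

http_only = re.compile(r"^http://")        # the pattern used in the answer
http_or_https = re.compile(r"^https?://")  # also accepts https:// links

matched_http_only = [h for h in hrefs if http_only.search(h)]
matched_both = [h for h in hrefs if http_or_https.search(h)]

print(len(matched_http_only), matched_http_only)
print(len(matched_both), matched_both)
```

Relative links and in-page fragments are excluded by both patterns, so this approach counts external links only; whether that is what the exercise means by "reference links" is an assumption.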

【Discussion】:

Can you provide an explanation along with your code?