Create a script to catch links on a webpage with Python 3

【Question title】: Create a script to catch links on a webpage with python 3
【Posted】: 2016-11-21 05:48:47
【Question description】:

I want to grab the links to all the topics on this page: https://www.inforge.net/xi/forums/liste-proxy.1118/

I tried this script:

import urllib.request
from bs4 import BeautifulSoup

url = (urllib.request.urlopen("https://www.inforge.net/xi/forums/liste-proxy.1118/"))
soup = BeautifulSoup(url, "lxml")

for link in soup.find_all('a'):
    print(link.get('href'))

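But it prints every link on the page, not just the topic links I want. Can you suggest a quick way to do this? I'm still a beginner and only recently started learning Python.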

【Question comments】:

Tags: python python-3.x hyperlink try-catch webpage

【Solution 1】:

You can use BeautifulSoup to parse the HTML:

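from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 location of urlopen (urllib2 is Python 2 only)

url = 'https://www.inforge.net/xi/forums/liste-proxy.1118/'
# Name the parser explicitly, as in the question, so bs4 does not have to guess one
soup = BeautifulSoup(urlopen(url), "lxml")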

Then find the topic links, which are the <a> tags with the class PreviewTooltip:

soup.find_all('a', {'class':'PreviewTooltip'})
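
To see the class filter work without hitting the forum, here is a minimal sketch that runs it on a small inline HTML string. The markup in SAMPLE_HTML, the navLink class, and the helper name topic_links are made up for illustration; only the PreviewTooltip class comes from the page discussed here. It uses the built-in "html.parser" backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the forum page: two topic links and one unrelated link.
SAMPLE_HTML = """
<div>
  <a class="PreviewTooltip" href="threads/first-topic.1/">First topic</a>
  <a class="PreviewTooltip" href="threads/second-topic.2/">Second topic</a>
  <a class="navLink" href="help/">Help</a>
</div>
"""


def topic_links(html):
    """Return the href of every <a> tag whose class is PreviewTooltip."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all('a', {'class': 'PreviewTooltip'})
    # find_all returns Tag objects; .get('href') pulls out the attribute as a string
    return [tag.get('href') for tag in tags]


links = topic_links(SAMPLE_HTML)
for href in links:
    print(href)
```

The "Help" link is skipped because its class does not match, which is the difference from the plain soup.find_all('a') in the question.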

【Comments】:

Thanks for your answer, but if I follow your method and do print(soup), it gives me the page source, not the topic links :\
That gives you tag objects. To get the URLs as strings, use [tag.get('href') for tag in soup.find_all('a', {'class':'PreviewTooltip'})]
OK. Now I get the links I want, but they are still wrapped in HTML markup: <a class="PreviewTooltip" data-previewurl="threads/dichvusocks-us-23h10-pm-update-24-24-good-socks.455661/preview" href="threads/dichvusocks-us-23h10-pm-update-24-24-good-socks.455661/" title="">[DICHVUSOCKS.US] 23h10 PM UPDATE 24/24- Good Socks</a> But it's a big step forward! :)
Man, thanks to you I solved it. Here is the final code:

from bs4 import BeautifulSoup
import urllib.request

url = 'https://www.inforge.net/xi/forums/liste-proxy.1118/'
soup = BeautifulSoup(urllib.request.urlopen(url), "lxml")

for tag in soup.find_all('a', {'class':'PreviewTooltip'}):
    print(tag.get('href'))

Thanks again
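
One follow-up: the href values shown in the comments are relative ("threads/..."), so they cannot be opened as-is. A rough sketch of turning them into absolute URLs with urllib.parse.urljoin follows. The base URL here is an assumption, not something confirmed in this thread: a page may declare its own base with a <base href> tag, which takes precedence over the page URL, so check the page source before relying on it. The helper name absolutize is also just for illustration.

```python
from urllib.parse import urljoin

# ASSUMPTION: the base that relative links resolve against. Verify it against
# the page's <base href> tag (if any); otherwise use the page URL itself.
BASE_URL = "https://www.inforge.net/xi/"


def absolutize(base_url, hrefs):
    """Resolve each relative href against base_url, skipping missing hrefs."""
    return [urljoin(base_url, href) for href in hrefs if href]


# Relative href copied from the comment above
relative = ["threads/dichvusocks-us-23h10-pm-update-24-24-good-socks.455661/"]
absolute = absolutize(BASE_URL, relative)

for link in absolute:
    print(link)
```

The `if href` guard is there because tag.get('href') returns None for an <a> tag that has no href attribute.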