How to use BeautifulSoup to retrieve the URL of a tarfile

【Question title】: How to use BeautifulSoup to retrieve the URL of a tarfile
【Posted】: 2021-07-02 14:31:42
【Question】:

The following web page lists the URLs of all the source packages for the LFS project:

https://linuxfromscratch.org/lfs/view/systemd/chapter03/packages.html

I wrote some python3 code to retrieve all of those URLs from that page:


#!/usr/bin/env python3

from requests import get
from bs4 import BeautifulSoup
import re
import sys, os

#url=sys.argv[1]
url="https://linuxfromscratch.org/lfs/view/systemd/chapter03/packages.html"
exts = (".xz", ".bz2", ".gz", ".lzma", ".tgz", ".zip")

response = get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Print every link whose target ends in one of the archive extensions
for link in soup.find_all('a', href=True):
    href = link['href']
    for part in href.split():
        if os.path.splitext(part)[-1] in exts:
            print(href)

What I would like to do is supply a pattern, such as:

pattern = 'iproute2'

and then print the line that contains the iproute2 tarfile,

which happens to be:

https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-5.12.0.tar.xz

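I tried match = re.search(pattern, text), and it does find the right line, but printing match shows the match object rather than the URL itself.

How do I get it to print the actual URL?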

【Comments】:

Tags: python-3.x regex beautifulsoup

【Solution 1】:

You can use the .string property of the match object, which returns the string that was passed to the function.

Code example

import re

txt = "https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-5.12.0.tar.xz"
pattern = 'iproute2'

match = re.search(pattern, txt)

if match:   # re.search returns None when nothing matches, so check first
    print(match.string)   # the full string that was searched, i.e. the URL
else:
    print('No Match Found')
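
To connect this back to the script in the question, here is a minimal sketch of how the same check might be applied to every link on the page. The function names matching_archive_urls and hrefs_from_page are hypothetical helpers introduced only for illustration, and the two example.org entries in the sample list are made up; only the iproute2 URL comes from the question.

```python
import os
import re

# Same archive extensions as in the question's script
ARCHIVE_EXTS = (".xz", ".bz2", ".gz", ".lzma", ".tgz", ".zip")


def matching_archive_urls(hrefs, pattern, exts=ARCHIVE_EXTS):
    """Return the hrefs that end in an archive extension and match pattern."""
    regex = re.compile(pattern)
    urls = []
    for href in hrefs:
        if os.path.splitext(href)[-1] not in exts:
            continue
        match = regex.search(href)
        if match:
            # match.string is the whole href that was searched;
            # match.group() would be only the matched text, e.g. 'iproute2'
            urls.append(match.string)
    return urls


def hrefs_from_page(url):
    """Collect every href on a page (needs the requests and bs4 packages)."""
    # Imported here so the filter above can be used without these packages
    from requests import get
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(get(url).content, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


# Offline demonstration; the two example.org entries are made up
sample_hrefs = [
    "https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-5.12.0.tar.xz",
    "https://example.org/iproute2/index.html",
    "https://example.org/foo-1.0.tar.gz",
]
for url in matching_archive_urls(sample_hrefs, "iproute2"):
    print(url)
```

Against the live page, the call would presumably look like matching_archive_urls(hrefs_from_page(url), 'iproute2'). Since the pattern is treated as a regular expression, a package name containing characters such as + or . may need re.escape() first.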

【Comments】:

Great, just what the doctor ordered!
Glad you like it! Would you mind upvoting the answer?