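【Posted】: 2021-07-02 14:31:42
【Question】:
The following web page lists the URLs of all the source packages for the LFS project:
https://linuxfromscratch.org/lfs/view/systemd/chapter03/packages.html
I wrote some python3 code to retrieve all of those URLs from that page: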
#!/usr/bin/env python3
import os
import re
import sys

from bs4 import BeautifulSoup
from requests import get

#url=sys.argv[1]
url = "https://linuxfromscratch.org/lfs/view/systemd/chapter03/packages.html"
# Archive extensions that identify a source tarball link
exts = (".xz", ".bz2", ".gz", ".lzma", ".tgz", ".zip")

response = get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# href=True already skips anchors that have no href attribute
for link in soup.find_all('a', href=True):
    for anhref in link.get('href').split():
        if os.path.splitext(anhref)[-1] in exts:
            print(link.get('href'))
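What I would like to do is supply a pattern, such as:
pattern = 'iproute2'
and then print the line that contains the iproute2 tarball,
which happens to be:
https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-5.12.0.tar.xz
I tried match = re.search(pattern, text), and it does find the right line, but when I print match I get:
How can I make it print the actual URL?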
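One possible approach, offered as a minimal sketch rather than a definitive answer: re.search returns a match object, not the searched text, so the match can be used only as a test and the URL string itself printed. The helper name find_urls and the sample list are hypothetical; the list stands in for the href values collected by the loop in the script, and the second entry is a made-up placeholder URL.

```python
import re


def find_urls(urls, pattern):
    """Return every URL in `urls` that contains a match for `pattern`."""
    regex = re.compile(pattern)
    # The match object is used only as a truth test; the URL string is kept.
    return [u for u in urls if regex.search(u)]


# Hypothetical sample standing in for the hrefs gathered from the page.
sample = [
    "https://www.kernel.org/pub/linux/utils/net/iproute2/iproute2-5.12.0.tar.xz",
    "https://example.org/pub/foo/foo-1.0.tar.gz",  # placeholder, not a real package
]

for url in find_urls(sample, "iproute2"):
    print(url)

# A match object also exposes the text it was found in.
m = re.search("iproute2", sample[0])
if m:
    print(m.group(0))  # only the matched substring
    print(m.string)    # the whole string that was searched
```

In the original loop, the same idea would amount to calling re.search(pattern, link.get('href')) and printing link.get('href') when the result is not None. Note that the pattern is treated as a regular expression, so a literal name containing characters such as "+" or "." may need re.escape.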
【Discussion】:
Tags: python-3.x regex beautifulsoup