Posted: 2020-05-30 11:08:54
Question:
I am trying to extract all the video link references, together with the video names, from a web page. I tried the following code.
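#!/usr/bin/python3
from bs4 import BeautifulSoup
import urllib.request  # "import urllib" alone does not reliably load urllib.request

# Fetch the raw HTML of the videos page
html = urllib.request.urlopen('https://www.ansible.com/resources/videos').read()
soup = BeautifulSoup(html, features="lxml")
# Print the href of every <a> element on the page
for link in soup.find_all('a'):
    print(link.get('href'))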
Output:
https://www.ansible.com/?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/it-automation?hsLang=en-us
https://www.ansible.com/overview/how-ansible-works?hsLang=en-us
https://www.ansible.com/products/automation-platform?hsLang=en-us
https://www.ansible.com/use-cases?hsLang=en-us
https://www.ansible.com/use-cases/provisioning?hsLang=en-us
https://www.ansible.com/use-cases/configuration-management?hsLang=en-us
https://www.ansible.com/use-cases/application-deployment?hsLang=en-us
https://www.ansible.com/use-cases/continuous-delivery?hsLang=en-us
https://www.ansible.com/use-cases/security-automation?hsLang=en-us
https://www.ansible.com/use-cases/orchestration?hsLang=en-us
https://www.ansible.com/integrations?hsLang=en-us
Sample of the HTML source:
<h4><a href="https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista?hsLang=en-us">Ansible Network Automation with Arista CloudVision and Arista Validated Designs</a></h4>
The above is just a sample from the HTML source of https://www.ansible.com/resources/videos. From it, I want the link https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista and the video name Ansible Network Automation with Arista CloudVision and Arista Validated Designs.
Here is another example: I want the href up to (but not including) the ?, and the text of the <a> element, i.e. Scale-out Clustering with Tower 3.1.
<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>
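A minimal BeautifulSoup sketch, assuming every video on the page is marked up as `<h4><a href="...">title</a></h4>` like the two samples above (not verified against the live page). The CSS selector `h4 > a[href]` keeps only those anchors, `get_text()` gives the video name, and `urllib.parse` drops the `?hsLang=en-us` query string. The helper `strip_query` is hypothetical, and the sketch parses the sample markup with the built-in `"html.parser"` backend; on the live page you would pass the bytes from `urlopen(...).read()` instead.

```python
from urllib.parse import urlsplit, urlunsplit

from bs4 import BeautifulSoup

# The two sample snippets from the question; on the live page, pass the
# bytes returned by urllib.request.urlopen(...).read() instead.
html = """
<h4><a href="https://www.ansible.com/resources/webinars-training/ansible-network-automation-with-arista-cloudvision-and-arista?hsLang=en-us">Ansible Network Automation with Arista CloudVision and Arista Validated Designs</a></h4>
<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">Scale-out Clustering with Tower 3.1</a></h4>
"""


def strip_query(href):
    """Hypothetical helper: return href without its query string and fragment."""
    parts = urlsplit(href)
    return urlunsplit(parts._replace(query="", fragment=""))


soup = BeautifulSoup(html, "html.parser")

videos = []
# "h4 > a[href]" matches only <a> elements that have an href and are direct
# children of an <h4>, so site navigation links are skipped.
for a in soup.select("h4 > a[href]"):
    videos.append((a.get_text(strip=True), strip_query(a["href"])))

for name, link in videos:
    print(f"Video name: {name}")
    print(f"Video link: {link}")
```

Using `urlsplit`/`urlunsplit` rather than `href.split("?")[0]` also removes a `#fragment` if one is present.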
Desired output:
Video name: Ansible Network Automation with Arista CloudVision and Arista Validated Designs
Thanks for your help.
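If installing `lxml` or BeautifulSoup is not an option, the same extraction can be sketched with only the standard library. This swaps in a different technique, `html.parser.HTMLParser`, and the class name `H4LinkParser` is hypothetical; like the sketch above, it assumes the titles are wrapped as `<h4><a href="...">`. Printing a "Video link:" line next to "Video name:" is also an assumption about the wanted format.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class H4LinkParser(HTMLParser):
    """Hypothetical parser: collect (name, link) for each <a href> inside an <h4>."""

    def __init__(self):
        super().__init__()
        self.in_h4 = False
        self.href = None        # href of the <a> currently open inside <h4>
        self.text_parts = []
        self.videos = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.in_h4 = True
        elif tag == "a" and self.in_h4:
            self.href = dict(attrs).get("href")  # None if the <a> has no href
            self.text_parts = []

    def handle_data(self, data):
        # Only collect text while inside an <a href> within an <h4>
        if self.href is not None:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            name = "".join(self.text_parts).strip()
            parts = urlsplit(self.href)
            link = urlunsplit(parts._replace(query="", fragment=""))  # drop ?hsLang=...
            self.videos.append((name, link))
            self.href = None
        elif tag == "h4":
            self.in_h4 = False


# A navigation link (ignored) followed by one of the question's samples
sample = (
    '<a href="https://www.ansible.com/?hsLang=en-us">Home</a>'
    '<h4><a href="https://www.ansible.com/scale-out-clustering-tower?hsLang=en-us">'
    'Scale-out Clustering with Tower 3.1</a></h4>'
)
parser = H4LinkParser()
parser.feed(sample)
for name, link in parser.videos:
    print(f"Video name: {name}")
    print(f"Video link: {link}")
```

For the live page, decode the bytes from `urlopen(...).read()` (for example with `.decode("utf-8")`, assuming that encoding) before calling `parser.feed()`.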
Tags: python html python-3.x pandas web-scripting