Python code for getting all the links from the page source, including all tags

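【Question title】: Python code for getting all the links from view source including all the tags
【Posted】: 2016-02-19 05:09:05
【Question】:

Can someone guide me on how to use Python to get all the links visible in a page's view-source? I want to retrieve every link from every tag (link, a, img, CSS... everything). Below is the code I have tried.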

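import requests
from bs4 import BeautifulSoup

url = "https://example.com"  # placeholder: the page to scan
r = requests.get(url)
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(r.content, "html.parser")
# Tags that keep their URL in href
for anchor in soup.find_all('a', href=True):
    print(anchor['href'])
for anchor in soup.find_all('link', href=True):
    print(anchor['href'])
# Tags that keep their URL in src
for anchor in soup.find_all('img', src=True):
    print(anchor['src'])
for anchor in soup.find_all('script', src=True):
    print(anchor['src'])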

This way I can get the links from all the tags, but I cannot get the links that appear inside stylesheets, for example .bg { background: url(XXXX) }.

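【Comments】:

BeautifulSoup can't help you here. Have you considered regular expressions? One more caveat: if JavaScript or CSS is embedded in the HTML (rather than in external files) and it references other URLs, your current approach will miss those as well.

Tags: python beautifulsoup python-requests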

【Solution 1】:

BeautifulSoup cannot parse JavaScript or CSS code, but you can use a regular expression for that part of the task.
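One way to combine the two tools, as a rough sketch rather than a definitive implementation: let BeautifulSoup locate the CSS that is embedded in the HTML (`<style>` elements and `style="..."` attributes), then run a regular expression over that text. The function name `css_urls_in_html`, the sample markup, and the pattern are illustrative assumptions, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Assumed pattern: matches url(...) and captures the text inside the parentheses
CSS_URL = re.compile(r"url\(([^)]+)\)")

def css_urls_in_html(html):
    """Return url(...) targets found in embedded CSS (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    css_chunks = []
    # CSS inside <style> elements; .string is None when the element is empty
    for style_tag in soup.find_all("style"):
        css_chunks.append(style_tag.string or "")
    # CSS inside style="..." attributes on any tag
    for node in soup.find_all(style=True):
        css_chunks.append(node["style"])
    urls = []
    for css in css_chunks:
        for raw in CSS_URL.findall(css):
            # Quotes around the URL are optional in CSS, so strip them
            urls.append(raw.strip().strip("'\""))
    return urls

# Made-up sample page for illustration
sample_html = """
<html><head>
<style>.bg { background: url(/img/bg.png) }</style>
</head><body>
<div style="background-image: url('/img/hero.jpg')">x</div>
</body></html>
"""
print(css_urls_in_html(sample_html))
```

External stylesheets are not covered by this sketch: each file referenced by a `<link>` tag would have to be downloaded separately and its text passed through the same pattern.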

Also, if you have a lot of repeated code, use lists, arrays, and dictionaries. Change this:

for anchor in soup.find_all('a', href=True):
    print(anchor['href'])
for anchor in soup.find_all('link', href=True):
    print(anchor['href'])

to:

# One loop covers every tag that stores its URL in href
for tag in ['a', 'link']:
    for anchor in soup.find_all(tag, href=True):
        print(anchor['href'])

After that, the code is easy to change.
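The same idea can be pushed one step further with a dictionary, so that tags using `src` and tags using `href` share a single loop. This is only a sketch: the mapping `LINK_ATTRS`, the function `collect_links`, and the sample markup are hypothetical names chosen for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping: tag name -> attribute that holds its URL
LINK_ATTRS = {
    "a": "href",
    "link": "href",
    "img": "src",
    "script": "src",
}

def collect_links(html):
    """Return every URL found in the attributes listed in LINK_ATTRS."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for tag, attr in LINK_ATTRS.items():
        # attrs={attr: True} keeps only tags that actually carry the attribute
        for node in soup.find_all(tag, attrs={attr: True}):
            links.append(node[attr])
    return links

# Made-up sample page for illustration
sample = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<script src="/static/app.js"></script>
</head><body>
<a href="/about">About</a>
<img src="/logo.png">
<a name="top">no href here</a>
</body></html>
"""
links = collect_links(sample)
print(links)
```

Supporting another tag then means adding one entry to the dictionary instead of copying another loop.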

【Comments】:

【Solution 2】:

This is a solution for the stylesheet part: re.findall(r'url\(([^)]+)\)', target_text)
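To make the one-liner easier to follow, here is the same pattern written in verbose form and applied to a small piece of CSS. The sample text and the quote-stripping step are additions for illustration, and `target_text` is assumed to be a string that already holds the stylesheet or page source.

```python
import re

# The same pattern as above, spelled out so each piece is visible
CSS_URL = re.compile(r"""
    url\(       # the literal text: url(
    ([^)]+)     # capture group: one or more characters that are not a closing parenthesis
    \)          # the literal closing parenthesis
""", re.VERBOSE)

# Made-up CSS standing in for the real stylesheet or page source
target_text = """
.bg { background: url(XXXX) }
.hero { background-image: url("images/hero.png"); }
"""

# With one capture group, findall returns only the text between the parentheses
matches = CSS_URL.findall(target_text)
# Quotes around the URL are optional in CSS, so strip them if present
urls = [m.strip().strip("'\"") for m in matches]
print(urls)
```

Because the capture stops at the first closing parenthesis, a URL that itself contains `)` would be cut short; for typical stylesheets this simple pattern is probably enough.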

【Comments】:

Could you explain the command?