Python Replace Text Beautiful Soup: Answers

【Question Title】: Python Replace Text Beautiful Soup
【Posted】: 2021-03-16 19:36:31
【Question Description】:

I am trying to replace specific text using Beautiful Soup. My code:

import requests
from bs4 import BeautifulSoup as bs

# Path to a text file containing one search keyword per line
dorks = input("Keyword : ")

binglist = "http://www.bing.com/search?q="
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36'}

with open(dorks, mode="r", encoding="utf-8") as my_file:
    for line in my_file:
        # Strip the trailing newline before building the search URL
        clean = binglist + line.strip()
        r = requests.get(clean, headers=headers)
        soup = bs(r.text, 'html.parser')
        # find_all returns a list of every <cite> element on the page
        links = soup.find_all('cite')
        print(links)

Output:

[<cite>https://www.wsltv.com/tv-<strong>allinurl:-streaming</strong>/s17455</cite>, <cite>https://www.<strong>google</strong>.es/webhp</cite>]

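So I am trying to remove all the HTML tags and keep only the URLs.

I tried a regular expression, but I did not manage to extract the website URLs.

Like this: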

links = soup.find_all('http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$')

But I did not manage to extract only the URLs.

Thanks for your help.

【Question Comments】:

You can find the answer here: stackoverflow.com/questions/56421148/…

Tags: python beautifulsoup text-extraction

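【Solution 1】:

You are getting a list of elements (Tag objects), not strings. To get only the text content of each element, use its .text attribute.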

links = soup.find_all('cite')
for cite in links:
    print(cite.text)
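
To illustrate why this works, here is a minimal offline sketch that parses the two <cite> elements from the question's output instead of querying Bing. The .text attribute (equivalent to get_text()) joins every text node inside a tag, so the nested <strong> markup drops out. The variable names and the final regex filter are assumptions added for illustration, not part of the original answer. Note also that find_all() matches tag names rather than text, so a URL pattern passed to it as a plain string will not find anything; a pattern like this is better applied to the extracted text afterwards.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question
html = (
    '<cite>https://www.wsltv.com/tv-<strong>allinurl:-streaming</strong>/s17455</cite>'
    '<cite>https://www.<strong>google</strong>.es/webhp</cite>'
)

soup = BeautifulSoup(html, "html.parser")

# get_text() (the same as .text) concatenates all text nodes inside the tag,
# so the <strong> wrappers vanish and only the URL text remains
urls = [cite.get_text() for cite in soup.find_all("cite")]

# Hypothetical extra step: keep only strings that look like http(s) URLs
url_pattern = re.compile(r"^https?://\S+$")
urls = [u for u in urls if url_pattern.match(u)]

for u in urls:
    print(u)
```

With the sample markup above, this should print the two URLs as plain strings with no tags around them.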

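【Comments】:

OK, your code fixed it. Could you explain it to me? I would like to understand it too.
You should learn to read the documentation of the package you are using.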
crummy.com/software/BeautifulSoup/bs4/doc/…