【Question title】: How to replace a domain in an image src in Python?
【Posted】: 2021-09-18 21:43:56
【Question】:

I have the following HTML string:

<body>
    <img
        alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water."
        height="333"
        src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG"
        width="500"
    />
    <br />
    Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;
    <a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;
    <a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>.
</body>

In Python 3.x, I want to change the domain in every img src from tvfcommunity-dev-ed--c.documentforce.com to globalcommunity.networks.com.

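Note: I'm looking for a solution that replaces the domain only when it appears in an img src. If it appears in regular text or in an iframe src, it should not be replaced.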

Any help?

【Question comments】:

  • Parse the document with, for example, lxml, find the tags you want to change, and set their attributes to whatever value you need.
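
The parse-find-set idea from the comment above can also be done with only the standard library. The sketch below swaps in Python's html.parser.HTMLParser in place of lxml (it is not lxml's API): it re-emits the document token by token and rebuilds only <img> start tags, so text and iframe src values are left alone. OLD_DOMAIN, NEW_DOMAIN, ImgSrcRewriter and replace_img_domain are hypothetical names. Assumptions: rebuilt img tags get double-quoted attributes, end-tag names come out lowercase, and processing instructions and marked sections are not handled.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical constants, values taken from the question.
OLD_DOMAIN = "tvfcommunity-dev-ed--c.documentforce.com"
NEW_DOMAIN = "globalcommunity.networks.com"


class ImgSrcRewriter(HTMLParser):
    """Re-emit the HTML, changing the domain only inside <img src="...">."""

    def __init__(self):
        # convert_charrefs=False delivers &nbsp; and &#...; as separate
        # events, so they can be written back exactly as they appeared.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _emit_start(self, tag, attrs, self_closing):
        if tag != "img":
            # Any other start tag is copied through verbatim.
            self.out.append(self.get_starttag_text())
            return
        parts = []
        for name, value in attrs:
            if value is None:  # attribute without a value, e.g. <img ismap>
                parts.append(name)
                continue
            if name == "src":
                value = value.replace(OLD_DOMAIN, NEW_DOMAIN)
            # HTMLParser unescapes attribute values, so escape them again.
            parts.append(f'{name}="{escape(value, quote=True)}"')
        closing = " />" if self_closing else ">"
        self.out.append("<img" + "".join(" " + p for p in parts) + closing)

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=False)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def replace_img_domain(html_text):
    """Return html_text with OLD_DOMAIN replaced only in <img> src values."""
    parser = ImgSrcRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling replace_img_domain on the question's HTML should rewrite the img src while leaving the Wikipedia links and any bare occurrence of the old domain in the text unchanged.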

Tags: html python-3.x replace


【Answer 1】:

You can handle your case as described here:

How to use string.replace() in python 3.x

string.replace(oldvalue, newvalue)

You can solve your problem with a simple string.replace.

In your case:

yourHtmlContainer = """<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"><br>Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>"""
print("Before replace")
print(yourHtmlContainer)


newHtml = yourHtmlContainer.replace("tvfcommunity-dev-ed--c.documentforce.com", "globalcommunity.networks.com")
print("After replace")
print(newHtml)

Output:

Before replace
<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"><br>Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>
After replace
<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://globalcommunity.networks.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"><br>Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp;<a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>

More help: https://www.w3schools.com/python/ref_string_replace.asp

【Comments】:

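  • Hi Mateus, thanks for your reply. Your answer also replaces regular text (or an iframe src) that contains tvfcommunity-dev-ed--c.documentforce.com. I'm looking for a solution that does the replacement only when the domain appears in an img src.
  • Can you iterate over your HTML tags? If so, you can add a check for whether the tag being changed is an img. If that solves your case, I can update the code. See: python-looping-through-html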
  • Hi Mateus, that might work. It would help most if I could check and replace in one pass, rather than checking afterwards.
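
For the check-and-replace-in-one-pass idea discussed in these comments, a regular expression can restrict the substitution to the src attribute of <img> tags. This is a minimal sketch, assuming simple markup: no ">" inside attribute values before src, and no src=-like text inside other attributes of the same img tag. A regex is not an HTML parser, so a parser-based approach such as BeautifulSoup is more robust. OLD, NEW, IMG_SRC_RE and replace_img_src_domain are hypothetical names.

```python
import re

# Hypothetical constants, values taken from the question.
OLD = "tvfcommunity-dev-ed--c.documentforce.com"
NEW = "globalcommunity.networks.com"

# Group 1: from "<img" up to the scheme of its src value (quote optional).
# The lookahead requires the old domain to end there, so a longer host such
# as "<old domain>.evil.net" is not matched.
IMG_SRC_RE = re.compile(
    r'(<img\b[^>]*?\ssrc\s*=\s*["\']?https?://)' + re.escape(OLD) + r'(?=[/"\'?#:\s>])',
    re.IGNORECASE,
)


def replace_img_src_domain(html_text):
    # A lambda replacement avoids backslash-escape surprises in NEW.
    return IMG_SRC_RE.sub(lambda m: m.group(1) + NEW, html_text)
```

Text nodes and iframe src values never start with "<img", so they are left unchanged.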

【Answer 2】:

Thanks, everyone, for the valuable input.

I solved the problem using BeautifulSoup:

from bs4 import BeautifulSoup

html_doc = '<body><img alt="Images may be two-dimensional, such as a photograph or screen display, or three-dimensional, such as a statue or hologram. They may be captured by optical devices – such as cameras, mirrors, lenses, telescopes, microscopes, etc. and natural objects and phenomena, such as the human eye or water." height="333" src="https://tvfcommunity-dev-ed--c.documentforce.com/servlet/rtaImage?eid=ka02v000001BOfL&amp;feoid=00N2v00000Rjh9i&amp;refid=0EM2v000002ijZG" width="500"/><br /> Images may be two-<a href="https://en.wikipedia.org/wiki/Dimensional" target="_blank" title="Dimensional">dimensional</a>, such as a&nbsp; <a href="https://en.wikipedia.org/wiki/Photograph" target="_blank" title="Photograph">photograph</a>&nbsp;or screen display, or three-dimensional, such as a&nbsp; <a href="https://en.wikipedia.org/wiki/Statue" target="_blank" title="Statue">statue</a>&nbsp;or&nbsp;<a href="https://en.wikipedia.org/wiki/Hologram" target="_blank" title="Hologram">hologram</a>. </body>'
modified_data = BeautifulSoup(html_doc, 'html.parser')

# Find every <img> that has a src attribute (skips src-less imgs, which would raise KeyError) and change its domain
for tag in modified_data.find_all("img", src=True):
    tag['src'] = tag['src'].replace('https://tvfcommunity-dev-ed--c.documentforce.com/', 'https://globalcommunity.networks.com/')
print(modified_data)
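
The str.replace inside the loop above is a substring match, so it would also fire if the old URL prefix appeared elsewhere in a src value (for example inside a query string). A minimal sketch of a stricter helper that compares and swaps only the host part, using urllib.parse from the standard library; swap_host, OLD_HOST and NEW_HOST are hypothetical names, and the sketch assumes img URLs carry no user:password part (it would be dropped).

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical constants, values taken from the question.
OLD_HOST = "tvfcommunity-dev-ed--c.documentforce.com"
NEW_HOST = "globalcommunity.networks.com"


def swap_host(url, old_host=OLD_HOST, new_host=NEW_HOST):
    """Return url with its host changed from old_host to new_host.

    Only the host is compared (case-insensitively), so the old domain in a
    path or query, or as a prefix of a longer host, is left untouched.
    Relative URLs have no host and are returned unchanged.
    """
    parts = urlsplit(url)
    if (parts.hostname or "") != old_host.lower():
        return url
    # Keep an explicit port if there is one.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


# Possible use in the BeautifulSoup loop from the answer (bs4 assumed installed):
# for tag in modified_data.find_all("img", src=True):
#     tag["src"] = swap_host(tag["src"])
```

BeautifulSoup hands back the src value with &amp; already decoded to &, and re-escapes it when the tree is printed, so swap_host works on the plain URL.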

【Comments】:
