python: change data hyperlink in HTML file

【Question title】: python: change data hyperlink in HTML file
【Posted】: 2018-08-01 15:50:29
【Question description】:

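Our website has a link to a zip folder. The line in its HTML file looks like this: <p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a> (updated weekly)</p>

The name of the zip folder will soon change to include the current date, like this: WillCounty_AddressPoint_02212018.zip

How do I change the corresponding line in the HTML?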

Based on this answer I have a script. It runs without errors, but it doesn't change anything in the HTML file.

import bs4
from bs4 import BeautifulSoup
import re
import time

data = r'\\gisfile\GISstaff\Jared\data.html' #html file location
current_time = time.strftime("_%m%d%Y") #date

#load the file
with open(data) as inf:
    txt = inf.read()
    soup = bs4.BeautifulSoup(txt)

#create new link
new_link = soup.new_tag('link', href="Data/WillCounty_AddressPoint_%m%d%Y.zip")
#insert it into the document
soup.head.append(new_link)

#save the file again
with open (data, "w") as outf:
    outf.write(str(soup))

【Question discussion】:

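What date do you want to put there? A fixed date, the current date when the code runs, or something else? (You say it's updated "weekly", but you also have current_time = time.strftime("_%m%d%Y") #date.)
I want to put the current date there. Since the files in the zip folder will be updated weekly with the corresponding date, the zip name in the HTML has to change as well.
But doesn't that mean time.strftime("_%m%d%Y") gives you an invalid file name on 6 out of 7 days of the week?
@roganjosh I'm not sure why that would be. It works fine. It gives the current date every time it runs.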
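
The concern raised in the comments above can be made concrete: if the zip is rebuilt once a week but the script runs on some other day, today's date names a file that does not exist. A minimal sketch of one way around that, assuming (hypothetically) that the update happens on a fixed weekday. The helper `latest_update_stamp` and the choice of Wednesday are illustrative and do not come from the original post:

```python
import datetime


def latest_update_stamp(today, update_weekday=2):
    """Return the _MMDDYYYY stamp of the most recent update day.

    update_weekday follows datetime.date.weekday(): Monday is 0, so 2 is
    Wednesday (a hypothetical choice for this sketch).
    """
    days_since_update = (today.weekday() - update_weekday) % 7
    last_update = today - datetime.timedelta(days=days_since_update)
    return last_update.strftime("_%m%d%Y")


# 2018-02-23 was a Friday; the most recent Wednesday was 2018-02-21.
stamp = latest_update_stamp(datetime.date(2018, 2, 23))
print("Data/WillCounty_AddressPoint%s.zip" % stamp)
```

If the script always runs on the same day the zip is rebuilt, plain `time.strftime("_%m%d%Y")` gives the same result and this helper is unnecessary.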

Tags: python html hyperlink beautifulsoup

【Solution 1】:

This is how you can replace the href attribute using BeautifulSoup.

from bs4 import BeautifulSoup
import time

data = r'data.html'  # HTML file location
current_time = time.strftime("_%m%d%Y")  # today's date, e.g. _02212018

# load the file
with open(data) as inf:
    txt = inf.read()
soup = BeautifulSoup(txt, 'html.parser')

# find the first <a> element and replace its href attribute
a = soup.find('a')
a['href'] = "WillCounty_AddressPoint%s.zip" % current_time
print(soup)

# save the file again
with open(data, "w") as outf:
    outf.write(str(soup))

Output:

<p><a href="WillCounty_AddressPoint_02212018.zip">Address Points</a> (updated weekly)</p>

and the result is written back to the file.
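
One caveat: `soup.find('a')` returns the first `<a>` in the document, so on a page with several links it may update the wrong one. A minimal sketch of selecting the anchor by its `href` instead, assuming the `bs4` package is installed. The sample HTML and the helper name `set_zip_href` are made up for illustration, and keeping the `Data/` prefix from the original link is an assumption about where the zip will live:

```python
from bs4 import BeautifulSoup

# Made-up two-link page standing in for the real HTML file.
SAMPLE_HTML = (
    '<p><a href="index.html">Home</a></p>\n'
    '<p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a>'
    ' (updated weekly)</p>\n'
)


def set_zip_href(html, stamp):
    """Point the address-point link at the zip named with the given stamp."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS attribute selector: the first <a> whose href contains the base name.
    link = soup.select_one('a[href*="WillCounty_AddressPoint"]')
    if link is None:
        raise ValueError("no link to the address-point zip found")
    # Assumption: the dated zip stays in the same Data/ folder.
    link["href"] = "Data/WillCounty_AddressPoint%s.zip" % stamp
    return str(soup)


updated = set_zip_href(SAMPLE_HTML, "_02212018")
print(updated)
```

Matching on the `href` rather than on position keeps the script working if links are added above the target later.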

Updated to work with the data from the file provided.

from bs4 import BeautifulSoup
import time

data = r'data.html'  # HTML file location
current_time = time.strftime("_%m%d%Y")  # today's date, e.g. _02222018

# load the file
with open(data) as inf:
    txt = inf.read()
soup = BeautifulSoup(txt, 'html.parser')

# Find the <a> element you want to change by finding its text and selecting the parent.
# string= is the current name of the older text= argument (Beautiful Soup 4.4+).
a = soup.find(string="Address Points").parent
a['href'] = "WillCounty_AddressPoint%s.zip" % current_time
print(soup)

# save the file again
with open(data, "w") as outf:
    outf.write(str(soup))

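It does remove blank lines, but otherwise leaves your HTML code as it was.

Using a diff tool to see the differences between the original and the modified file: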

diff data\ \(copy\).html data.html 
77c77
< <p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a> (updated weekly)</p>
---
> <p><a href="WillCounty_AddressPoint_02222018.zip">Address Points</a> (updated weekly)</p>
116,120d115
< 
< 
< 
< 
< 
154d148
<
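
If the whitespace changes shown in the diff are unwanted, one alternative is to skip the parser and edit only the matching text. A rough sketch using `re.sub` from the standard library, assuming the zip file name appears only inside that one `href`. The helper `update_zip_name` is hypothetical and not part of the answer above:

```python
import re

# Matches the zip name with or without an existing _MMDDYYYY stamp.
ZIP_NAME = re.compile(r"WillCounty_AddressPoint(?:_\d{8})?\.zip")


def update_zip_name(html, stamp):
    """Replace the zip file name and leave every other character untouched."""
    new_name = "WillCounty_AddressPoint%s.zip" % stamp
    # A function replacement avoids any backslash-escape handling in re.sub.
    return ZIP_NAME.sub(lambda match: new_name, html)


before = (
    '<p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a>'
    ' (updated weekly)</p>\n\n\n'
)
after = update_zip_name(before, "_02222018")
print(after)
```

Because the pattern also accepts an existing stamp, running the function again the following week replaces the old date rather than appending a second one. The trade-off is that a regex knows nothing about HTML structure, so it would also rewrite the name if it appeared in visible text or a comment.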

【Discussion】:

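Thanks. It runs without errors, but the output is wrong. It did not change this line: <p><a href="WillCounty_AddressPoint_02212018.zip">Address Points</a> (updated weekly)</p> (line 87). What it did was add an additional line with the same href attribute (at line 25). It also slightly changed the whole HTML. Ideally, I need this script to simply change the existing line.
Can you post a link to your whole HTML file?
Edit: it is in a zip, so you can open it with an appropriate viewer. willcogis.org/website2014/gis/Data/data.zip
Updated the answer in response to the posted HTML file.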