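Here is my simple approach: `soup.body.clear()` or `soup.tag.clear()` (where `tag` is any tag name, e.g. `soup.table`).
Say you want to clear the contents of a `<table></table>` and put a new pandas DataFrame in it. Later you can use this clear method to update the table in your web page's HTML file easily, without Flask or Django: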
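```python
import pandas as pd
import bs4

# I want to convert a 1.2-million-row .csv into a DataFrame, then into an HTML table,
# and then add it to the HTML of my web page. Later I want to update the data
# easily, just by pointing a variable at the updated CSV.
dframe = pd.read_csv("business.csv")  # read_csv already returns a DataFrame
dfhtml = dframe.to_html()  # convert the DataFrame to a table, HTML format

# Use dfhtml_update later to update your table without the <table> tags;
# the <table> itself is easy for BS to select and clear!
# (str.strip() removes a set of characters, not a substring, so slice instead.)
start = dfhtml.index(">") + 1  # just past the opening <table ...> tag
end = dfhtml.rindex("</table>")  # start of the closing </table> tag
dfhtml_update = dfhtml[start:end]

# A small function to unescape the tags (&lt; to <) back into HTML format.
# BeautifulSoup escapes plain strings passed to append() when it writes them out.
# Note: this also unescapes any text on the page that was meant to stay escaped.
def unescape(s):
    s = s.replace("&lt;", "<")
    s = s.replace("&gt;", ">")
    # this has to be last, so "&amp;lt;" correctly becomes "&lt;":
    s = s.replace("&amp;", "&")
    return s

with open("page.html") as page:
    txt = page.read()
soup = bs4.BeautifulSoup(txt, features="lxml")  # needs the lxml package
soup.body.append(dfhtml)  # adds the table to <body>
with open("page.html", "w") as outf:
    outf.write(unescape(str(soup)))  # writes to page.html

# Let's say you want to make seamless table updates to your web page
# instead of using Flask or Django x_x. Re-read and re-parse page.html first:
# the table appended above is only a string until the file is parsed again.
with open("page.html") as page:
    soup = bs4.BeautifulSoup(page.read(), features="lxml")
soup.table.clear()  # clears everything in <table></table>
soup.table.append(dfhtml_update)
with open("page.html", "w") as outf:
    outf.write(unescape(str(soup)))
```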
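If you would rather not run an unescape pass over the whole page (it also turns intentionally escaped text into markup), one alternative is to parse the `to_html()` output with BeautifulSoup and move the resulting tags into the existing table, keeping the same clear-then-append idea. The sketch below is a hypothetical helper, `refresh_table`, that works on HTML strings rather than files; it uses the built-in `"html.parser"` instead of lxml and assumes the page has a `<body>`.

```python
import bs4
import pandas as pd


def refresh_table(page_html: str, df: pd.DataFrame) -> str:
    """Put df into page_html as real <table> markup and return the new HTML.

    Hypothetical helper: if the page has a <table>, its contents are cleared
    and replaced; otherwise the whole DataFrame table is appended to <body>.
    """
    soup = bs4.BeautifulSoup(page_html, "html.parser")
    # Parse the DataFrame's HTML so we insert Tag objects, not a string that
    # BeautifulSoup would escape (&lt;, &gt;) on output.
    new_table = bs4.BeautifulSoup(df.to_html(), "html.parser").table

    table = soup.find("table")
    if table is None:
        soup.body.append(new_table)  # first run: add the whole table
    else:
        table.clear()  # empties <table>...</table>, keeps its attributes
        # Move the parsed <thead>/<tbody> (and whitespace) into the old table.
        for child in list(new_table.contents):
            table.append(child.extract())
    return str(soup)
```

To use it, read `page.html`, pass its text together with the new DataFrame, and write the returned string back. Because the inserted nodes are real tags, no `unescape` step is needed.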
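I'm new to this, but after a lot of searching I just combined a bunch of the basics from the docs... It's a bit bloated, but so is handling billions of data cells. It works for me.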