Python CSV to HTML table code not working: answers

【Question title】: Python CSV to HTML table code not working
【Posted】: 2015-06-16 08:04:11
【Question description】:

My code currently looks like this. The xls to csv conversion works, but writing to the HTML does not.

import xlrd
import csv
import sys

# write from xls file to csv file
wb = xlrd.open_workbook('your_workbook.xls')
sh = wb.sheet_by_name('Sheet1')
your_csv_file = open('your_csv_file.csv', 'wb')
wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)

for rownum in xrange(sh.nrows):
    wr.writerow(sh.row_values(rownum))

your_csv_file.close()
print "Converted from xls to csv!"
# write from csv file to html 

# if len(sys.argv) < 3:
#   print "Usage: csvToTable.py csv_file html_file"
#   exit(1)

# Open the CSV file for reading
reader = csv.reader(open("your_csv_file.csv"))

# Create the HTML file for output
htmlfile = open("data.html","w+")

# initialize rownum variable
rownum = 0

# generate table contents
for row in reader: # Read a single row from the CSV file
    for line in htmlfile:
        # this HTML comment is found in the HTML file where I want to insert the table
        if line == "<!-- Table starts here !-->":
            # write <table> tag
            htmlfile.write('<table>')
            htmlfile.write('<tr>') # write <tr> tag
            for column in row:
                htmlfile.write('<th>' + column + '</th>')
            htmlfile.write('</tr>')
            # write </table> tag
            htmlfile.write('</table>')

        #increment row count    
        rownum += 1



# print results to shell
print "Created " + str(rownum) + " row table."
exit(0)

The output is just a blank page, because the program never finds

<!-- Table starts here !-->

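【Comments on the question】:

You seem to be writing a lot of this from scratch; all of this functionality already exists in a great library called pandas: pandas.pydata.org/pandas-docs/dev/generated/…

Tags: python html csv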

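【Solution 1】:

As Delimitry said, your file mode is wrong:

w+ : Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, creates a new file for reading and writing.

So the first thing it does is truncate (empty) the entire file.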
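
To see the truncation for yourself, here is a minimal sketch. It assumes Python 3 and uses a throwaway temporary file whose name and contents are made up for illustration; it is not the asker's actual page.

```python
import os
import tempfile

# Create a throwaway HTML file that contains the marker comment.
path = os.path.join(tempfile.mkdtemp(), "data.html")
with open(path, "w") as f:
    f.write("<html>\n<!-- Table starts here !-->\n</html>\n")

# Mode "r" leaves the contents alone.
with open(path, "r") as f:
    lines_before = f.readlines()

# Mode "w+" truncates the file as soon as it is opened,
# so there is nothing left to iterate over.
with open(path, "w+") as f:
    lines_after = f.readlines()

print(len(lines_before), len(lines_after))
```

After the second open the file on disk is empty, which is why the loop over htmlfile in the question never sees the marker.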

【Comments】:

【Solution 2】:

Try changing the file mode from "w+" to "a+":

htmlfile = open("data.html", "a+")

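When you open data.html in w+ mode it is truncated, so when the for line in htmlfile: loop reads it, you will never find the "<!-- Table starts here !-->" HTML comment.

You can also add line.strip() so that the line is compared without the newline at its end: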

if line.strip() == "<!-- Table starts here !-->":
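
One caveat worth checking, sketched here under the assumption of Python 3: a file opened with "a+" starts out positioned at its end, so iterating over it right away yields no lines unless you call seek(0) first. The same sketch shows why the comparison needs strip(). The temporary file is hypothetical.

```python
import os
import tempfile

MARKER = "<!-- Table starts here !-->"

# Throwaway file standing in for data.html.
path = os.path.join(tempfile.mkdtemp(), "data.html")
with open(path, "w") as f:
    f.write("<html>\n" + MARKER + "\n</html>\n")

with open(path, "a+") as f:
    at_end = f.readlines()   # the position starts at the end of the file
    f.seek(0)                # rewind to the beginning
    rewound = f.readlines()  # now the existing lines are visible

# Each line keeps its trailing "\n", so an exact == never matches.
exact = [line for line in rewound if line == MARKER]
stripped = [line for line in rewound if line.strip() == MARKER]

print(len(at_end), len(rewound), len(exact), len(stripped))
```

Even after rewinding, anything written through an "a+" handle goes to the end of the file, not to the position of the marker.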

I suggest separating the read and the write of the HTML file. For example, you could change the code to:

out_lines = []
with open('data.html', 'r') as htmlfile:
    # read lines once, and scan for HTML comment for each row
    lines = htmlfile.readlines()
    # generate table contents
    for row in reader: # Read a single row from the CSV file
        for line in lines:
            # this HTML comment is found in the HTML file where I want to insert the table
            if line.strip() == "<!-- Table starts here !-->":
                # write <table> tag
                out_lines.append('<table>')
                out_lines.append('<tr>') # write <tr> tag
                for column in row:
                    out_lines.append('<th>' + column + '</th>')
                out_lines.append('</tr>')
                # write </table> tag
                out_lines.append('</table>')
            # increment row count    
            rownum += 1

# update your html file
with open('data.html', 'a') as f:
    f.write('\n'.join(out_lines))

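【Comments】:

I did this and it didn't change the HTML file at all. I guess that's an improvement over a blank page.
Check whether "<!-- Table starts here !-->" exists in your HTML file.
It does. I think there's some problem with the for loop; I'm looking at it again.
I added it as Tichodroma suggested, but still no dice.

【Solution 3】:

There are two or three problems here. I'll go through them one by one, but first I want to say that I would use the Pandas library for this task. It does far more than this kind of job, but if you do have it installed, all you need to do to turn the data into a table is: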

import pandas as pd
xls = pd.ExcelFile('path_to_file.xls')
df = xls.parse('Sheet1') # parse the sheet you're interested in - results in a Dataframe
table_html = df.to_html()

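You now have a string (table_html) containing all of the data in html <table> format, which you can write straight into your html file. No intermediate csv stage or anything. See the documentation for pandas.ExcelFile.parse and pandas.DataFrame.to_html().

Problems with the existing solution

1. String comparison

You are looking for the comment line that should be replaced by your html, and you are comparing two strings with ==. Unless you are completely sure the strings are exactly the same (no extra spaces, no line endings, no extra punctuation and so on), this tends to be error-prone.

You can use strip() to remove whitespace and then use == as others have suggested. Personally, I'd be tempted to be more forgiving and use the in keyword, e.g.: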

if '<!-- Table starts here' in line:

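Then it doesn't matter whether the trailing ! is in the string, or whether there are spaces before or after the text, and so on. You could be even more forgiving and use a regular expression, so that any amount of whitespace is allowed between the comment marker and the text. You probably know how exact the string is in the .html file you are working with.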
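
As a rough sketch of the regular-expression idea (Python 3; the pattern and the sample lines are my own illustration, not something from the asker's file), this accepts any amount of whitespace around the text and an optional ! before the closing -->:

```python
import re

# Tolerates any whitespace around the text and an optional "!" before "-->".
MARKER_RE = re.compile(r"<!--\s*Table starts here\s*!?\s*-->")

candidates = [
    "<!-- Table starts here !-->\n",
    "  <!--Table starts here-->  \n",
    "<!--   Table starts here   ! -->\n",
    "<p>Table starts here</p>\n",
]

# search() finds the marker anywhere in the line, like the "in" test does.
matches = [bool(MARKER_RE.search(line)) for line in candidates]
print(matches)  # [True, True, True, False]
```

The last candidate shows the difference from a bare substring test on the text alone: the words appear, but not inside a comment, so it is rejected.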

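2. Reading and writing the `.html` file at the same time

You are trying to insert text into the middle of a file. There is a Q&A covering ways to do that. In short, in your case (relatively small data, i.e. a single .html file), I would read all the lines into a list and then insert the table HTML where you want it, e.g.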

content = []
insert_index = None
with open('data.html', 'r') as htmlfile:
    for line in htmlfile:
        content.append(line)
        if '<!-- Table starts here' in line:
            insert_index = len(content)

if insert_index:
    content.insert(insert_index, table_html)

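Note that I'm assuming you obtained table_html with the Pandas approach at the start. If for some reason you don't want to do that and still want to get the content via csv, you can always build table_html by creating an empty string and then adding all the HTML elements to it, much as your loop does now.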
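
If you go the csv route, building table_html could look roughly like the sketch below. It assumes Python 3, treats the first CSV row as the header, and escapes cell text with html.escape; the helper name csv_to_table_html and the sample data are hypothetical.

```python
import csv
import html
import io


def csv_to_table_html(csv_file):
    """Build an HTML <table> string from an open CSV file object.

    Assumption: the first row is the header row.
    """
    parts = ["<table>"]
    for rownum, row in enumerate(csv.reader(csv_file)):
        tag = "th" if rownum == 0 else "td"
        # Escape &, < and > so cell text cannot break the markup.
        cells = "".join(
            "<{0}>{1}</{0}>".format(tag, html.escape(col)) for col in row
        )
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)


# In-memory stand-in for your_csv_file.csv.
sample = io.StringIO('"name","qty"\n"a & b","2"\n')
table_html = csv_to_table_html(sample)
print(table_html)
```

Unlike the loop in the question, this opens and closes the <table> once and emits one <tr> per CSV row.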

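3. Writing the html

Others have noted that you can open the file in append mode rather than write mode. That's fine, but if you use the approach above of reading everything into a list and inserting into that list, you can simply do this: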

with open('data.html', 'w+') as f:
    # the lines read from the file already end in '\n', so join without adding another
    f.write(''.join(content))

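【Comments】:

Thank you so much! Answers like this are what make this site so great. One thing though: assuming the rows and columns have the same format, how would I make the values from a new spreadsheet overwrite the old table in the HTML?
Ooof, that's a different (and harder) question. The way I would do it is to add an extra comment marker at the end (i.e. after the inserted table_html), then adapt your code so that once it detects the table start marker it doesn't add anything to content until it detects the table end marker. Everything else would stay the same. Give it a go; if you run into more problems, it's time for a new StackOverflow question :) You can of course link to it here, but this is too big to explain fully in the comments or to edit into the existing question.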
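
A rough sketch of that start/end marker idea, assuming Python 3. The end-marker text "<!-- Table ends here -->" and the function name replace_table are made up for illustration, and the sketch assumes that an end marker, if present, comes after the start marker.

```python
START_TEXT = "<!-- Table starts here"
END_TEXT = "<!-- Table ends here"          # hypothetical end marker
END_COMMENT = "<!-- Table ends here -->"   # written on the first run


def replace_table(lines, table_html):
    """Return the lines with the table between the markers replaced."""
    has_end = any(END_TEXT in line for line in lines)
    out = []
    skipping = False
    for line in lines:
        if skipping:
            # Drop the old table; resume at the end marker.
            if END_TEXT in line:
                skipping = False
                out.append(line)
            continue
        out.append(line)
        if START_TEXT in line:
            out.append(table_html + "\n")
            if has_end:
                skipping = True
            else:
                # First run: no end marker yet, so add one.
                out.append(END_COMMENT + "\n")
    return out


page = ["<html>\n", "<!-- Table starts here !-->\n", "</html>\n"]
first = replace_table(page, "<table>old</table>")
second = replace_table(first, "<table>new</table>")
print("".join(second))
```

The first call inserts the table plus the end marker; the second call finds both markers and swaps the old table for the new one.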

【Solution 4】:

The lines you read from htmlfile contain a trailing newline. You have to strip it before comparing:

if line.strip() == "<!-- Table starts here !-->":

Hint:

HTML comments have a ! only at the beginning, not at the end. It is not forbidden to write

<!-- Table starts here !-->
-----------------------^

but the second ! is very unusual.

【Comments】:
