Python - Read header at specific line (answers)

【Question title】: Python - Read header at specific line
【Posted】: 2020-12-22 21:10:55
【Question description】:

I'm trying to read a text file that has multiple headers, but the header starts at line 1000. For example, my header looks like this:

LN Type Pct Amount Principal
TP Code Due

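So, as you can see, my header is word-wrapped and starts at line 1000 of the text file. How can I import this into Python so that my header and columns are recognized?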

My code so far:

import pandas as pd

topheader = 'Acct Total'
with open('1000.txt') as f:
    for num, line in enumerate(f, 1):
        if topheader in line:
            # I know this is incorrect, but I need help with the header argument
            df = pd.read_csv('1000.txt', header=[num, num + 1])

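Every time "Acct Total" appears in the file (line 999), the header is on the next line (line 1000). How can I get Python to read the header at line 1000 and recognize that it is word-wrapped?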
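The locating step described above can be sketched as follows, assuming the marker text is exactly `Acct Total` and the wrapped header always starts on the line right after it. `find_header_starts` is a hypothetical helper name; it collects every occurrence because the marker may appear more than once in the file.

```python
from io import StringIO


def find_header_starts(f, marker="Acct Total"):
    """Return the 0-based line numbers where a header follows `marker`.

    Assumption: the wrapped header begins on the line immediately
    after the line that contains the marker.
    """
    starts = []
    for num, line in enumerate(f):  # num is 0-based here
        if marker in line:
            starts.append(num + 1)  # the header is on the next line
    return starts


# Hypothetical stand-in for 1000.txt, shortened so the marker is on line 3
sample = StringIO(
    "report title\n"
    "\n"
    "Acct Total\n"
    "LN Type Pct Amount\n"
    "TP Code Due Owed\n"
    "1 2 3 4\n"
)
print(find_header_starts(sample))  # [3]
```

For the file in the question, a marker on line 999 (1-based) would yield 999, which is the 0-based index of line 1000.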

【Question discussion】:

FYI: there are 4 columns with 4 headers, and they are wrapped over two lines. For example, LN \n TP is one column.
df = pd.read_csv('1000.txt', skiprows=999, header=[0,1])?
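A sketch of the `skiprows` route, assuming whitespace-separated columns, a header wrapped over exactly two lines, and data rows immediately after it. Instead of asking pandas to build a two-level header, it joins each top-line word with the word below it and passes the result as `names`. `read_after_marker` is a hypothetical helper, and the sample text is made up for illustration.

```python
from io import StringIO

import pandas as pd


def read_after_marker(text, marker="Acct Total"):
    """Read the table whose two-line header follows the `marker` line.

    Assumptions: whitespace-separated columns, header wrapped over
    exactly two lines, data rows directly after the header.
    """
    lines = text.splitlines()
    # 0-based index of the first line containing the marker
    marker_idx = next((i for i, line in enumerate(lines) if marker in line), None)
    if marker_idx is None:
        raise ValueError(f"marker {marker!r} not found")
    top, bottom = lines[marker_idx + 1], lines[marker_idx + 2]
    # Join each top-line word with the word below it: "LN" + "TP" -> "LN_TP"
    names = [f"{a}_{b}" for a, b in zip(top.split(), bottom.split())]
    # Skip everything up to and including both header lines
    return pd.read_csv(
        StringIO(text),
        sep=r"\s+",
        header=None,
        names=names,
        skiprows=marker_idx + 3,
    )


sample = "junk\nAcct Total\nLN Type Pct Amount\nTP Code Due Owed\n1 2 3 4\n5 6 7 8\n"
print(read_after_marker(sample))
```

Note that `zip` stops at the shorter header line, so if the two lines have different word counts, the extra words are silently dropped rather than reported.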

Tags: python pandas header word-wrap

【Solution 1】:

The following might work for you. StringIO just makes a string behave like a file; it is only there so that this code snippet is runnable.

from io import StringIO  # just for the example

import pandas as pd

text = """#
#
#
#
#
#
#
#
LN Type Pct Amount Principal
TP Code Due Owed
1 2 3 4
5 6 7 8
9 1 2 3
4 5 6 7
8 9 1 0"""

f = StringIO(text)

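while True:
    line = f.readline()
    if not line:  # readline() returns "" only at end of file
        raise ValueError("header line starting with 'LN' not found")
    line = line.strip()
    if line.startswith("LN"):
        break  # found where the column names start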
line2 = f.readline()  # get the next row
# construct column names (zip stops at the shorter header line)
names = [f"{a}_{b}" for a,b in zip(line.split(), line2.split())]

# file is now at the start of the data, so pandas will start reading from there
# pass in the column names explicitly
# read_table and read_csv have similar call signatures
# sep=r"\s+" treats any run of spaces or tabs as a single separator
df = pd.read_table(f, header=None, sep=r"\s+", names=names)
print(df)

Output:

   LN_TP  Type_Code  Pct_Due  Amount_Owed
0      1          2        3            4
1      5          6        7            8
2      9          1        2            3
3      4          5        6            7
4      8          9        1            0
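The answer relies on pandas starting to read from the file object's current position. A hedged sketch of the same approach on a real file on disk, assuming (as in the example) that the first header line starts with `LN` and that no earlier line does. `read_wrapped_table` is a hypothetical name; it uses a `readline()` loop like the answer and raises an error at end of file instead of looping forever.

```python
import os
import tempfile

import pandas as pd


def read_wrapped_table(path, prefix="LN"):
    """Read a whitespace-delimited table whose header is wrapped over two lines.

    Assumes the first header line starts with `prefix` and the second
    header line follows it immediately.
    """
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:  # end of file reached without finding the header
                raise ValueError(f"no line starting with {prefix!r} in {path}")
            if line.strip().startswith(prefix):
                break
        second = f.readline()
        names = [f"{a}_{b}" for a, b in zip(line.split(), second.split())]
        # f now points at the first data row, so pandas starts reading there
        return pd.read_csv(f, sep=r"\s+", header=None, names=names)


# Usage with a throwaway file standing in for 1000.txt
sample = "report junk\n\nLN Type Pct Amount Principal\nTP Code Due Owed\n1 2 3 4\n5 6 7 8\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
try:
    print(read_wrapped_table(tmp.name))
finally:
    os.remove(tmp.name)
```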

【Discussion】:

This is close, but when I run it on my file I keep getting an error: ValueError: Duplicate names are not allowed.
Print out the names variable to see which names are duplicated.
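One way to follow that advice is to list exactly which combined names repeat and, only if the repetition is genuine (two real columns share a name), make them unique before passing `names=`. If the duplicates come from matching the wrong header lines, fixing the detection is the better cure. `find_duplicates` and `make_unique` are hypothetical helpers, and the `names` list below is made up. The `.1`, `.2` suffixes resemble how pandas renames repeated cells in a header it parses itself; this sketch does not check for collisions with names that already end in such a suffix.

```python
from collections import Counter


def find_duplicates(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [n for n in counts if counts[n] > 1]


def make_unique(names):
    """Append .1, .2, ... to repeated names so pandas will accept them."""
    seen = Counter()
    unique = []
    for n in names:
        unique.append(n if seen[n] == 0 else f"{n}.{seen[n]}")
        seen[n] += 1
    return unique


names = ["LN_TP", "Amount_Owed", "Amount_Owed"]  # hypothetical example
print(find_duplicates(names))  # ['Amount_Owed']
print(make_unique(names))      # ['LN_TP', 'Amount_Owed', 'Amount_Owed.1']
```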