Parse a text file, extracting values according to index position

【Question title】: Parse a Text File extracting values according to its index position
【Posted】: 2021-07-05 08:43:34
【Question description】:

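Hi everyone, how are you? I hope you are well! How can I parse a text file, extract specific values by their index positions, append the values to a list, and then convert that list to a pandas DataFrame? So far I have been able to write the code below. Sample text: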

Header: 0RCPF049100000084220210407
Body: 1927907801100032G 00sucesso
1067697546140032G 00sucesso
1053756666000032G 00sucesso
1321723368900032G 00sucesso
1037673956810032G 00sucesso

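For example, the first line is the header. From it I only need the date, which sits at the following index position: date_from_header = linhas[0][18:26]. The remaining values are in the body lines.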

import csv
import pandas as pd

headers = ["data_mov", "chave_detalhe", "cpf_cliente", "cd_clube",
           "cd_operacao","filler","cd_retorno","tc_recusa"]

# This is the actual code
with open('RCPF0491.20210407.1609.txt', "r")as f:
  linhas = [linha.rstrip() for linha in f.readlines()]
  for i in range(0,len(linhas)):
     data_mov = linhas[0][18:26]
     chave_detalhe=linhas[1][0:1]
     cpf_cliente=linhas[1][1:12]
     cd_clube=linhas[1][12:16]
     cd_operacao=linhas[1][16:17]
     filler=linhas[1][17:40]
     cd_retorno=linhas[1][40:42]
     tx_recusa=linhas[1][42:100]
data = [data_mov,chave_detalhe,cpf_cliente,cd_clube,cd_operacao,filler,cd_retorno,tx_recusa]

The expected result looks like this:

data_mov chave_detalhe cpf_cliente cd_clube cd_operacao filler cd_retorno tx_recusa
'20210407' '1'         92790780110 '0032'   'G'        'blank space' '00'   'sucesso'
'20210407' '1'         92790780110 '0032'   'G'        'blank space' '00'   'sucesso'
'20210407' '1'         92790780110 '0032'   'G'        'blank space' '00'   'sucesso'

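【Question discussion】:

This question is a bit hard to follow. Could you post an example of filename.txt?
Looking at your code, though: your for loop repeats the same thing over and over (reading lines 0 and 1 of filename.txt), because you never use the iterator variable i inside the loop.
I would expect your data to be CSV or similar, and pandas has a function for reading that: read_csv. See: datacamp.com/community/tutorials/pandas-read-csv
@SamBob Thanks. I am trying to figure out how to loop over the file and extract all the values by index position.
Ah, so you are trying to extract data_mov from the first line, and then "chave_detalhe", "cpf_cliente", "cd_clube", "cd_operacao", "filler", "cd_retorno", "tc_recusa" from every other line? Ignoring the first line for now, does stackoverflow.com/a/10851479/1581658 help with splitting the lines?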
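The point raised in the comments about the unused loop variable can be illustrated with a minimal sketch. It reuses the asker's variable names and slice positions, and it assumes the real file pads the filler to 23 spaces (which is what the slices 17:40, 40:42 and 42:100 imply); the sample rows are rebuilt by hand with that padding and with the status text written as sucesso, so they are hypothetical stand-ins rather than the actual file.

```python
# Hypothetical sample: the posted rows, with the filler padded to 23 spaces
# so that the question's slice positions (17:40, 40:42, 42:100) line up.
linhas = [
    "0RCPF049100000084220210407",
    "1" + "92790780110" + "0032" + "G" + " " * 23 + "00" + "sucesso",
    "1" + "06769754614" + "0032" + "G" + " " * 23 + "00" + "sucesso",
]

data_mov = linhas[0][18:26]  # the date comes from the header line only

data = []
for linha in linhas[1:]:  # slice the current line instead of always linhas[1]
    data.append([
        data_mov,
        linha[0:1],     # chave_detalhe
        linha[1:12],    # cpf_cliente
        linha[12:16],   # cd_clube
        linha[16:17],   # cd_operacao
        linha[17:40],   # filler
        linha[40:42],   # cd_retorno
        linha[42:100],  # tx_recusa
    ])

for row in data:
    print(row)
```

Each inner list is one row, so the list of lists could be handed to pd.DataFrame together with the column names.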

Tags: python parsing etl

【Solution 1】:

Using stackoverflow.com/a/10851479/1581658:

def parse_file(filename):
    indices = [0,1,12,16,17,18,20] # list the indices to split on
    parsed_data = [] # returned array by line
    with open(filename) as f:
        header = next(f) #skip the header
        data_mov = header[18:26] # and get data_mov from header
        for line in f: #loop through lines
            #split each line by the indices
            parts = [data_mov] + [line.rstrip()[i:j] for i,j in zip(indices, indices[1:]+[None])]
            parsed_data.append(parts)
    return parsed_data

print(parse_file("filename.txt"))
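To see what the zip(indices, indices[1:]+[None]) expression does, here is a small sketch that applies it to a single body line. It uses the first sample row as posted (one space of filler) and assumes the status text is sucesso, as in the expected output; the names pairs and parts are only for illustration.

```python
indices = [0, 1, 12, 16, 17, 18, 20]  # start offset of each field
line = "1927907801100032G 00sucesso"  # first sample row as posted

# Pair every start offset with the next one; the final pair ends at None,
# which makes the last slice run to the end of the line.
pairs = list(zip(indices, indices[1:] + [None]))
parts = [line[i:j] for i, j in pairs]

print(pairs)
# [(0, 1), (1, 12), (12, 16), (16, 17), (17, 18), (18, 20), (20, None)]
print(parts)
# ['1', '92790780110', '0032', 'G', ' ', '00', 'sucesso']
```

These offsets match the collapsed sample row; a file with a wider filler would need different values for the last two entries.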

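【Discussion】:

Thanks for your time. I made a few adjustments and it now works fine! Best regards

【Solution 2】:

Thanks to SamBob for the help. In case anyone needs it, here is the final solution: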

import itertools
import pandas as pd

pd.options.display.width = 0

def parse_file(filename):
    indices=[0,1,12,16,17,18,42]  # list of indexes
    parsed_data = [] # return a list
    with open(filename) as f:
        header = next(f) 
        data_mov = header[18:26]
        # islice(f, 1, 100) skips the first remaining line and reads at most the next 99
        for line in itertools.islice(f, 1, 100):
            # split the line according to the indices
            parts = [data_mov] + [line.rstrip()[i:j] for i,j in zip(indices, indices[1:]+[None])]
            parsed_data.append(parts)

    # convert to a DataFrame once, after every line has been parsed
    cols = ['data_mov', 'chave_detalhe', 'cpf_cliente','cd_clube','cd_operacao','filler','cd_retorno','tx_recusa']
    df = pd.DataFrame(parsed_data, columns=cols)

    return df


df = parse_file("filename.txt")
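One detail of this solution worth checking against your own file is how itertools.islice(f, 1, 100) behaves once next(f) has already consumed the header. The sketch below uses an in-memory stand-in file with made-up row names (HEADER, row1, and so on are hypothetical) to show which lines come through.

```python
import io
from itertools import islice

# Hypothetical stand-in for the real file: one header and three body lines.
f = io.StringIO("HEADER\nrow1\nrow2\nrow3\n")

header = next(f)  # consumes the header line

# A start of 1 skips one more line (row1 here); a stop of 100 caps the
# read at index 99 of what remains after the header.
rows = [line.rstrip() for line in islice(f, 1, 100)]

print(rows)  # ['row2', 'row3']
```

If every body line is wanted, iterating over f directly after next(f), as Solution 1 does, would be one way to keep the first row.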

【Discussion】: