Python 2.7: Concatenate, Trim, and Search text files (answers)

【Question title】: Python 2.7: Concatenate, Trim, and Search text files
【Posted】: 2015-07-27 17:14:00
【Question】:

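I have multiple text files containing a stream of text data. There are headers that break the data up by count. The problem is that the header for the block of data I'm interested in may be in a different file. It looks like this...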

file1.txt

=======Boot Count 1============
(random text strings)
...
...
...
=======Boot Count 2============
...

file2.txt

...
...
...
=======Boot Count 3============
...
...
=======Boot Count 4============
...

file3.txt

...
...

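I need to find some information that sits in the latest boot count. So I need to:

1. Concatenate the text files together
2. Search backwards until I see a boot count header
3. Strip away everything that comes before it
4. Then search for a specific string in that last section only.

I can handle #4. Any ideas on 1-3?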

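【Question discussion】:

So basically you want the section with the highest boot count?
Correct. After that I can use a regex to find what I'm looking for, but first I need the right section to search.
Why not search each file individually for a boot count and keep only the latest one? That would eliminate (1) and make (2) easier.
Because there is no guarantee that a file contains a boot count. The header can be in one file and its content can roll over into another.
@njfrazie, why do you need to concatenate the files? Surely all you want is the latest boot count, searching down from there?

Tags: python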

【Solution 1】:

Just go through each file and find the one with the latest count:

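import re
from itertools import islice

# raw string, so the regex escapes are not treated as string escapes
r = re.compile(r"Boot\s+Count\s+(\d+)")

with open("file1.txt") as f1, open("file2.txt") as f2, open("file3.txt") as f3:
    best_count, index, f_obj = 0, 0, None
    for obj in (f1, f2, f3):
        for ind, line in enumerate(obj, 1):
            match = r.search(line)
            if match:
                # group(1) is the captured number; group() is the whole match
                i = int(match.group(1))
                if i > best_count:
                    best_count = i
                    index = ind
                    f_obj = obj
    if f_obj is not None:  # None means no file had a boot count header
        f_obj.seek(0)
        # skip the first `index` lines, i.e. everything up to and including the header
        for line in islice(f_obj, index, None):  # search for the string
            print(line)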

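best_count, index and f_obj keep track of the latest count, the line it is on, and the file it is in. You can then seek back to the start of that file and use itertools.islice to get the part you want from the file with the latest count.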
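To see which lines each form of itertools.islice returns, here is a small sketch on an in-memory list that stands in for the file. The sample lines and the names before and after are made up for illustration.

```python
from itertools import islice

# Hypothetical stand-in for the lines of the file holding the latest count.
lines = [
    "old data\n",
    "=======Boot Count 4============\n",
    "new data 1\n",
    "new data 2\n",
]
index = 2  # 1-based line number of the header, as produced by enumerate(obj, 1)

# islice(iterable, stop): the first `index` lines, header included.
before = list(islice(lines, index))

# islice(iterable, start, None): everything after the header.
after = list(islice(lines, index, None))
```

Because index is 1-based, passing it as the start value skips the header itself; index - 1 would keep it.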

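If the only lines containing a count always start with =, you could also check if line[0] == "=" before running the regex, which speeds up the search.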
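A minimal sketch of that pre-check, assuming header lines really do begin with =. The helper name boot_count is hypothetical, and str.startswith is used in place of line[0] so that an empty string does not raise IndexError.

```python
import re

boot_re = re.compile(r"Boot\s+Count\s+(\d+)")


def boot_count(line):
    """Return the boot count in a header line, or None for any other line."""
    # Cheap prefix test first; the regex only runs on lines starting with "=".
    if not line.startswith("="):
        return None
    match = boot_re.search(line)
    return int(match.group(1)) if match else None
```

Inside the loop, a call such as boot_count(line) would replace the direct r.search(line).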

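【Discussion】:

I like this, but the header and the data are not guaranteed to be together. So if the highest boot count is in file2, the actual data I'm looking for may be in file3.
Then you can still use the same logic. Once you have found the file, if you don't hit another boot count within that section, start reading the next file until you do; that will be the whole section. If you do hit another boot count, the section is all in one file. If you prefer, you can read all the files as one long file.
If I have an unknown number of files, how would I redo the with statement? I've been looking at the fileinput module for this, but it is missing something.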
for file in iterable_of_filename:with open(file) as f:...
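The two suggestions in this thread (loop over any number of file names, and treat the files as one long stream) can be combined roughly as follows. This is only a sketch: the function name latest_boot_section is hypothetical, the header pattern is taken from the sample files in the question, and the file names are assumed to be passed oldest first.

```python
import re

boot_re = re.compile(r"Boot\s+Count\s+(\d+)")


def latest_boot_section(filenames):
    """Return (count, lines) for the highest boot count across the files.

    Assumes `filenames` is in chronological order (oldest first). The files
    are read as one continuous stream, so a section may spill into later files.
    """
    best_count = None
    section = []
    keeping = False  # True while inside the section with the best count
    for name in filenames:
        with open(name) as f:
            for line in f:
                match = boot_re.search(line)
                if match:
                    count = int(match.group(1))
                    if best_count is None or count > best_count:
                        best_count = count
                        section = []     # drop lines from earlier sections
                        keeping = True
                    else:
                        keeping = False  # a lower count: not the wanted section
                elif keeping:
                    section.append(line)
    return best_count, section
```

For the three sample files this would return count 4 together with the lines after that header in file2.txt and every line of file3.txt. The list of names could come from sorted(glob.glob(...)), provided the names sort chronologically; plain string sorting places file10 before file2.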

【Solution 2】:

I found a way to do what I needed. It is similar to Padraic's approach.

import fileinput
import glob
import re
from itertools import islice

def issue(path):
    # path is a full path with a wildcard character,
    # for example: "C:\users\joeShmoe\file*"

    linenumber = 0   # lines counted back from the end of the first file searched
    fileindex = 0    # number of files read before the boot header was found
    bootFound = False
    boot_re = re.compile(r'BOOT COUNT =')
    failure_re = re.compile(r'FAILURE HERE')

    # NOTE: the search below relies on sorted() listing the newest file first
    fileList = sorted(glob.glob(path))

    for name in fileList:
        if bootFound:
            break

        fileindex += 1

        with open(name, 'r') as f:
            # walk backwards so the last header in the file is hit first
            for line in reversed(f.readlines()):
                linenumber += 1
                if boot_re.search(line.rstrip()):
                    bootFound = True
                    break

    if not bootFound:
        return None

    # reverse the files that were read and treat them as one stream
    filesearch = sorted(fileList[:fileindex], reverse=True)
    lines = [line.strip() for line in fileinput.input(files=filesearch)]

    if not lines:
        return None

    # the last boot section is the final `linenumber` lines of the stream
    startpt = max(len(lines) - linenumber, 0)

    for line in islice(lines, startpt, len(lines)):
        if failure_re.search(line):
            return 1

    return None

【Discussion】: