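【Question title】: Find all files in a directory and process only the ones that contain a specific line; skip those that do not
【Posted】: 2020-04-14 18:10:48
【Question description】:

I need help. I'm new to Python and I wrote a script that should find all files in a directory, process only those that contain a specific line, and skip those that don't. The specific line is ',"  Run  Time'. The script fails to process only the files I need; it processes all of them.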

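All the lines to look for:

',"  Run  Time'
',"  Start Time'
',"  End  Time'
'Test_ID e:'
'Test Program Name:'
'Product:'

Lines 1, 2 and 3 repeat within a file and I need every occurrence. Lines 4, 5 and 6 also repeat, but I only need to capture each of them once.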

import os

runtime_l = ',"  Run  Time'
start_tm  = ',"  Start Time'
end_tm    = ',"  End  Time'
test_ID   = ' Host Name: '
program_n = 'Test Program Name:'
prod_n    = 'Product:'

given_path = 'C:\\02\\en15\\TST'
for filename in os.listdir(given_path):
    filepath = os.path.join(given_path, filename)
    if os.path.isfile(filepath):
        print("File Name:   ", filename) 
        print("File Name\\Path:", filepath) 
        with open(filepath) as mfile:        
            for line in mfile:
                if runtime_l in line:
                    # do something with the line
                    print(line)

                if start_tm in line:
                    # do something with the line
                    print(line)  

                if end_tm in line:
                    # do something with the line
                    print(line) 

                if test_ID in line:
                    # do something with the line
                    print (line)

                if program_n in line:
                    # do something with the line
                    print (line)

                if prod_n in line:
                    # do something with the line
                    print (line)
                else:
                    continue

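How would I test whether a file has the "Run Time" line? I'm not sure whether the script below looks "Pythonic", but it does what I need: it finds the files that contain the line I want and processes them.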

import os

runtime_l = ',"  Run  Time'
start_tm  = ',"  Start Time'
end_tm    = ',"  End  Time'

given_path = 'C:\\02\\en15\\TST'
for filename in os.listdir(given_path):
    filepath = os.path.join(given_path, filename)
    if os.path.isfile(filepath):
        #print("File Name:   ", filename) 
        #print("File Name\\Path:", filepath) 
        with open(filepath) as mfile:        
            for line in mfile:
                if runtime_l in line:
                    #runtime_file = open(filepath, 'r')
                    with open(filepath) as runtime_file:
                        for rn_l in runtime_file:
                            if runtime_l in rn_l: 
                                print (rn_l)
                            elif start_tm  in rn_l:
                                print (rn_l)
                            elif end_tm  in rn_l:
                                print (rn_l)

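【Question comments】:

Todd, thanks for your answer, but it isn't what I'm looking for, or I don't understand it. First, I only need to process files that have the "Run Time" line. Second, I only need to capture the first match for lines 4, 5 and 6; see the original request. Thanks for your help!
Todd! Thanks a lot for your help, but I'm new to Python. Your tips may be great, but I can't use them because I don't understand them.
How can I test the files in a directory to see whether a file has the (Run Time) line? If it does, I need to process it, and I need to ignore the files that don't have it. I need to process all files that contain the (Run Time) string. There are 6 specific lines to find (see the original post). For lines 4, 5 and 6, I only want to print the first match. If the script finds 10 or 15 lines matching lines 4, 5 or 6, I only need to print one match for each.
You will have to open each file and read some of its text to determine whether the rest of the file should be processed. There is no way around that. The only question is where in the file you expect to find that text: you could read just the first n lines or the last n lines (using seek()) to check for it, and then process the whole file if it is found.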
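
The "first n lines" idea from the last comment could be sketched roughly as follows. This is only an illustration: the helper name head_contains, the default n=20 and the throwaway demo file are assumptions, not something from the thread.

```python
import os
import tempfile
from itertools import islice


def head_contains(path, target, n=20):
    """Return True if `target` appears in the first `n` lines of the file."""
    with open(path) as f:
        # islice stops after n lines, so the rest of the file is never read
        return any(target in line for line in islice(f, n))


# Demo on a throwaway file whose marker line is line 3
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write('header\nProduct: X\n,"  Run  Time 00:01\nmore\n')
    demo_path = tmp.name

found_in_first_5 = head_contains(demo_path, ',"  Run  Time', n=5)
found_in_first_2 = head_contains(demo_path, ',"  Run  Time', n=2)
os.remove(demo_path)
print(found_in_first_5, found_in_first_2)
```

This only pays off if the marker is known to sit near the top of every file; otherwise the whole file has to be scanned.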

Tags: python-3.x search

【Solution 1】:

Use a generator expression with any() to determine whether a given string appears anywhere in a file:

if any(tgt_str in line for line in my_file):
    print("string found")
else:
    print("not found")

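any() consumes the generator and stops at the first line that contains the target string, returning True. If the string is not in the file, the file is scanned to the end and any() returns False. See the brief introduction to generators and list comprehensions further down.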

any(line for line in my_file if tgt_str in line)

also works. next() can be used as well, but it raises StopIteration if the string is not found in the file (unless you pass it a default value).
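
As a small sketch of the next() variant: passing a default as the second argument avoids the StopIteration. Here io.StringIO is only a stand-in for an open text file, and the sample contents are made up for illustration.

```python
import io

# io.StringIO stands in for an open text file (assumption for the demo)
my_file = io.StringIO("alpha\nbeta\ngamma\n")
tgt_str = "beta"

# Returns the first matching line, or None if there is no match
first_match = next((line for line in my_file if tgt_str in line), None)
print(repr(first_match))

my_file.seek(0)  # rewind before searching again
no_match = next((line for line in my_file if "zeta" in line), None)
print(no_match)
```

Unlike any(), this hands back the matching line itself, which is handy when you want to use the line rather than just test for it.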

The code below keeps most of your code and adds a few things.

import os

runtime_l = ',"  Run  Time'
start_tm  = ',"  Start Time'
end_tm    = ',"  End  Time'
test_ID   = ' Host Name: '
program_n = 'Test Program Name:'
prod_n    = 'Product:'

class OneTime:
    def __init__(self):
        self.test_ID    = 0
        self.program_n  = 0
        self.prod_n     = 0

    def all_done(self):
        return self.test_ID & self.program_n & self.prod_n


given_path = './test_logs'

for filename in os.listdir(given_path):
    filepath = os.path.join(given_path, filename)

    if not os.path.isfile(filepath):
        continue

    with open(filepath, 'r') as mfile:

        if not any(runtime_l in ln for ln in mfile):
            continue # Onto the next file...

        # If we get this far, we know the runtime_l string is in
        # the file. Proceed...

        mfile.seek(0) # Back to start of file.

        one_time      = OneTime()
        one_time_done = 0

        print("File Name:      ", filename) 
        print("File Name\\Path:", filepath) 

        for line in mfile:

            line = line.rstrip()

            if runtime_l in line:
                # do something with the line
                print(line)

            elif start_tm in line:
                # do something with the line
                print(line)  

            elif end_tm in line:
                # do something with the line
                print(line) 

            elif not one_time_done:

                if not one_time.test_ID and test_ID in line:
                    # do something with the line
                    one_time.test_ID = 1
                    print(line)

                elif not one_time.program_n and program_n in line:
                    # do something with the line
                    one_time.program_n = 1
                    print(line)

                elif not one_time.prod_n and prod_n in line:
                    # do something with the line
                    one_time.prod_n = 1
                    print(line)

                one_time_done = one_time.all_done()

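The OneTime class only holds flags that record whether each of the print-once lines has already been printed. You could use a dict instead, or just plain variables, to keep track of that.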
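
For comparison, here is a minimal sketch of the same bookkeeping done with a set instead of the OneTime class. The function name select_lines, the REPEATING and ONE_TIME tuples and the sample lines are assumptions made for this illustration; the marker strings are the ones from the question.

```python
REPEATING = (',"  Run  Time', ',"  Start Time', ',"  End  Time')
ONE_TIME = (' Host Name: ', 'Test Program Name:', 'Product:')


def select_lines(lines):
    """Keep every REPEATING match, but only the first match of each ONE_TIME marker."""
    seen = set()      # one-time markers that have already been captured
    selected = []
    for line in lines:
        line = line.rstrip()
        if any(marker in line for marker in REPEATING):
            selected.append(line)
            continue
        for marker in ONE_TIME:
            if marker not in seen and marker in line:
                seen.add(marker)
                selected.append(line)
                break
    return selected


sample = [
    "Product: Widget\n",
    ',"  Run  Time 00:10\n',
    "Product: Widget\n",
    ',"  Run  Time 00:12\n',
    " Host Name: lab-pc\n",
]
result = select_lines(sample)
print(result)
```

The second "Product:" line is skipped because its marker is already in the set, while both "Run Time" lines are kept.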

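The expression inside any(...) creates a generator, which produces items as it is iterated. It is like a list in that you can take values from it one at a time, except that it is lazy. The items are not stored in it; each line is read on demand as any() asks for it, until any() reaches an item that evaluates to True.

I know this can look like advanced stuff when you are new to Python and trying to learn. In case you want to know more, here is a short explanation, with a link at the end.

I'm not sure whether you are already familiar with "list comprehensions", but I'm sure you have seen them. They look like a for loop inside square brackets and they produce the items of a list: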

>>> li = [char for char in "abcdefghijklmnop"]
>>> li
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p']

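This ^ creates a list of characters, because iterating over a string yields one character at a time.

A generator expression looks like a list comprehension inside parentheses (instead of square brackets). It yields the same items, but it doesn't produce a character until it is asked for one: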

>>> gen  = (char for char in "abcdefghijklmnop")
>>> gen
<generator object <genexpr> at 0x7f9a447aec10>
>>> next(gen)
'a'

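So, back to any(). It pulls items from the generator until it sees one that evaluates to True, and then it stops. All it does is look for any truthy value in a series of items.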

>>> any([False, False, False, True])
True
>>>
>>> any("foo" in line for line in ["bar", "baz", "foo", "qux"])
True

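When the iteration reaches "foo", the expression "foo" in line is True, and the iteration stops there. The whole expression in any()'s argument list is a generator; a generator expression doesn't need its own parentheses when it is the only argument of a function call.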
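
A quick illustration of that parentheses rule, using made-up variable names: as the sole argument the generator expression can borrow the call's parentheses, but next to a second argument it needs its own.

```python
words = ["bar", "baz", "foo", "qux"]

# Sole argument: the call's own parentheses are enough
has_foo = any("foo" in w for w in words)

# With a second argument, the generator expression needs its own parentheses
total_len = sum((len(w) for w in words), 100)

print(has_foo, total_len)
```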

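Much as in the previous example, iterating over the file reads one line at a time until any() is satisfied, and goes no further. So this approach is reasonably efficient.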
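
The early stop can be observed directly in a small sketch. Here io.StringIO stands in for a real file and the contents are invented for the demo: after any() returns, only the lines that follow the match are left to read, which is why the answer's code calls seek(0) before processing the file from the top.

```python
import io

# io.StringIO stands in for an open log file (assumption for the demo)
log = io.StringIO('a\n,"  Run  Time 1\nb\nc\n')

found = any(',"  Run  Time' in ln for ln in log)

# any() stopped at the matching line, so only the lines after it remain
remaining = log.readlines()

log.seek(0)                  # back to the start of the "file"
all_lines = log.readlines()  # now every line is available again

print(found, remaining, len(all_lines))
```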

If you are interested, here is a basic tutorial: https://www.csestack.org/python-fibonacci-generator/.

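【Comments】:

Thanks, Todd! After finding the first match, can I reload the file in Python and start looking for another match? Is that doable? I'm asking because the snippets you wrote don't do what I want, and I can't modify them because I don't understand them. I'm a novice.
@VladePast, I removed my complicated example and used your code instead, so now the code in the example is mostly yours ;-). Take another look at it. Don't focus too much on check_file(), but see whether the second snippet works for you.
@VladePast.. hold on.. I'll remove everything extra and leave only the minimum.
Todd! If you live in Portland, Oregon, I'll buy you a drink once the quarantine is over ;)
Sure, Vlad. I don't drink, though =/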