Matching pattern in Python (answers)

【Question title】: Matching pattern in Python
【Posted】: 2013-02-19 06:16:37
【Question】:

I have a directory "/pcap_test" that contains several log files. Each file has lines of this form:

Pkt: 1 (358 bytes), LIFE: 1, App: itunes (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/itunes, Error: None

Pkt: 2 (69 bytes), LIFE: 2, App: zynga (INTO), State: INSPECTING, Stack: /ETH/IP/UDP, Error: None

Pkt: 3 (149 bytes), LIFE: 2, App: pizzeria (INTO), State: TERMINATED, Stack: /ETH/IP/UDP/pizzeria, Error: None

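In this case I want the output to be the second line, because the value in "App:" does not appear in "Stack:".

I wrote a small Python script that walks the directory, opens each file and prints its contents: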

import os
list = os.listdir("/home/test/Downloads/pcap_test")
print(list)
for infile in list:
    infile = os.path.join("/home/test/Downloads/pcap_test", infile)
    if os.path.isfile(infile):
        str = open(infile, 'r').read()
        print(str)

I managed to get the output I want with grep, but I can't reproduce it in the Python script. The command looks something like this:

grep -vP 'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$' xyz.pcap.log | grep -P 'App: ([^, ]*) \(INTO\)'

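Since I already have the file contents in a variable named "str", I'd like to use that, rather than the individual log files, to get the output.

Any help with this would be greatly appreciated.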

【Question discussion】:

Tags: python regex grep pattern-matching subprocess

【Solution 1】:

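First, I'd advise against variable names like str, since that is the name of Python's built-in string type. The same applies to list.

Since grep is a command-line regular expression tool and you already have a working regular expression, all you need to do is learn to use Python's re module.

Reproducing grep's -v behavior is a little trickier. I suggest reading the file line by line and printing a line only if it does not match your first regex but does match your second, like this: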
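
As a small illustration of why this matters (a sketch, not part of the original answer): once a name such as str is rebound to file contents, the built-in type can no longer be reached under that name in that scope. The function names and the replacement name log_text below are hypothetical.

```python
# Rebinding a built-in name hides the built-in for the rest of the scope.
def shadowing_breaks_builtin():
    str = "Pkt: 1 (358 bytes)"   # shadows the built-in str type locally
    try:
        str(42)                  # now tries to call a string object
    except TypeError:
        return True              # a string instance is not callable
    return False


# A descriptive, non-clashing name avoids the problem entirely.
def safe_name():
    log_text = "Pkt: 1 (358 bytes)"   # hypothetical replacement name
    return str(42), log_text
```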

import re  # needed for the regex calls below

# inside the for loop from the question:
if os.path.isfile(infile):
    with open(infile, 'r') as logFile:  # the file is closed automatically when the block ends
        for line in logFile:  # read logFile one line at a time
            # NOTE: re.match() only matches at the very start of the line
            firstReMatch = re.match(r'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$', line)  # app name reappears after "Stack:"
            secondReMatch = re.match(r'App: ([^, ]*) \(INTO\)', line)  # line has an "App: ... (INTO)" field
            if secondReMatch and not firstReMatch:  # "not" gives the inverse match, like grep -v
                print(line)  # print the line

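Note that re.match() only matches at the beginning of the string, whereas re.search() scans the whole line. Because the sample lines start with "Pkt:" rather than "App:", these patterns need re.search() instead of re.match().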
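
To make the difference concrete, here is a minimal Python 3 sketch that applies the same two patterns with re.search(). The sample lines are modelled on the three lines in the question, and the names APP_IN_STACK, HAS_APP and mismatched_lines are assumptions made for this illustration, not part of the original answer.

```python
import re

# The same two patterns as in the answer, compiled once.
APP_IN_STACK = re.compile(r'App: ([^, ]*) \(INTO\).*Stack: .*\1.*$')
HAS_APP = re.compile(r'App: ([^, ]*) \(INTO\)')


def mismatched_lines(lines):
    """Return lines that name an app which does not reappear after 'Stack:'."""
    return [line for line in lines
            if HAS_APP.search(line) and not APP_IN_STACK.search(line)]


# Sample lines modelled on the question's log excerpt.
sample = [
    "Pkt: 1 (358 bytes), LIFE: 1, App: itunes (INTO), State: TERMINATED, "
    "Stack: /ETH/IP/UDP/itunes, Error: None",
    "Pkt: 2 (69 bytes), LIFE: 2, App: zynga (INTO), State: INSPECTING, "
    "Stack: /ETH/IP/UDP, Error: None",
    "Pkt: 3 (149 bytes), LIFE: 2, App: pizzeria (INTO), State: TERMINATED, "
    "Stack: /ETH/IP/UDP/pizzeria, Error: None",
]

# re.match() finds nothing here because the line starts with "Pkt:".
print(re.match(HAS_APP.pattern, sample[1]))

# re.search() scans the whole line, so only the zynga line is reported.
for line in mismatched_lines(sample):
    print(line)
```

Because the function takes any iterable of lines, it can be fed an open file object or the result of calling splitlines() on contents that were already read into a string.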

【Comments】:

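The expression firstReMatch = re.match('App: ([^, ]*) (INTO).*Stack: .*\1.*$', line) doesn't work in Python the way it does in grep. Some whitespace (\s) needs care when using regular expressions in Python. As you suggested, I tried to get the same output with Python's re.findall(), but I got confused and couldn't find a solution. Using grep would probably need a subprocess call, but I think a regex alone can solve this. I just can't quite hit the nail on the head.
@learning I don't follow. Are you unable to adapt the code in my answer to what you want? By the way, I realized I wasn't using raw strings (r"this is a raw string with the r in front"). I've updated my answer.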
I got this working with re.findall. Thanks for the help.
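
The asker's re.findall solution is not shown in the thread. One possible way to do it on the whole file contents at once (the string the question calls "str") is sketched below in Python 3. The pattern, the function name find_mismatches and the variable log_text are assumptions for illustration only, not the asker's code. The sketch replaces the two-step grep pipeline with a single pattern that uses a negative lookahead.

```python
import re

# Group 1 captures the full line and group 2 the app name. The negative
# lookahead rejects lines whose Stack field repeats that app name.
# re.MULTILINE makes ^ and $ match at every line boundary, and the raw
# string keeps \2 as a backreference instead of an escape character.
MISMATCH = re.compile(
    r'^(.*App: ([^, ]*) \(INTO\)(?!.*Stack: .*\2).*)$',
    re.MULTILINE,
)


def find_mismatches(log_text):
    """Return (line, app_name) tuples for every mismatched line in log_text."""
    return MISMATCH.findall(log_text)


# Sample contents modelled on the question's log excerpt.
log_text = "\n".join([
    "Pkt: 1 (358 bytes), LIFE: 1, App: itunes (INTO), State: TERMINATED, "
    "Stack: /ETH/IP/UDP/itunes, Error: None",
    "Pkt: 2 (69 bytes), LIFE: 2, App: zynga (INTO), State: INSPECTING, "
    "Stack: /ETH/IP/UDP, Error: None",
    "Pkt: 3 (149 bytes), LIFE: 2, App: pizzeria (INTO), State: TERMINATED, "
    "Stack: /ETH/IP/UDP/pizzeria, Error: None",
])

for line, app in find_mismatches(log_text):
    print(app, "->", line)
```

With more than one capture group, re.findall returns a list of tuples, which is why each result unpacks into the full line and the app name.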