Grep for multiple strings and multiple strings including the following line - Answers

【Question title】: Grep for multiple strings and multiple strings including the following line
【Posted】: 2018-11-20 14:34:12
【Question description】:

I'm trying to grep for 3 fields: the strings a, b and c. I know this can be done with

grep -E 'a|b|c'

However, I also want to grep for the strings x, y and z, including the line that follows each match. I know this can be done with

grep -A1 'x'

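So my question is: is it possible to combine all of this into a single command? E.g. something like this (I know this command doesn't work, it's just an example)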

grep -E 'a|b|c' -A1 'x|y|z'

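If there is a better way without grep, or even with python, that would help too. I only resorted to grep because I assumed it would be faster than reading the file line by line in python. Cheers!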

EDIT: So I have a large file made up of repeated sections, each of which looks like this:

{
    "source_name": [
        "$name"
    ],
    "source_line": [
        52
    ],
    "source_column": [
        1161
    ],
    "source_file": [
        "/somerandomfile"
    ],
    "sink_name": "fwrite",
    "sink_line": 55,
    "sink_column": 1290,
    "sink_file": "/somerandomfile",
    "vuln_name": "vuln",
    "vuln_cwe": "CWE_862",
    "vuln_id": "17d99d109da8d533428f61c430d19054c745917d0300b8f83db4381b8d649d83",
    "vuln_type": "taint-style"
}

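This section between the {} repeats throughout the file. What I want to grep is source_name, source_line and source_file together with the line below each of them, plus vuln_name, sink_file and sink_line on their own. So the sample output should be: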

    "source_name": [
        "$name"
    "source_line": [
        52
    "source_file": [
        "/somerandomfile"
    "sink_line": 55,
    "sink_file": "/somerandomfile",
    "vuln_name": "vuln",

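【Comments】:

Why do you need to combine these commands?
@JonahBishop It makes my life easier to have the outputs follow each other instead of being split up. If that makes any sense
Try grep -Poz 'a|b|c|(x|y|z).*\R.*' file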
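
The one-regex idea from the comment above can also be sketched in Python, for anyone who prefers to stay there. This is only a sketch under assumptions: the names SINGLE, WITH_NEXT, PATTERN and grep_text are made up for illustration, the field lists are taken from the sample in the question, and the whole file is read into one string. Since grep's -o prints only the matched text, the sketch wraps each alternative in .* so that whole lines come back.

```python
import re

# Hypothetical names; the field lists come from the question's sample data.
SINGLE = r"sink_line|sink_file|vuln_name"
WITH_NEXT = r"source_name|source_line|source_file"

PATTERN = re.compile(
    rf"^.*(?:{WITH_NEXT}).*(?:\n.*)?$"  # matching line plus the next line, if any
    rf"|^.*(?:{SINGLE}).*$",            # matching line on its own
    re.MULTILINE,
)

def grep_text(text):
    """Return the matched chunks in file order, without trailing newlines."""
    return [m.group(0) for m in PATTERN.finditer(text)]
```

Something like print("\n".join(grep_text(open("input_file").read()))) would then print the chunks one after another.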

Tags: python regex grep

【Solution 1】:

This python script should do the job, and it allows for ad-hoc customization that would be hard to squeeze into a dense grep command:

my_grep.py

import re
import sys

# first: print the matching line only; second: also print the next line
first = re.compile(sys.argv[1])
second = re.compile(sys.argv[2])
with open(sys.argv[3]) as f:
  content = f.readlines()

for idx, line in enumerate(content):
  if second.search(line):
    # each line keeps its trailing newline, so suppress print's own
    print(line, end="")
    if idx + 1 < len(content):
      print(content[idx + 1], end="")
  elif first.search(line):
    print(line, end="")

You can generate the output you want like this:

 python my_grep.py 'sink_line|sink_file|vuln_name' 'source_name|source_line|source_file' input_file

assuming your input file is named input_file.
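
Since the question mentions a large file, the same idea can probably be done without holding every line in memory. Below is a minimal sketch, assuming one pass over the file is acceptable; grep_with_next is a hypothetical helper name, not part of the script above. It remembers whether the previous line asked for its follower, so a line is never printed twice.

```python
import re

def grep_with_next(lines, single, with_next):
    """Yield lines matching `single`, and lines matching `with_next`
    together with the line that follows them (similar to grep -A1)."""
    pending = False  # True when the previous line matched `with_next`
    for line in lines:
        wants_next = bool(with_next.search(line))
        if pending or wants_next or single.search(line):
            yield line
        pending = wants_next
```

It takes any iterable of lines, so an open file object works: sys.stdout.writelines(grep_with_next(f, single, with_next)), with single and with_next being compiled patterns.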

【Comments】:

This works great and lets me easily tweak the output to my liking or assign it to a variable. Thanks dude!

【Solution 2】:

AWK

awk supports range patterns, which match every line from a match of pattern1 through the next match of pattern2:

awk '/(aaa|bbb|ccc)/,/[xyz]/' data.txt
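
To make the range semantics concrete, here is a rough Python emulation of an awk range pattern. It is an illustration only; awk_range is a hypothetical name. Note that a range covers every line between the start match and the end match, which is broader than "the match plus one following line", so it is worth checking against your data before relying on it.

```python
import re

def awk_range(lines, start, end):
    """Yield lines from a `start` match through the next `end` match, inclusive."""
    in_range = False
    for line in lines:
        if in_range:
            yield line
            if end.search(line):
                in_range = False
        elif start.search(line):
            yield line
            # awk also tests the end pattern against the starting line,
            # so a line matching both forms a one-line range.
            in_range = not end.search(line)
```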

Python

Python lets you compile regular expressions for speed, and you can invoke the script as a single command by putting it in a file.

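import re

pattern1 = re.compile("a|b|c")
pattern2 = re.compile("x|y|z")
saw_pattern1 = False  # did the previous line match pattern1?

# Text mode, so the str patterns above can be applied to each line
with open("data.txt") as fin:
    for line in fin:
        # search() finds the pattern anywhere; match() only at line start
        if saw_pattern1 and pattern2.search(line):
            print("do stuff")
        saw_pattern1 = pattern1.search(line)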

【Comments】: