How to use csplit to split a file based on every X amount of delimiter matches

【Question title】: How to use csplit to split a file based on every X amount of delimiter matches
【Posted】: 2022-01-14 19:22:51
【Question】:

I have a 457 MB file and am trying to split it into smaller files. Here is what currently works:

csplit -z Scan.nessus /\<ReportHost/ '{*}'

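However, this creates about 61.5k files, because the 457 MB file contains a ton of these entries. Ultimately, I would like to split on every 50 entries rather than on every single entry.

Is there a way to modify the command to do that? I made some attempt at this in Ruby, but parsing the file with Nokogiri seemed to max out my VM's memory.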

【Question comments】:

Tags: linux bash csplit

【Solution 1】:

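How about a Perl solution?
Even if you are not familiar with Perl syntax, it should not be hard to customize: just change the parameters defined near the top, such as my $pattern = ... and so on.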

#!/bin/bash

perl -e '
    use strict; use warnings;

    my $pattern = "<ReportHost";        # the pattern that starts each entry
    my $prefix = "xx";                  # prefix of the output files
    my $n = 50;                         # number of entries per file

    my $filename = $prefix . "000";     # first output file: xx000
    my $count = 0;                      # number of entries seen so far
    my $fh;                             # current output filehandle

    while (<>) {
        if (/$pattern/) {               # this line starts a new entry
            if ($count % $n == 0) {     # every $n entries, open a new file
                open($fh, ">", $filename) or die "Cannot open $filename: $!";
                $filename++;            # string increment: xx000 -> xx001
            }
            $count++;
        }
        print $fh $_ if $fh;            # lines before the first match are skipped
    }
' Scan.nessus                           # input filename
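
For readers who would rather avoid Perl, the same counting idea can be sketched in awk. This is only an illustrative sketch and not part of the original answer: the function name split_every_n, the part prefix, and the generated sample.xml are all assumptions made for the demonstration. Like the Perl script, it starts a new output file at every n-th line containing <ReportHost and drops any lines that come before the first match.

```shell
#!/bin/bash

# Hypothetical helper (not from the original answer):
# split file $1 into pieces of $2 entries each, where an entry
# starts at a line containing "<ReportHost". $3 is the output prefix.
split_every_n() {
    local input=$1 n=$2 prefix=${3:-xx}
    awk -v n="$n" -v prefix="$prefix" '
        /<ReportHost/ {
            if (count % n == 0) {            # every n-th entry starts a new file
                if (out != "") close(out)    # avoid running out of file handles
                out = sprintf("%s%03d", prefix, count / n)
            }
            count++
        }
        out != "" { print > out }            # lines before the first match are skipped
    ' "$input"
}

# Small demonstration on generated data: 7 entries, 3 per output file.
tmpdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7; do
    printf '<ReportHost name="host%s">\n  <item/>\n</ReportHost>\n' "$i"
done > "$tmpdir/sample.xml"

split_every_n "$tmpdir/sample.xml" 3 "$tmpdir/part"
ls "$tmpdir"
```

For the file in the question, the call would presumably look like split_every_n Scan.nessus 50 xx. Because the input is read line by line, memory use should stay small regardless of the file size.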

【Comments】: