Regex to find a specific word and merge the following two lines

【Question Title】: Regex to find a specific word and merge the following two lines
【Posted】: 2017-09-29 23:00:15
【Question Description】:

I have a C# application in which I read a .txt file that looks like this:

List item
List item
Account
Number
Five
List item
List item
Account
Number
Six
List item

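I need a regex that finds the specific word "Account" and then merges the following two lines, to get this result:

Account Number Five
Account Number Six

I have the following regex for the first line, but how do I merge the following two lines?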

[\n\r].*Account\s*([^\n]*)

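【Question Comments】:

You should have an option to enable multiline regex (called the g option, though that depends on the C# API) and multiple matches. As for your regex, you should use something like Account\s*(?:([^\r\n]*)\r\n){2} with the replacement pattern Account \1 \2. Be sure to escape the backslashes correctly and to use CRLF (\r\n) line endings in the .txt file.
Does the text file literally look like this, with the bullets on one line? I changed it to a quoted section.

Tags: c# regex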

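【Solution 1】:

I'm not sure whether this can be done with a single regex, but you can do it with two: one to match the three lines, and another to replace the line breaks with spaces.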

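// Match "Account" plus the next two lines; \r?\n accepts both CRLF and LF line endings.
var regex = new Regex(@"Account\r?\n\w*\r?\n\w*");
// Match any single line break (CRLF, CR, or LF).
var regexNewline = new Regex(@"\r\n|\r|\n");
foreach (Match match in regex.Matches(input))
{
    // Replace the line breaks inside each match with spaces.
    Console.WriteLine(regexNewline.Replace(match.Value, " "));
}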
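The same result can probably be reached with one regex by capturing the two following lines in groups. The following is a minimal sketch, not the answerer's method; it assumes each "Account" sits alone on its line, and the names AccountLines and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AccountLines
{
    // "Account" at the start of a line, followed by two more lines.
    // \r?\n accepts both CRLF and LF; [^\r\n]* captures a line without its line break.
    private static readonly Regex Pattern =
        new Regex(@"^Account\r?\n([^\r\n]*)\r?\n([^\r\n]*)", RegexOptions.Multiline);

    // Returns one merged string per match.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            results.Add($"Account {m.Groups[1].Value} {m.Groups[2].Value}");
        }
        return results;
    }

    public static void Main()
    {
        // Mixed line endings on purpose: the first block uses CRLF, the second LF.
        string input = "List item\r\nAccount\r\nNumber\r\nFive\r\nList item\nAccount\nNumber\nSix\nList item";
        foreach (string line in Extract(input))
        {
            Console.WriteLine(line);
        }
    }
}
```

Capturing with [^\r\n]* rather than \w* keeps lines that contain spaces or punctuation, at the cost of also accepting empty lines.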

【Comments】:

【Solution 2】:

If possible, I would avoid hard-coding characters such as \r\n. The example below worked for me.

    static void Main() {
        var str = @"List item 1
List item 2
Account
Number
Five
List item 3
List item 4
Account
Number
Six
List item 5";

        var newStr = Regex.Replace(str, @"^\s*(Account)\s*^\s*(.*?)\s*$\s*^\s*(.*?)\s*$", "$1 $2 $3", RegexOptions.Multiline | RegexOptions.Singleline);
        Console.WriteLine($"Original: \r\n{str}\r\n---------------\r\n");
        Console.WriteLine($"New: \r\n{newStr}\r\n---------------\r\n");
    }

Here is its output:

Original:
List item 1
List item 2
Account
Number
Five
List item 3
List item 4
Account
Number
Six
List item 5
---------------

New:
List item 1
List item 2
Account Number Five
List item 3
List item 4
Account Number Six
List item 5
---------------

Regex explanation:

^\s*(Account)\s*        - Match from start of line followed by Account. If there are white spaces around account, then eat them up too.
^\s*(.*?)\s*$\s*        - Match from start of line, followed by optional white-spaces, followed by capturing all text on that line, followed by optional white-spaces, and then end-of-line. The last \s* eats up the end-of-line character(s)
^\s*(.*?)\s*$           - Same as above explanation, except that we don't want to eat up the end-of-line character(s) at the end

Replacement:

"$1 $2 $3"              - the 3 items we captured in the above regex with a space in between them.
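To see what each group holds before the replacement string is applied, the groups can be inspected directly. This is a small sketch that reuses the pattern and options from the answer unchanged; the class name GroupDemo and the method Merge are hypothetical, and the input uses LF line endings for simplicity.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Pattern and options copied from the answer above.
    public const string Pattern = @"^\s*(Account)\s*^\s*(.*?)\s*$\s*^\s*(.*?)\s*$";
    public const RegexOptions Options = RegexOptions.Multiline | RegexOptions.Singleline;

    // Applies the "$1 $2 $3" substitution to every match in the input.
    public static string Merge(string input)
    {
        return Regex.Replace(input, Pattern, "$1 $2 $3", Options);
    }

    public static void Main()
    {
        string input = "List item 1\nAccount\nNumber\nFive\nList item 2";

        // Groups[0] is the whole match; Groups[1..3] are the three captures.
        Match m = Regex.Match(input, Pattern, Options);
        Console.WriteLine($"[{m.Groups[1].Value}] [{m.Groups[2].Value}] [{m.Groups[3].Value}]");

        // The merged text, with the three lines collapsed into one.
        Console.WriteLine(Merge(input));
    }
}
```

In a .NET replacement string, $1 refers to the first capture group; the \1 form is a backreference inside the pattern itself, not a substitution.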

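Regex options:

Multiline               - ^ and $ match at the beginning and end of every line, not just at the start and end of the string
Singleline              - . matches every character, including \n; the lazy .*? still stops at the first line end because of the $ that follows it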
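The effect of the Multiline option can be checked by counting matches of an anchored pattern with and without it. A minimal sketch, assuming LF line endings; MultilineDemo and CountLines are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineDemo
{
    // Counts lines that consist of exactly "Account" under the given options.
    public static int CountLines(string input, RegexOptions options)
    {
        return Regex.Matches(input, @"^Account$", options).Count;
    }

    public static void Main()
    {
        string input = "List item\nAccount\nNumber\nAccount\nSix";

        // Without Multiline, ^ and $ only anchor to the whole string, so nothing matches.
        Console.WriteLine(CountLines(input, RegexOptions.None));      // 0

        // With Multiline, ^ and $ anchor to each line, so both "Account" lines match.
        Console.WriteLine(CountLines(input, RegexOptions.Multiline)); // 2
    }
}
```

Note that in .NET, $ in Multiline mode matches before \n only, not before \r. On CRLF input, a pattern such as ^Account$ needs to become ^Account\r?$ to match.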

【Comments】: