Extract multiple lines of text between two keywords from a shell command in PowerShell: answers

【Question title】: Extract multiple lines of text between two key words from shell command in powershell
【Posted】: 2014-11-12 15:47:22
【Question】:

I have a shell command whose output I want to parse with PowerShell. The data I need always sits between two keywords, and the number of lines to capture can vary.

The output might look like this.

Sites:
System1: 
RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK
System2: 
RPAs: OK
Volumes: 
  WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK
WAN: OK
System: OK

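I want to capture part of this data and store it in a variable (or a text file, if that is easier?) so I can reuse it later in the script. For example, I want to capture everything between System1: and System2: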

RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK

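I have been trying different regex combinations without success. I got somewhere with the code below, but it does not seem to handle the WARNING lines, and I cannot get it to work with Out-File, only with Write-Host, which does not help me much.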

$RP = plink -l User -pw Password 192.168.1.100 "get_system_status summary=no" #extract from

$script = $RP

$in = $false

$script | %{
if ($_.Contains("System1"))
    { $in = $true }
elseif ($_.Contains("System2"))
    { $in = $false; }
elseif ($in)
    { Write-Host $_ }
}

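Ideally, I would like to be able to take this script and use it to parse data from any shell command. I am lost at the moment and almost ready to give up.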

【Comments】:

Tags: regex shell parsing powershell

【Solution 1】:

One option is to join the text with newlines, then use -split with a multi-line regex:

$text = 
(@'
Sites:
System1: 
RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK
System2: 
RPAs: OK
Volumes: 
  WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK
WAN: OK
System: OK
'@).split("`n") |
foreach {$_.trim()} 

$text -join "`n" -split '(?ms)(?=^System\d+:\s*)' -match '^System\d+:'

System1:
RPAs: OK
Volumes:
WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK

System2:
RPAs: OK
Volumes:
WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK
WAN: OK
System: OK
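
If only one of those chunks is needed later in the script, the split result can be indexed by its header. The following is a rough sketch rather than part of the original answer: the $bySystem table is a hypothetical name, and the shortened sample lines are made up to keep the example small.

```powershell
# Shortened stand-in for the command output (illustrative only).
$text = 'Sites:', 'System1:', 'RPAs: OK', 'Splitters: OK', 'System2:', 'RPAs: OK', 'WAN: OK'

# Split in front of every "SystemN:" header and keep only the chunks that start with one.
$blocks = $text -join "`n" -split '(?m)(?=^System\d+:)' -match '^System\d+:'

# Hypothetical lookup table: header name -> body of that chunk.
$bySystem = @{}
foreach ($block in $blocks) {
    # Split each chunk into its first line (the header) and the rest.
    $header, $body = $block -split "`n", 2
    $bySystem[$header.TrimEnd(':')] = $body.Trim()
}

$bySystem['System1']
```

Note that the last chunk runs to the end of the text, so in this sample the System2 entry also contains the trailing WAN: line.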

Edit: a more generic solution that captures only the output between two specific keywords:

$regex = '(?ms)System1:(.+?)System2:'

$text = $text -join "`n"

$OutputText = 
[regex]::Matches($text,$regex) |
 foreach {$_.groups[1].value -split "`n"}
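
Because the keywords will not always be System1 and System2, the same idea can be wrapped in a small helper. This is only a sketch: the function name Get-TextBetween and its parameter names are invented for illustration, and it assumes the command output arrives as an array of lines, which is how PowerShell captures output from a native command.

```powershell
# Hypothetical helper (not from the original answer): returns the lines found
# between two keywords in the output of a command.
function Get-TextBetween {
    param(
        [string[]]$Lines,        # command output, one element per line
        [string]$StartKeyword,   # keyword that opens the block, e.g. 'System1:'
        [string]$EndKeyword      # keyword that closes the block, e.g. 'System2:'
    )

    # Join into a single string so the regex can span several lines.
    $joined = $Lines -join "`n"

    # Escape the keywords so characters such as '.' or '(' are taken literally.
    $pattern = '(?s)' + [regex]::Escape($StartKeyword) + '(.+?)' + [regex]::Escape($EndKeyword)

    foreach ($m in [regex]::Matches($joined, $pattern)) {
        # Trim whitespace at both ends of the capture, then split back into lines.
        $m.Groups[1].Value.Trim() -split "`n"
    }
}

# Shortened stand-in for the command output (illustrative only).
$sample = 'Sites:', 'System1: ', 'RPAs: OK', 'Volumes: ', 'Splitters: OK', 'System2: ', 'RPAs: OK'
$OutputText = Get-TextBetween -Lines $sample -StartKeyword 'System1:' -EndKeyword 'System2:'
$OutputText
```

If the keyword pair occurs more than once, the lines of every block are returned one after another in a single array.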

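【Comments】:

Maybe I misunderstood your answer. Assuming my shell output is stored in $text, can you show me the code that extracts the data between two keywords? It will not always be system1 and system2.
OK, I think I misunderstood the question. Updated the answer with a more generic solution.
This looks like exactly what I was looking for! Can I store that output in a new string, e.g. $output?
Sure. Just assign it to a variable (answer updated with an example). Note that if the text contains more than one instance of the keywords, the result will be an array of strings (one per keyword pair).
Quick follow-up (maybe this should be a new question?). When I run Out-File on that variable I seem to lose all formatting and everything ends up on one line. Is it possible to keep the line breaks?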
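
On the Out-File follow-up: one possible cause is that the captured block is a single string whose lines are separated by a bare line feed, which some editors show as one line. A minimal sketch of one way around it, assuming that is the situation, is to split the string into an array first; the file name and temp location below are placeholders.

```powershell
# A single string with LF-only line breaks, like the result of -join "`n".
$block = "RPAs: OK`nVolumes:`nSplitters: OK"

# Split into an array of lines; Out-File writes each array element on its own line.
$lines = $block -split '\r?\n'

# Placeholder path: point this wherever the file should go.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'system1_block.txt'
$lines | Out-File -FilePath $path -Encoding utf8

# Reading the file back yields one element per line.
$readBack = @(Get-Content -Path $path)
$readBack.Count
```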

【Solution 2】:

Try this regex:

$result = ($text | Select-String 'System1:\s*\r\n((.*\r\n)*)\s*System2:' -AllMatches)
$result.Matches[0].Groups[1].Value

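where $text is your original input. Note that you may need to change the line endings from \r\n to \n depending on your input. You may also get more than one match; I cannot tell from your sample.

The regex starts matching at System1:\s*\r\n, that is, System1: followed by any amount of whitespace and then a line break. It ends the match at the literal System2:. In between, .*\r\n matches any run of characters followed by a line break. The outer (.*\r\n)* repeats that pattern. Finally the whole construct is grouped as ((.*\r\n)*) so that all matched lines can be extracted as the result.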
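
The pattern described above can be tried against a single multi-line string as follows. This is a sketch with two departures from the answer, both assumptions on my part: the sample lines are shortened, and the pattern uses \r?\n instead of \r\n so that either line ending matches.

```powershell
# Shortened stand-in for the command output (illustrative only).
$lines = 'Sites:', 'System1: ', 'RPAs: OK', 'Volumes: ', 'Splitters: OK', 'System2: ', 'RPAs: OK'

# Select-String tests each array element separately, so join the lines into
# one string before using a pattern that spans several lines.
$text = $lines -join "`r`n"

# Same structure as the answer's pattern, with \r?\n accepting CRLF or LF.
$pattern = 'System1:\s*\r?\n((.*\r?\n)*)\s*System2:'

$result = $text | Select-String -Pattern $pattern -AllMatches
$captured = $result.Matches[0].Groups[1].Value
$captured
```

Group 1 holds the lines between the two keywords, including their line breaks, so it can be split again with -split '\r?\n' if an array is preferred.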

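【Comments】:

This returns an error: Cannot index into a null array. At line:6 char:1 + $result.Matches[0].Groups[1].Value + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidOperation: (:) [], RuntimeException + FullyQualifiedErrorId : NullArray
Did you verify the line-ending format? Check $result.Matches and $result.Matches[0]. I tested it with your sample.
I tried switching \r\n to \n but still get the same error.
Would a picture of the shell output be more useful?
No. Whatever is in $text differs from your sample. I copied the sample into a text file "input.txt" and read it in with "$text = gc -raw input.txt". Is $text a string or an array of strings? If it is a string, does it contain line breaks?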
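
The error in this thread is consistent with $text being an array of lines rather than one string, which is what PowerShell produces when it captures the output of a native command such as plink. A short diagnostic sketch, using made-up sample lines, shows how to check for that and collapse the array before applying a multi-line pattern:

```powershell
# Stand-in for captured command output: one string per line (illustrative only).
$text = 'System1: ', 'RPAs: OK', 'System2: '

# If this is an array, no single element contains both keywords.
$isArray = $text -is [array]

# Collapse the array into one string so a pattern can span lines.
if ($isArray) { $text = $text -join "`n" }

# On a single string, -match returns True or False.
$hasBoth = $text -match '(?s)System1:.*System2:'
$hasBoth
```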

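【Solution 3】:

I was trying to adapt this script for my own use. I wanted to do the same thing, but capture the content between <text> and </text> (from a kobo-reader annotation file). I finally got it working, and it looks like this: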

$text = @"
<text>The deaths I see are frequently undignified; the dying very often have not accepted or understood their situation, the truth denied them by well-intentioned relatives and doctors. Their death has been stolen from them.
</text>
            </fragment>
        </target>
        <content>
                <text>It is indeed impossible to imagine our own death; and whenever we attempt to do so, we can perceive that we are in fact still present as </text>
"@
$regex = '(?ms)<text>(.+?)</text>'

#Test
$OutputText = [regex]::Matches($text,$regex) | 
foreach {$_.groups[1].value }
Write-Host $OutputText

#Output
[regex]::Matches($text,$regex) | 
foreach {$_.groups[1].value } |
Out-File c:\temp\kobo\example_out.txt -Encoding utf8

【Comments】: