【Question title】: Extract multiple lines of text between two keywords from shell command output in PowerShell
【Posted】: 2014-11-12 15:47:22
【Question description】:

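I have a shell command whose output I want to extract data from using PowerShell. The data I need always sits between two keywords, and the number of lines to capture can vary.

The output may look like this: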

Sites:
System1: 
RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK
System2: 
RPAs: OK
Volumes: 
  WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
  WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
Splitters: OK
WAN: OK
System: OK

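I want to capture part of this data and store it in a variable (or a text file, if that is easier) so I can reuse it later in the script. For example, I want to capture everything between System1: and System2: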

RPAs: OK
Volumes: 
  WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
  WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
Splitters: OK

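I have been experimenting with different regex combinations without success. I got partway with the code below, but it does not seem to handle the WARNING lines, and I cannot get Out-File to work with it. Only Write-Host works, which does not help me much.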

$RP = plink -l User -pw Password 192.168.1.100 "get_system_status summary=no" #extract from

$script = $RP

$in = $false

$script | %{
if ($_.Contains("System1"))
    { $in = $true }
elseif ($_.Contains("System2"))
    { $in = $false; }
elseif ($in)
    { Write-Host $_ }
}

Ideally, I would like to reuse this script to parse data from any shell command. I am currently lost and almost ready to give up.

【Question comments】:

    Tags: regex shell parsing powershell


    【Solution 1】:

    One option is to join the text with newlines, then use -split with a multiline regex:

    $text = 
    (@'
    Sites:
    System1: 
    RPAs: OK
    Volumes: 
      WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
      WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
      WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
    Splitters: OK
    System2: 
    RPAs: OK
    Volumes: 
      WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
      WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
      WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
    Splitters: OK
    WAN: OK
    System: OK
    '@).split("`n") |
    foreach {$_.trim()} 
    
    $text -join "`n" -split '(?ms)(?=^System\d+:\s*)' -match '^System\d+:'
    
    System1:
    RPAs: OK
    Volumes:
    WARNING: Storage group DR_UCS_01-08 contains both replicated and unreplicated volumes. ; CS_TX
    WARNING: Storage group DR_UCS_21-28 contains both replicated and unreplicated volumes. ; CS_TX
    WARNING: Storage group DR_UCS_31-38 contains both replicated and unreplicated volumes. ; CS_TX
    Splitters: OK
    
    System2:
    RPAs: OK
    Volumes:
    WARNING: Storage group MA_UCS_1 contains both replicated and unreplicated volumes. ; CS_MA
    WARNING: Storage group MA_UCS_2 contains both replicated and unreplicated volumes. ; CS_MA
    WARNING: Storage group MA_UCS_3 contains both replicated and unreplicated volumes. ; CS_MA
    Splitters: OK
    WAN: OK
    System: OK
    

    Edit: a more generic solution that captures only the output between two specific keywords:

    $regex = '(?ms)System1:(.+?)System2:'
    
    $text = $text -join "`n"
    
    $OutputText = 
    [regex]::Matches($text,$regex) |
     foreach {$_.groups[1].value -split "`n"}
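
Since the keywords will not always be System1 and System2, the same idea can be wrapped in a small function. This is only a sketch: the name Get-TextBetween and its parameters are hypothetical, not part of the original answer, and it assumes the start keyword appears before the end keyword in the text.

```powershell
# Hypothetical helper (not from the original answer): returns the text found
# between each start/end keyword pair, one string per pair.
function Get-TextBetween {
    param(
        [string[]]$Text,          # an array of lines or a single string
        [string]$StartKeyword,
        [string]$EndKeyword
    )
    # Join the lines so a single regex can span line breaks
    $joined = $Text -join "`n"
    # Escape the keywords so characters such as '.' are treated literally;
    # (?s) lets '.' match newlines, and '.+?' stops at the first end keyword
    $pattern = '(?s)' + [regex]::Escape($StartKeyword) + '(.+?)' + [regex]::Escape($EndKeyword)
    foreach ($m in [regex]::Matches($joined, $pattern)) {
        $m.Groups[1].Value.Trim()
    }
}

# Shortened sample standing in for the real command output
$sample = @(
    'Sites:'
    'System1: '
    'RPAs: OK'
    'Splitters: OK'
    'System2: '
    'RPAs: OK'
)
$between = Get-TextBetween -Text $sample -StartKeyword 'System1:' -EndKeyword 'System2:'
```

Here $between holds the two lines between the keywords as one string. If the keyword pair occurs more than once, the function emits one string per occurrence.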
    

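    【Comments】:

    • Maybe I am misunderstanding your answer. Assuming my shell output is stored in $text, can you show me the code to extract the data between two keywords? It will not always be System1 and System2.
    • OK, I think I misunderstood the question. I have updated the answer with a more generic solution.
    • This looks like exactly what I am looking for! Can I store that output in a new string, e.g. $output?
    • Sure. Just assign it to a variable (answer updated with an example). Note that if the text contains multiple instances of the keyword pair, the result will be an array of strings (one per pair of keywords).
    • Quick follow-up (maybe this should be a new question?). When I run Out-File on that variable, I seem to lose all formatting and everything ends up on one line. Is it possible to keep the line breaks?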
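
One possible cause of the lost line breaks, assuming the captured value is a single string whose lines are separated only by LF, is that some Windows editors show LF-only text on one line. A minimal sketch of a workaround is to split the string into an array of lines before writing it, so that Out-File writes each element on its own line. The file name below is made up for illustration.

```powershell
# Assumed input: a captured block whose lines are separated by LF only
$captured = "RPAs: OK`nVolumes:`nSplitters: OK"

# Split into an array of lines; \r?\n also tolerates CRLF input
$lines = $captured -split '\r?\n'

# Hypothetical output path in the temp folder, for illustration only
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'system1_status.txt'
$lines | Out-File -FilePath $path -Encoding utf8

# Reading the file back yields one array element per line
$roundTrip = Get-Content -Path $path
```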
    【Solution 2】:

    Try this regex:

    $result = ($text | Select-String 'System1:\s*\r\n((.*\r\n)*)\s*System2:' -AllMatches)
    $result.Matches[0].Groups[1].Value
    

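    where $text is your original input. Note that you may need to change the line endings from \r\n to \n depending on your input. You may also have more than one match; I cannot tell from your sample.

    The regex starts matching at System1:\s*\r\n, that is, System1: followed by any amount of whitespace and then a line break. It ends the match at the literal System2:. In the middle, .*\r\n matches any run of characters followed by a line break. The surrounding (.*\r\n)* repeats that pattern. Finally, the construct is grouped as ((.*\r\n)*) so that all matched lines can be extracted as the result.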
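
To avoid editing the pattern for each line-ending style, a variant can use \r?\n, which accepts both CRLF and LF. The sketch below is an adaptation, not the answer's exact pattern: it calls [regex]::Match on a single string and makes the repetition lazy so the capture stops at the first System2:. The sample text is a shortened, made-up stand-in for the real output.

```powershell
# Shortened sample with CRLF line endings (assumed input shape)
$text = "Sites:`r`nSystem1: `r`nRPAs: OK`r`nSplitters: OK`r`nSystem2: `r`nRPAs: OK`r`n"

# \r?\n accepts CRLF or LF; the lazy *? stops at the first System2:
$pattern = 'System1:\s*\r?\n((.*\r?\n)*?)\s*System2:'

$m = [regex]::Match($text, $pattern)
if ($m.Success) {
    $block = $m.Groups[1].Value
}

# The same pattern also works after stripping the carriage returns
$lfText  = $text -replace "`r", ''
$blockLf = [regex]::Match($lfText, $pattern).Groups[1].Value
```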

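    【Comments】:

    • This returns an error: Cannot index into a null array. At line:6 char:1 + $result.Matches[0].Groups[1].Value + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidOperation: (:) [], RuntimeException + FullyQualifiedErrorId : NullArray
    • Did you check the line-ending format? Inspect $result.Matches and $result.Matches[0]. I tested it with your sample.
    • I tried switching \r\n to \n but still get the same error.
    • Would a picture of the shell output be more useful?
    • No. Nothing in my $text differs from your sample. I copied it into a text file, "input.txt", and read it in with "$text = gc -raw input.txt". Is your $text a string or an array of strings? If it is a string, does it contain line breaks?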
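
The string-versus-array question in the last comment is a likely explanation for the null result: output captured from an external command is normally an array with one string per line, and Select-String then tests each line on its own, so a pattern that spans lines never matches. The sketch below illustrates this with a made-up array standing in for the plink output, and uses \r?\n in the pattern as an assumption about line endings.

```powershell
# Simulated command output: one string per line, as PowerShell captures it
$lines = @('Sites:', 'System1: ', 'RPAs: OK', 'Splitters: OK', 'System2: ', 'RPAs: OK')

$pattern = 'System1:\s*\r?\n((.*\r?\n)*?)\s*System2:'

# Piping the array tests each line separately, so the multi-line pattern finds nothing
$perLine = $lines | Select-String -Pattern $pattern -AllMatches

# Joining into a single string first lets the pattern span line breaks
$single = $lines -join "`n"
$result = $single | Select-String -Pattern $pattern -AllMatches
$block  = $result.Matches[0].Groups[1].Value
```

With the array input, $perLine is empty, which is what leads to the "Cannot index into a null array" error when Matches[0] is accessed.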
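    【Solution 3】:

    I tried to adapt this script for my own use. I wanted to do the same thing, but capture the content between <text> and </text> (from a Kobo reader annotation file). I finally got it working, and it looks like this: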

    $text = @"
    <text>The deaths I see are frequently undignified; the dying very often have not accepted or understood their situation, the truth denied them by well-intentioned relatives and doctors. Their death has been stolen from them.
    </text>
                </fragment>
            </target>
            <content>
                    <text>It is indeed impossible to imagine our own death; and whenever we attempt to do so, we can perceive that we are in fact still present as </text>
    "@
    $regex = '(?ms)<text>(.+?)</text>'
    
    #Test
    $OutputText = [regex]::Matches($text,$regex) | 
    foreach {$_.groups[1].value }
    Write-Host $OutputText
    
    #Output
    [regex]::Matches($text,$regex) | 
    foreach {$_.groups[1].value } |
    Out-File c:\temp\kobo\example_out.txt -Encoding utf8
    

    【Comments】:
