【Question title】: PowerShell: complicated regex with multiple groups
【Posted】: 2020-05-08 06:57:53
【Question】:

I need some help with my regex.

My code looks like this (I haven't gotten very far yet):

$source_file = "\\server\minified.txt"
# -Raw reads the whole file as a single string so the pattern can span lines
$sf_content = Get-Content -Path $source_file -Raw

$sections = $sf_content | Select-String -AllMatches -Pattern '(?smi)(^\s+\d+:\d+\s+AM\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(\w+\s+\w+$)(.+?(Start Account\s\d+)(.+?Elapsed))'
$sections

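The file looks like this:

With my regex I can get the first and last name from the "ACCOUNT ACTIVITY - person's name" string circled in red at the top of the image above.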

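My end goal is to match each blue box with a regex, capturing everything from the date in the top-left corner down to "1 Accounts worked per hour". From there I want the information in the second red circle: the start time at the beginning of that line, and then the last instance of the same "Start account 54321234" line, so that I can subtract the first time from the last.

So, for each blue box, get the information in the red circles. For each red circle that contains "Start account", take the blue circle minus the green circle.

I would like to try this with regex groups. If I can't figure that out, I want to put each blue-box match into an array and, for each item in the array, run further regexes to get what I need.

My code is incomplete. I'm not sure how to write the regex, so I will keep updating the script as I do my own research.

If anyone has pointers, I would appreciate it.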

Here is the source content as text:

   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - Bart Simpson

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              8:06:53      0:03   Start account 12345678  ROSS, BOB N
              8:07:24      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              8:07:26      0:02   Start account 54321234  DOE, JOHN
              8:07:27      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
              8:07:28      0:02   Start account 54321234  DOE, JOHN
              8:10:26      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen     9:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour
   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - Lisa Simpson

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              8:06:53      0:03   Start account 6543212  DOE, JANE
              8:07:24      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              8:07:26      0:02   Start account 88888888  DEER, JOHN
              8:07:27      1:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen    10:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour

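【Comments】:

  • I would split on the date.. [lots of spaces] ...company name line to get individual records, then run a regex against each resulting record. That can simplify your pattern a lot.
  • Those lines appear all over this file, at random, which makes that less effective. That's why I'm trying to handle the whole section at once.
  • This video may help you: Sophisticated Techniques of Plain Text Parsing. Be sure to watch it to the end.
  • @shadow2020 - if your text is irregular, you will need a true regex master [and perhaps some outright magic] to parse it. Good luck ... you may need it! [grin]
  • I know! Thank you, sir! Where are the regex ninjas when you need them?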
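The split-first idea from the first comment can be sketched roughly as follows. This is only an illustration on a shortened, hypothetical sample: the header pattern (a date followed by "PAGE n" on the same line) and the variable names $text, $sections and $result are assumptions, not part of the original post. If page headers repeat inside one person's report, as the asker says they do, sections with the same name would still need to be merged afterwards.

```powershell
# Hypothetical, shortened sample; the real input would come from Get-Content -Raw.
$text = @'
   05/07/20            Acme, Inc.            PAGE 1
    9:48 AM  ABC       ACCOUNT ACTIVITY - Bart Simpson
              8:06:53      0:03   Start account 12345678  ROSS, BOB N
   05/07/20            Acme, Inc.            PAGE 1
    9:48 AM  ABC       ACCOUNT ACTIVITY - Lisa Simpson
              8:06:53      0:03   Start account 6543212  DOE, JANE
'@

# Split in front of every page-header line. The zero-width lookahead keeps
# the header inside the section it introduces; empty pieces are dropped.
$headerPattern = '(?m)(?=^[ \t]*\d{2}/\d{2}/\d{2}[ \t].*PAGE[ \t]+\d+)'
$sections = @($text -split $headerPattern | Where-Object { $_.Trim() })

# Run a much smaller regex against each section.
$result = foreach ($section in $sections) {
    if ($section -match '(?m)ACCOUNT ACTIVITY\s-\s(?<name>.+?)\s*$') {
        [pscustomobject]@{
            Name       = $Matches['name']
            StartCount = [regex]::Matches($section, 'Start account\s+\d+').Count
        }
    }
}
$result
```

Each section is then an ordinary string, so further patterns only have to describe a single row instead of the whole report.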

Tags: regex powershell


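【Solution 1】:

You are going to have a hard time doing this with a single regex: it seems to keep repeating the second capture group. I worked on it for a while, adding named groups for the relevant matches, but I only got this regex to pick out the first match. Any "regex kings" out there, please look away.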

(?smi)(^\s+\d+:\d+\s+(AM|PM)\s+\w+\s+ACCOUNT ACTIVITY\s-\s)(?<name>\w+\s+\w+$)(.+?(?<begin>\d+:\d+:\d+)(\s+\d:\d+\s+)(?<acctnumber>Start Account\s\d+)(\s+)(?<account>\w+,\s\w+(\s[A-Za-z]|))\s+(?<end>.+?\d:\d+))
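One way around the "only the first match" problem is to stop describing the whole section and instead run a small per-row pattern with [regex]::Matches, which returns every "Start account" row. The sketch below is not the regex above; it is a simpler stand-in, and the names $section, $rowPattern, $rows and $summary are made up for illustration. It assumes all times in a section fall on the same half of the same day, since the report rows carry no AM/PM marker.

```powershell
# Rows copied from the first section of the question's sample.
$section = @'
              8:06:53      0:03   Start account 12345678  ROSS, BOB N
              8:07:24      0:31   Finished account in 31 seconds
              8:07:26      0:02   Start account 54321234  DOE, JOHN
              8:07:27      0:01   Finished account in 1 seconds
              8:07:28      0:02   Start account 54321234  DOE, JOHN
              8:10:26      0:01   Finished account in 1 seconds
'@

# One row: optional date, time, elapsed, "Start account", number, name.
$rowPattern = '(?m)^[ \t]*(?:\d{2}/\d{2}/\d{2}[ \t]+)?(?<time>\d{1,2}:\d{2}:\d{2})[ \t]+\d+:\d{2}[ \t]+Start account[ \t]+(?<acct>\d+)[ \t]+(?<name>[^\r\n]+?)[ \t]*\r?$'

$rows = foreach ($m in [regex]::Matches($section, $rowPattern)) {
    [pscustomobject]@{
        Account = $m.Groups['acct'].Value
        Name    = $m.Groups['name'].Value
        Time    = [timespan]::Parse($m.Groups['time'].Value, [cultureinfo]::InvariantCulture)
    }
}

# Per account number: last start time minus first start time.
$summary = $rows | Group-Object -Property Account | ForEach-Object {
    $ordered = $_.Group | Sort-Object -Property Time
    $first   = ($ordered | Select-Object -First 1).Time
    $last    = ($ordered | Select-Object -Last 1).Time
    [pscustomobject]@{
        Account = $_.Name
        Starts  = $_.Count
        Span    = $last - $first
    }
}
$summary
```

Because every row is matched independently, an account that appears any number of times needs no extra capture groups.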

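Instead, you can supply a template that marks every field you might be interested in and use ConvertFrom-String. The key is to label each item you want, uniquely, inside curly braces. You then have to mark the first item of each record in the template with an asterisk, so using your example above you would have something like this.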

$template = @"
   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - {customer*:Bart Simpson}

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              {begin1:8:06:53}      0:03   {accNum1:Start account 12345678}  {name1:ROSS, BOB N}
              {end1:8:07:24}      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              {begin2:8:07:26}      0:02   {accNum2:Start account 54321234}  {name2:DOE, JOHN}
              {end2:8:07:27}      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
              {begin3:8:07:28}      0:02   {accNum3:Start account 54321234}  {name3:DOE, JOHN}
              {end3:8:10:26}      0:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen     9:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour
   05/07/20                                                       Acme, Inc.                                                          PAGE 1
    9:48 AM  ABC                                          ACCOUNT ACTIVITY - {customer*:Lisa Simpson}

The time ELAPSED since the previous line is printed as HOURS:MINUTES:SECONDS.
      DATE     TIME     ELAPSED   ACTION


    04/16/20  8:06:50      0:00   Enter Account Screen
-------------------------------------------------------------------------------
              {begin1:8:06:53}      0:03   {accNum1:Start account 6543212}  {name1:DOE, JANE}
              {end1:8:07:24}      0:31   Finished account in 31 seconds
-------------------------------------------------------------------------------
              {begin2:8:07:26}      0:02   {accNum2:Start account 88888888}  {name2:DEER, JOHN}
              {end2:8:07:27}      1:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
              {begin3:\s}      0:02   {accNum3:\s}  {name3:\s}
              {end3:\s}      1:01   Finished account in 1 seconds
-------------------------------------------------------------------------------
    05/06/20  4:55:49      5:08   Leave Account Screen    10:33 Elapsed 
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
    05/06/20  4:55:55      0:06   Leave Account Screen
-------------------------------------------------------------------------------

                                      DAILY TOTALS
                        5:33:46 - Time on Account screen for the day.
                              3 Calls             1 Calls per hour
                              3 Contacts          1 Contacts per hour
                              3 Accounts worked   1 Accounts worked per hour
"@

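In your last example I added a third group containing the regex whitespace token (\s), so that the second group's data is not repeated in the third group.

You can then apply your template by piping your complete input to the cmdlet with the -TemplateContent parameter. The parsed data should come out the other side.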

$data = Get-Content -Path "\\server\minified.txt" -Raw   # the report file from the question
$data | ConvertFrom-String -TemplateContent $template

customer : Bart Simpson
begin1   : 8:06:53
accNum1  : Start account 12345678
name1    : ROSS, BOB N
end1     : 8:07:24
begin2   : 8:07:26
accNum2  : Start account 54321234
name2    : DOE, JOHN
end2     : 8:07:27
begin3   : 8:07:28
accNum3  : Start account 54321234
name3    : DOE, JOHN
end3     : 8:10:26

customer : Lisa Simpson
begin1   : 8:06:53
accNum1  : Start account 6543212
name1    : DOE, JANE
end1     : 8:07:24
begin2   : 8:07:26
accNum2  : Start account 88888888
name2    : DEER, JOHN
end2     : 8:07:27

You can then loop over the output objects and compare your data.
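A minimal sketch of such a loop is shown here. ConvertFrom-String ships with Windows PowerShell 5.x but not with PowerShell 6 and later, so the example uses a hand-built stand-in object with the same property names as the output above; $record and $durations are illustrative names. It also assumes the parsed values come back as plain strings, as they are displayed.

```powershell
# Stand-in for one object emitted by ConvertFrom-String (values copied from the output above).
$record = [pscustomobject]@{
    customer = 'Bart Simpson'
    begin1 = '8:06:53'; accNum1 = 'Start account 12345678'; name1 = 'ROSS, BOB N'; end1 = '8:07:24'
    begin2 = '8:07:26'; accNum2 = 'Start account 54321234'; name2 = 'DOE, JOHN';   end2 = '8:07:27'
    begin3 = '8:07:28'; accNum3 = 'Start account 54321234'; name3 = 'DOE, JOHN';   end3 = '8:10:26'
}

$culture = [cultureinfo]::InvariantCulture
$durations = foreach ($rec in @($record)) {
    # Walk begin1/end1, begin2/end2, ... until a numbered pair is missing or blank.
    for ($i = 1; -not [string]::IsNullOrWhiteSpace([string]$rec."begin$i"); $i++) {
        $begin = [timespan]::Parse([string]$rec."begin$i", $culture)
        $end   = [timespan]::Parse([string]$rec."end$i", $culture)
        [pscustomobject]@{
            Customer = $rec.customer
            Account  = $rec."accNum$i" -replace '\D'   # keep only the digits
            Name     = $rec."name$i"
            Duration = $end - $begin
        }
    }
}
$durations
```

In real use, the @($record) stand-in would be replaced by the objects coming out of the ConvertFrom-String pipeline.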

【Comments】:

  • Nice work! I like the ConvertFrom-String strategy. I'm working through it now. Thanks