【Question title】: Regex Pattern where group may not exist
【Posted】: 2015-07-07 09:30:06
【Question】:

I have a RegEx pattern that needs to match any of the following lines:

10-10-15 15:16:41.1 Some Text here 
10-10-15 15:16:41.12 Some Text here 
10-10-15 15:16:41.123 Some Text here 
10-10-15 15:16:41 Some Text here 

I can match the first 3 with the following pattern:

(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})\.(?<milli>\d{0,3})))\s(?<Line>.*)

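How can I match the remaining line (10-10-15 15:16:41 Some Text here), which has no milliseconds, and still have the milli group returned in my results, either as an empty value or with a value of 0?

Thanks

As I said, each of the lines below matches: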

10-10-15 15:16:41.123 Some text Here
10-10-15 15:16:41.12 Some Text here 
10-10-15 15:16:41.1 Some Text here 
10-10-15 15:16:41. Some Text here 

The groups look like this:

date    [0-18]  `10-10-15 15:16:41.`
day     [0-2]   `10`
month   [3-5]   `10`
year    [6-8]   `15`
time    [9-18]  `15:16:41.`
hour    [9-11]  `15`
minutes [12-14] `16`
seconds [15-17] `41`
milli   [18-18] ``
Line    [19-34] `Some Text here `
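
To reproduce the problem, here is a minimal sketch in Python. This is an assumption on my part: the question does not say which engine is in use, and Python's `re` module spells named groups `(?P<name>...)` instead of `(?<name>...)`, so the pattern is transcribed accordingly. The helper name `milli_or_none` is made up for illustration.

```python
import re

# The question's pattern, transcribed to Python's (?P<name>...) syntax.
ORIGINAL = re.compile(
    r"(?P<date>(?P<day>\d{1,2})-(?P<month>\d{1,2})-(?P<year>(?:\d{4}|\d{2}))\s"
    r"(?P<time>(?P<hour>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})\.(?P<milli>\d{0,3})))"
    r"\s(?P<Line>.*)"
)

SAMPLES = [
    "10-10-15 15:16:41.1 Some Text here ",
    "10-10-15 15:16:41.12 Some Text here ",
    "10-10-15 15:16:41.123 Some Text here ",
    "10-10-15 15:16:41 Some Text here ",
]


def milli_or_none(line):
    """Return the milli group, or None when the whole line fails to match."""
    m = ORIGINAL.match(line)
    return m.group("milli") if m else None


results = [milli_or_none(s) for s in SAMPLES]
print(results)  # ['1', '12', '123', None]
```

The last sample fails outright because the literal `\.` is mandatory, which is exactly the behaviour the question describes.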

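【Comments】:

  • Basically, what I need is to make the Milli group optional in the pattern but still have it present in the resulting groups, either with its value or with a default value?
  • Wrap the dot and the milli group in a non-capturing group and make that group optional.
  • Are you using a specific flavor of regex? The pattern provided doesn't match any of the provided text examples, and it contains some commonly invalid syntax, such as a question mark at the beginning of a capturing group (?...
  • @gfullam A valid question, but in the OP's defense, many regex engines support named groups (that's what (?<...>...) is).
  • Actually, I figured it out. I need the following pattern: (?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})(?<milli>\.?\d{0,3})))\s(?<logEntry>.*)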

Tags: regex


【Solution 1】:

Make the milliseconds optional?

/^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$/

Example:

<?php

$strings = <<< LOL
10-10-15 15:16:41.1 Some Text here 
10-10-15 15:16:41.12 Some Text here 
10-10-15 15:16:41.123 Some Text here 
10-10-15 15:16:41 Some Text here 
LOL;

preg_match_all('/^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$/m', $strings , $matches, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matches[0]); $i++) {

    $day = $matches[1][$i];
    $month = $matches[2][$i];
    $year = $matches[3][$i];
    $hours = $matches[4][$i];
    $minutes = $matches[5][$i];
    $seconds = $matches[6][$i];
    $ms = $matches[7][$i];
    $text = $matches[8][$i];


    echo "$day $month $year $hours $minutes $seconds $ms $text \n";
}

Regex demo:

https://regex101.com/r/aF9wN6/1


PHP demo:

http://ideone.com/1aEt2E


Regex explanation:

^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$

Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed) «^»
Match the regex below and capture its match into backreference number 1 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “-” literally «-»
Match the regex below and capture its match into backreference number 2 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “-” literally «-»
Match the regex below and capture its match into backreference number 3 «([\d]{2}|[\d]{4})»
   Match this alternative (attempting the next alternative only if this one fails) «[\d]{2}»
      Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
         Exactly 2 times «{2}»
   Or match this alternative (the entire group fails if this one fails to match) «[\d]{4}»
      Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{4}»
         Exactly 4 times «{4}»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, form feed) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 4 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “:” literally «:»
Match the regex below and capture its match into backreference number 5 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “:” literally «:»
Match the regex below and capture its match into backreference number 6 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “.” literally «\.?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regex below and capture its match into backreference number 7 «(\d+)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, form feed) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 8 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of a line (at the end of the string or before a line break character) (line feed) «$»

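【Comments】:

  • \.?(\d+)? should be (?:\.(\d*))?. And what is the point of [\d] instead of \d?
  • If the milliseconds are optional, then the group won't be returned when it doesn't match.
  • But if there are milliseconds, is the . then optional?
  • @Biffen Sorry, the dot . is outside the milliseconds group because the OP doesn't need it, and it is optional as well.
  • @Biffen That's due to copy+paste from the first part of the regex. Thanks for pointing it out.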
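
The difference the comments are getting at can be sketched in Python (an assumption, since the answer itself uses PHP, and the two engines report unmatched groups differently). `ANSWER` is this answer's pattern with `[\d]` simplified to `\d`; `SUGGESTED` applies the `(?:\.(\d*))?` change from the first comment. The helper `ms` and the `"<no match>"` marker are made up for illustration.

```python
import re

# This answer's pattern: the dot and the digits are optional independently.
ANSWER = re.compile(
    r"^(\d{2})-(\d{2})-(\d{2}|\d{4})\s+(\d{2}):(\d{2}):(\d{2})\.?(\d+)?\s+(.*?)$"
)
# The comment's suggestion: digits are only allowed after a dot.
SUGGESTED = re.compile(
    r"^(\d{2})-(\d{2})-(\d{2}|\d{4})\s+(\d{2}):(\d{2}):(\d{2})(?:\.(\d*))?\s+(.*?)$"
)


def ms(pattern, line):
    """Return group 7 (milliseconds), or a marker when the line does not match."""
    m = pattern.match(line)
    return m.group(7) if m else "<no match>"


with_ms = "10-10-15 15:16:41.123 Some Text here"
without_ms = "10-10-15 15:16:41 Some Text here"
missing_dot = "10-10-15 15:16:41123 Some Text here"

for line in (with_ms, without_ms, missing_dot):
    print(repr(ms(ANSWER, line)), repr(ms(SUGGESTED, line)))
```

In Python an optional group that did not take part in the match comes back as `None`, so both patterns still need a default applied afterwards. Only `SUGGESTED` rejects the line where the dot is missing.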
【Solution 2】:
^(\d+)-(\d+)-(\d+)\s(\d+):(\d+):(\d+)\.?(\d*)([a-zA-Z\s]+)

Note the (\d*): the group is returned even when it is empty.

Demo

【Comments】:

  • (\d+|.{0}) can be shortened to (\d+|) or even (\d*)
  • This will match 10-10-15 15:16:41123 Some text Here, which I'm not sure the OP wants.
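
A quick way to check both points, sketched in Python (an assumption; the answer names no language). The pattern is copied unchanged from this answer, and the helper `seconds_and_ms` is hypothetical.

```python
import re

# Pattern copied from this answer; groups 6 and 7 are seconds and milliseconds.
SOLUTION_2 = re.compile(r"^(\d+)-(\d+)-(\d+)\s(\d+):(\d+):(\d+)\.?(\d*)([a-zA-Z\s]+)")


def seconds_and_ms(line):
    """Return (seconds, milliseconds) or None when the line does not match."""
    m = SOLUTION_2.match(line)
    return (m.group(6), m.group(7)) if m else None


print(seconds_and_ms("10-10-15 15:16:41.12 Some Text here "))  # ('41', '12')
print(seconds_and_ms("10-10-15 15:16:41 Some Text here "))     # ('41', '')
print(seconds_and_ms("10-10-15 15:16:41123 Some text Here"))   # ('41123', '')
```

Because `(\d*)` can match the empty string, the group always takes part in the match and yields `''` rather than being absent. The third line shows the false positive from the comment: the greedy `(\d+)` for seconds swallows all five digits.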
【Solution 3】:

Solved. I need the following pattern:

(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})(?<milli>\.?\d{0,3})))\s(?<logEntry>.*)

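【Comments】:

  • Your regex still doesn't work as expected.. it will also match 10-10-15 15:16:41345 Some Text here (DEMO).. which it shouldn't.. check my answer :)
  • Cheers, just saw it. I've just added another question in the comments, if you could help answer it? Thanks
【Solution 4】:

You can use the following (a slightly modified version of your regex):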

(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})(?<milli>\.\d{0,3})?))\s(?<logEntry>.*)

DEMO

Explanation:

  • Make the whole <milli> part optional, rather than only the ., because an optional dot also matches strings like 10-10-15 15:16:41123 Some Text here.
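
The two variants can be compared side by side. The sketch below uses Python as an assumed host language, with the named groups rewritten as `(?P<name>...)`; `OPTIONAL_DOT` is the self-answer's pattern and `OPTIONAL_GROUP` is the one from this answer. `PREFIX` and `milli` are names invented here.

```python
import re

# Shared prefix of both patterns, up to and including the seconds group.
PREFIX = (
    r"(?P<date>(?P<day>\d{1,2})-(?P<month>\d{1,2})-(?P<year>(?:\d{4}|\d{2}))\s"
    r"(?P<time>(?P<hour>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
)
# Self-answer: only the dot is optional.
OPTIONAL_DOT = re.compile(PREFIX + r"(?P<milli>\.?\d{0,3})))\s(?P<logEntry>.*)")
# This answer: the dot and digits are optional together.
OPTIONAL_GROUP = re.compile(PREFIX + r"(?P<milli>\.\d{0,3})?))\s(?P<logEntry>.*)")


def milli(pattern, line):
    """Return the milli group, or a marker when the line does not match."""
    m = pattern.match(line)
    return m.group("milli") if m else "<no match>"


bad = "10-10-15 15:16:41345 Some Text here"
print(milli(OPTIONAL_DOT, bad))    # 345
print(milli(OPTIONAL_GROUP, bad))  # <no match>
print(milli(OPTIONAL_GROUP, "10-10-15 15:16:41.123 Some Text here"))  # .123
print(milli(OPTIONAL_GROUP, "10-10-15 15:16:41 Some Text here"))      # None
```

Note that with this answer's pattern the captured value includes the leading dot, and the group is `None` in Python when the milliseconds are absent.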

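【Comments】:

  • Thanks, this is much better than my approach. One more question: if the milliseconds are absent, the Milli group doesn't exist. Is there a way to give the group a default value when the value is absent?
  • Yes... maybe... it will depend on the language you are using :)
  • Changing (?<milli>\.\d{0,3})? to (?<milli>(?:\.\d{0,3})?) should do it.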
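
As the comments say, a default value depends on the host language. Here is a hedged sketch of two options in Python (my choice of language, not the OP's): supplying a default through `Match.groupdict(default=...)`, and the last comment's pattern, in which the group always participates and is simply empty. `PREFIX` and `milli_value` are hypothetical names.

```python
import re

PREFIX = (
    r"(?P<date>(?P<day>\d{1,2})-(?P<month>\d{1,2})-(?P<year>(?:\d{4}|\d{2}))\s"
    r"(?P<time>(?P<hour>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
)
# Answer's version: the milli group may not participate in the match at all.
OPTIONAL_GROUP = re.compile(PREFIX + r"(?P<milli>\.\d{0,3})?))\s(?P<logEntry>.*)")
# Last comment's version: the group always participates, possibly empty.
ALWAYS_PRESENT = re.compile(PREFIX + r"(?P<milli>(?:\.\d{0,3})?)))\s(?P<logEntry>.*)")


def milli_value(match, default="0"):
    """Strip the leading dot and fall back to a default when nothing is left."""
    raw = match.group("milli") or ""
    return raw.lstrip(".") or default


line = "10-10-15 15:16:41 Some Text here"

# Option 1: let the match object fill in a default for unmatched groups.
via_groupdict = OPTIONAL_GROUP.match(line).groupdict(default="0")["milli"]
# Option 2: restructure the pattern so the group is always present.
via_pattern = ALWAYS_PRESENT.match(line).group("milli")

print(repr(via_groupdict), repr(via_pattern))  # '0' ''
print(milli_value(OPTIONAL_GROUP.match("10-10-15 15:16:41.123 Some Text here")))  # 123
```

Other engines expose unmatched groups differently, so the post-processing step would need to be adapted to whatever language is actually in use.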