【Question Title】: PHP preg_match_all() - What's wrong with my regex?
【Posted】: 2011-03-24 17:42:26
【Question】:

Here is a sample string:

---------
SAY WHAAAAT
MEDICS:
CREW ID: PMD205304 CREW MEMBER ROLE: PRIMARY PATIENT CAREGIVER CREW MEMBER LEVEL: EMT-PARAMEDIC
CREW ID: EMT530755 CREW MEMBER ROLE: OTHER CREW MEMBER LEVEL: EMT-BASIC

Here is the function that runs preg_match_all() and converts the $matches array into a more useful array:

      private function getMedics(){
        if(isset($this->record->elements["E04"])){
              //REGEX:
              $ptn = "/(?:CREW ID: (.+?) )*(?:CREW MEMBER ROLE: (.+?)\s+)*(?:CREW MEMBER LEVEL: (.+))?\n/";
              $str = $this->record->incidentRow['Narrative']; //Column where medic info is stored in CodeZoneIncidents table
              preg_match_all($ptn,$str,$matches);
              foreach($matches as $key => $val){
                  foreach($matches[$key] as $key2 => $val2){
                      if(trim($val2) != ""){
                          $tmp[$key2]['ID'] = $matches[1][$key2];
                          $tmp[$key2]['role'] = $matches[2][$key2];
                          $tmp[$key2]['level'] = $matches[3][$key2];
                      }
                  }
              }
              $ii = 0;
              foreach($tmp as $key => $val){
                  $CZMedics[$ii]['ID'] = $tmp[$key]['ID'];
                  $CZMedics[$ii]['role'] = $tmp[$key]['role'];
                  $CZMedics[$ii]['level'] = $tmp[$key]['level'];
                  $ii++;
              } //REGEX pattern

              $iterations = $this->eleQTY($this->record->elements["E04"]); //Return how many E04 there are
              for($i=0; $i<$iterations; $i++){
                    //[E04][0] if there are multiples:
                    $tmpEle = (isset($this->record->elements["E04"][$i])?$this->record->elements["E04"][$i]:$this->record->elements["E04"]);
                    //Populate Actual values:
                    if(isset($tmpEle["E04_01"]->code)){
                          $tmpEle["E04_01"]->actual = fncIsSet($CZMedics[$i]['ID']); //Medic ID
                          $tmpEle["E04_01"]->CZCellName = "Narrative_Box"; //CZPopUp Box
                    }
                    if(isset($tmpEle["E04_02"]->code)){
                          $tmpEle["E04_02"]->actual = fncIsSet($CZMedics[$i]['role']); //Role
                          $tmpEle["E04_02"]->CZCellName = "Narrative_Box"; //CZPopUp Box
                    }
                    if(isset($tmpEle["E04_03"]->code)){
                          $tmpEle["E04_03"]->actual = fncIsSet($CZMedics[$i]['level']); //Level
                          $tmpEle["E04_03"]->CZCellName = "Narrative_Box"; //CZPopUp Box
                    }
              }
              echo "<pre style='display:none;'>!!!";
              print_r($CZMedics);
              echo "</pre>";
        }       
  }

Here is the resulting array:

Array
(
    [0] => Array
        (
            [ID] => PMD205304
            [role] => PRIMARY PATIENT CAREGIVER
            [level] => EMT-PARAMEDIC
        )

    [1] => Array
        (
            [ID] => 
            [role] => 
            [level] => 
        )

)

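So I want to return all of the medic information (ID, role and level), but I don't want the pattern to depend on any single piece of that information being present. It should return a medic if any one of those data points exists.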

【Comments】:

  • Just drop the line break, i.e. the trailing \n, from your regex. There isn't necessarily a newline after your last data line, which seems to be the case here.
  • Array ( [0] => Array ( [ID] => PMD205304 [role] => PRIMARY [level] => ) [1] => Array ( [ID] => [role] => [level] => EMT-PARAMEDIC ) [2] => Array ( [ID] => EMT530755 [role] => OTHER [level] => EMT-BASIC ) )
  • That's what I get; it now thinks there are 3 medics.
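
The three-medic result reported above can be reproduced with a minimal sketch. It assumes the sample text is newline-separated exactly as shown in the question; `$fragments` is a hypothetical name used only for illustration. Once the trailing \n is gone, nothing forces the match to reach the end of the line: the lazy `(.+?)\s+` in the role group stops at the first whitespace after PRIMARY, and the rest of the pattern is optional, so the first line is split into two separate matches.

```php
<?php
// Sample text from the question; the last line has no trailing newline.
$text = "---------\nSAY WHAAAAT\nMEDICS:\n"
      . "CREW ID: PMD205304 CREW MEMBER ROLE: PRIMARY PATIENT CAREGIVER CREW MEMBER LEVEL: EMT-PARAMEDIC\n"
      . "CREW ID: EMT530755 CREW MEMBER ROLE: OTHER CREW MEMBER LEVEL: EMT-BASIC";

// The question's pattern with only the trailing \n removed.
$ptn = '/(?:CREW ID: (.+?) )*(?:CREW MEMBER ROLE: (.+?)\s+)*(?:CREW MEMBER LEVEL: (.+))?/';
preg_match_all($ptn, $text, $matches);

// Every part of the pattern is optional, so it also matches the empty
// string at many offsets. Keep only the non-empty matches.
$fragments = [];
foreach ($matches[0] as $i => $whole) {
    if (trim($whole) !== '') {
        $fragments[] = [
            'ID'    => $matches[1][$i],
            'role'  => $matches[2][$i],
            'level' => $matches[3][$i],
        ];
    }
}

print_r($fragments); // three entries; the first has role "PRIMARY" and an empty level
```

Anchoring the pattern to the whole line (for example with ^ and $ in multiline mode) is one way to stop the match from ending early.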

Tags: php regex arrays preg-match-all


【Solution 1】:

You can almost eliminate the manual loop assignment by using named capture groups:

preg_match_all('~^(?=CREW)(CREW ID: (?P<id>\w+))?\s*(CREW MEMBER ROLE: (?<role>.*?))?\s*(CREW MEMBER LEVEL: (?<level>.*?))?$~mi', $text, $match, PREG_SET_ORDER);

This produces a lot of redundant entries, but [id], [role] and [level] are already separated (you can of course add ?: back in to reduce the clutter):

[0] => Array
    (
        [0] => CREW ID: PMD205304 CREW MEMBER ROLE: PRIMARY PATIENT CAREGIVER CREW MEMBER LEVEL: EMT-PARAMEDIC
        [1] => CREW ID: PMD205304
        [id] => PMD205304
        [2] => PMD205304
        [3] =>  CREW MEMBER ROLE: PRIMARY PATIENT CAREGIVER
        [role] => PRIMARY PATIENT CAREGIVER
        [4] => PRIMARY PATIENT CAREGIVER
        [5] =>  CREW MEMBER LEVEL: EMT-PARAMEDIC
        [level] => EMT-PARAMEDIC
        [6] => EMT-PARAMEDIC
    )

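【Comments】:

  • Will this work if all of the values are optional? ID, role and level: only one (any one) is required, and there is no way to know in advance which fields are provided.
  • Yes. But every group is followed by a ?, so it can return empty result blocks. You have to check whether [id], [role] or [level] is present.
  • It might be the whitespace. Move the \s+ between the groups and change it to \s*.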
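
The check for present fields mentioned in the comments could look roughly like the sketch below. It is an adaptation, not the answerer's exact code: the wrapper groups are made non-capturing with ?:, and `$medics` is a hypothetical name. It also assumes the narrative text is newline-separated as in the question.

```php
<?php
// Sample narrative text, assumed to be newline-separated as in the question.
$text = "SAY WHAAAAT\nMEDICS:\n"
      . "CREW ID: PMD205304 CREW MEMBER ROLE: PRIMARY PATIENT CAREGIVER CREW MEMBER LEVEL: EMT-PARAMEDIC\n"
      . "CREW ID: EMT530755 CREW MEMBER ROLE: OTHER CREW MEMBER LEVEL: EMT-BASIC";

// Pattern from the answer, with ?: added so only the named groups capture.
$ptn = '~^(?=CREW)(?:CREW ID: (?P<id>\w+))?\s*'
     . '(?:CREW MEMBER ROLE: (?P<role>.*?))?\s*'
     . '(?:CREW MEMBER LEVEL: (?P<level>.*?))?$~mi';

preg_match_all($ptn, $text, $matches, PREG_SET_ORDER);

$medics = [];
foreach ($matches as $m) {
    $medic = [];
    foreach (['id', 'role', 'level'] as $field) {
        // A group that did not take part in the match is either an
        // empty string or missing from the array, so test for both.
        if (isset($m[$field]) && trim($m[$field]) !== '') {
            $medic[$field] = trim($m[$field]);
        }
    }
    if ($medic) {          // skip lines where no field was found at all
        $medics[] = $medic;
    }
}

print_r($medics); // two medics, each with id, role and level
```

Because each field is copied only when it is non-empty, a line that carries just one of the three labels still yields a medic entry containing that single key.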

【Solution 2】:

This should do the trick:

^CREW ID: (.*) CREW MEMBER ROLE: (.*) CREW MEMBER LEVEL: (.*)$

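At least it works for the example you provided. But the asterisks "*" between the ".+?" fields probably mean that you either want the fields to be optional or want to allow a field to occur more than once. So maybe you need to provide some more examples...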

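By the way: if you want to make sure a line matches in full, use ^ and $, and turn on the option that lets ^ and $ match at line breaks (the m modifier in PHP). I would rather not use "\n".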
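
As a rough illustration of the ^ and $ advice, the sketch below compares a pattern that ends in "\n" with one anchored by ^ and $ under the m modifier. Both patterns are simplified and assume all three fields are present on every line; the variable names are hypothetical.

```php
<?php
// Two data lines; the last one has no trailing newline.
$text = "CREW ID: PMD205304 CREW MEMBER ROLE: PRIMARY PATIENT CAREGIVER CREW MEMBER LEVEL: EMT-PARAMEDIC\n"
      . "CREW ID: EMT530755 CREW MEMBER ROLE: OTHER CREW MEMBER LEVEL: EMT-BASIC";

// Requires a literal newline after each record.
$withNewline = '/CREW ID: (.+?) CREW MEMBER ROLE: (.+?) CREW MEMBER LEVEL: (.+)\n/';

// Anchored per line: with the m modifier, ^ and $ match at line boundaries
// as well as at the start and end of the whole string.
$withAnchors = '/^CREW ID: (.+?) CREW MEMBER ROLE: (.+?) CREW MEMBER LEVEL: (.+)$/m';

$countNewline = preg_match_all($withNewline, $text, $m1);
$countAnchors = preg_match_all($withAnchors, $text, $m2);

echo $countNewline, "\n"; // 1: the final line is missed
echo $countAnchors, "\n"; // 2: both lines match
```

This is consistent with the empty second entry in the question's output: the last line has no newline after it, so a pattern that ends in "\n" cannot match the complete record.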

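【Comments】:

  • You need to use at least .*? here; otherwise it will be greedy and match the label of the next field.
  • Yes, all three data points are optional - only one is required.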