How to capture an optional word in a PHP regex

【Question title】: How to capture optional word in php regex
【Posted】: 2018-04-06 19:39:16
【Question】:

The content is structured as follows:

$contents = '1234    FIRSTNAME   LASTNAME    M     4321
1345    LASTNAME    F     4621
8223    FIRSTNAME   LASTNAME    M     4256';

I only want to extract the first name and/or last name into an array, like this:

Array ( [0] => FIRSTNAME LASTNAME,
[1] => LASTNAME )

My code:

<?php

$contents = '1234    FIRSTNAME   LASTNAME    M     4321
1345    LASTNAME    F     4621
8223    FIRSTNAME   LASTNAME    M     4256';

$res = preg_replace('/([A-Z]{2,24})\s+([A-Z]{2,24})/', '$1 $2', $contents);


preg_match_all('/([A-Z]{2,24}?\s[A-Z]{2,24})/', $res, $result);

print_r($result[1]);

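【Comments】:

This looks like a tab-delimited file. Is it computer-generated? Something like 3v4l.org/CrjP9 would work, but if the file is delimited I would recommend php.net/manual/en/function.fgetcsv.php
Also, 1234 FIRSTNAME LASTNAME M 4321 1345 LASTNAME F 4621' 8223 FIRSTNAME LASTNAME M 4256; is invalid. Is this an example or your actual code?
Apart from what the comments above say, you don't need preg_replace. A single preg_match_all is enough: preg_match_all('/[A-Z]{2,24}(?:\s+[A-Z]{2,24})?/', $contents, $result);
I added a quote by accident; it is removed now.
Thanks @revo, it works perfectly. I was just looking for an optional match. Nice work.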

Tags: php regex preg-replace preg-match preg-match-all

【Solution 1】:

You can do this with the preg_match_all function alone, using the following regex:

'~\b[A-Z]{2,}(?:\h+[A-Z]{2,})?\b~'

See the regex demo.

Details

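\b - a word boundary
[A-Z]{2,} - 2 or more uppercase ASCII letters (replace with \p{Lu} and add the u modifier to match all Unicode uppercase letters)
(?:\h+[A-Z]{2,})? - an optional sequence of:
- \h+ - 1 or more horizontal whitespace characters (the first and last names appear to always be on the same line)
- [A-Z]{2,} - 2 or more uppercase ASCII letters
\b - a word boundary.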
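
The \p{Lu} variant mentioned in the details can be sketched as follows. This is a minimal illustration, not part of the original answer: the sample names are hypothetical data, chosen so that each one starts and ends with an ASCII letter, and the script is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample rows with non-ASCII uppercase letters (not from the question).
$contents = "1234    JOSÉPHINE   MÜLLER    F     4321\n1345    GARCÍA    M     4621";

// \p{Lu} matches any Unicode uppercase letter.
// The u modifier makes PCRE treat both pattern and subject as UTF-8.
preg_match_all('/\b\p{Lu}{2,}(?:\h+\p{Lu}{2,})?\b/u', $contents, $result);

// $result[0] holds the full matches, one per row.
print_r($result[0]);
```

With this input, $result[0] should contain "JOSÉPHINE   MÜLLER" and "GARCÍA"; the single letters F and M are skipped because of the {2,} quantifier.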

See the PHP demo:

$contents = '1234    FIRSTNAME   LASTNAME    M     4321
1345    LASTNAME    F     4621
8223    FIRSTNAME   LASTNAME    M     4256';
if (preg_match_all('/\b[A-Z]{2,}(?:\h+[A-Z]{2,})?\b/', $contents, $result)) {
    print_r($result[0]);
}

Output:

Array
(
    [0] => FIRSTNAME   LASTNAME
    [1] => LASTNAME
    [2] => FIRSTNAME   LASTNAME
)
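
The matches keep the original run of spaces between the two names, while the question asks for a single space ("FIRSTNAME LASTNAME"). One possible way to close that gap, sketched here as an assumption rather than as part of the original answer, is to collapse the whitespace in each match afterwards. The variable name $names is made up for this illustration.

```php
<?php
$contents = '1234    FIRSTNAME   LASTNAME    M     4321
1345    LASTNAME    F     4621
8223    FIRSTNAME   LASTNAME    M     4256';

$names = []; // hypothetical name for the cleaned-up result
if (preg_match_all('/\b[A-Z]{2,}(?:\h+[A-Z]{2,})?\b/', $contents, $result)) {
    // preg_replace accepts an array subject and returns an array,
    // so every match gets its runs of horizontal whitespace reduced to one space.
    $names = preg_replace('/\h+/', ' ', $result[0]);
}

print_r($names);
```

This should yield "FIRSTNAME LASTNAME", "LASTNAME", "FIRSTNAME LASTNAME", which matches the shape requested in the question.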

【Comments】: