Extract keywords/tags from a string using preg_match_all: answers

【Question title】: Extract keywords/tags from string using Preg_match_all
【Posted】: 2009-06-24 11:23:06
【Question】:

I have the following code:

$str = "keyword keyword 'keyword 1 and keyword 2' another 'one more'".'"another keyword" yes,one,two';

preg_match_all('/"[^"]+"|[^"\' ,]+|\'[^\']+\'/', $str, $matches);

echo "<pre>"; print_r($matches); echo "</pre>";

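I want it to extract keywords from the string, keeping anything wrapped in single or double quotes together as one keyword. The code above works, but the values it returns still include the quotes. I know I can strip them with str_replace or something similar, but I am really looking for a way to solve this within the preg_match_all call itself.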

Output:

Array
(
    [0] => Array
        (
            [0] => keyword
            [1] => keyword
            [2] => 'keyword 1 and keyword 2'
            [3] => another
            [4] => 'one more'
            [5] => "another keyword"
            [6] => yes
            [7] => one
            [8] => two
        )

)

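Also, I think my regex is a bit clumsy, so any suggestions for a better one would be welcome :)

Any suggestions or help would be greatly appreciated.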

【Comments on the question】:

What about input such as a,"b",c,d,"e" or "b'" '"c'?

Tags: php regex preg-match-all

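【Solution 1】:

You almost had it; you just need to use lookarounds to match the quotes, so that they are checked but not included in the match: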

'/(?<=\')[^\'\s][^\']*+(?=\')|(?<=")[^"\s][^"]*+(?=")|[^\'",\s]+/'
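To see what the lookaround pattern does with the string from the question, here is a minimal sketch. It assumes the same $str as in the question; the variable name $pattern is introduced here only for illustration.

```php
<?php
// Input string from the question.
$str = "keyword keyword 'keyword 1 and keyword 2' another 'one more'" . '"another keyword" yes,one,two';

// (?<=X) and (?=X) assert that X precedes/follows the match
// without making X part of the matched text.
$pattern = '/(?<=\')[^\'\s][^\']*+(?=\')|(?<=")[^"\s][^"]*+(?=")|[^\'",\s]+/';

preg_match_all($pattern, $str, $matches);

// All results are in $matches[0]; there are no capture groups to merge.
print_r($matches[0]);
```

For this input, $matches[0] holds the nine keywords with no surrounding quotes. The [^\'\s] and [^"\s] at the start of the quoted alternatives keep the text between a closing quote and the next opening quote from being picked up as a quoted keyword.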

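【Discussion】:

Brilliant!!!! That is exactly what I needed! Thanks a lot, Alan M. I have been trying to understand the regex you used, and it is starting to make sense. To be honest, I had never come across ?<= before. Thanks again, really appreciated.
You might want to read this: regular-expressions.info/lookaround.html. The whole site is great.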

【Solution 2】:

preg_match_all('/"([^"]+)"|[^"\' ,]+|\'([^\']+)\'/',$str,$matches);

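and use $matches[1] and $matches[2].

【Discussion】:

It needs to be: preg_match_all('/"([^"]+)"|([^"\' ,]+)|\'([^\']+)\'/',$str,$matches); and then use $matches[1], $matches[2] and $matches[3]. That requires more processing after the preg_match_all call, so running array_map with a str_replace function would be easier than merging the separate result arrays into one.
How would you suggest merging the different result arrays?
There is no native function that collates and merges the arrays this way, so I would write one. I don't know exactly what your output requirements are, so it is hard to say what would be most appropriate.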
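One way to avoid merging separate capture arrays, which was not proposed in this thread, is a PCRE branch-reset group, (?|...). The sketch below assumes a PCRE build that supports it (the versions bundled with current PHP releases do) and reuses $str from the question; $pattern and $keywords are names chosen for illustration.

```php
<?php
// Input string from the question.
$str = "keyword keyword 'keyword 1 and keyword 2' another 'one more'" . '"another keyword" yes,one,two';

// Inside (?| ... ) every alternative restarts capture numbering,
// so the double-quoted, single-quoted and bare forms all fill group 1.
$pattern = '/(?|"([^"]+)"|\'([^\']+)\'|([^"\' ,]+))/';

preg_match_all($pattern, $str, $matches);

// $matches[0] still contains the quotes; $matches[1] does not.
$keywords = $matches[1];
print_r($keywords);
```

Because group 1 participates in every match, $matches[1] is already a single flat list and no array merging is needed.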

【Solution 3】:

This needs a simple helper function to get what you want, but it does work:

preg_match_all('/"([^"]+)"|([^"\' ,]+)|\'([^\']+)\'/',$str,$matches);
// Remove every single and double quote from a matched token.
function r($str) {
    return str_replace(array('\'','"'), '', $str);
}
// Apply the helper to each full match.
$a = array_map('r', $matches[0]);
print_r($a);
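Note that str_replace removes quotes anywhere in a token, including an apostrophe inside a double-quoted phrase. A possible variant, offered only as a sketch, uses trim() with a character list so that just the leading and trailing quotes are removed. The sample string and the name $tokens are made up for illustration.

```php
<?php
// Hypothetical input containing an apostrophe inside a quoted phrase.
$str = '"it\'s here" plain \'two words\'';

preg_match_all('/"[^"]+"|\'[^\']+\'|[^"\' ,]+/', $str, $matches);

// trim() with a character list strips those characters from both ends only,
// so the apostrophe in "it's" is preserved.
$tokens = array_map(function ($t) {
    return trim($t, "'\"");
}, $matches[0]);

print_r($tokens);
```

This variant has its own limitation: a phrase whose content itself ends in a quote character, such as "b'", would lose that character as well.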

【Discussion】:

Thanks, I had already looked into this, but it creates unnecessary extra work. Thanks for your input though, Galen.

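【Solution 4】:

Take a look at this tokenizeQuoted function in the comments on the strtok function.

Edit: You need to modify the function, because the original only works with double quotes: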

function tokenizeQuoted($string)
{
    for ($tokens=array(), $nextToken=strtok($string, ' '); $nextToken!==false; $nextToken=strtok(' ')) {
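        // String offsets use []; the {} form was removed in PHP 8.0.
        $firstChar = $nextToken[0];
        if ($firstChar === '"' || $firstChar === "'") {
            // Either the token is fully quoted, or it starts a quoted phrase
            // that continues up to the matching closing quote.
            $nextToken = $nextToken[strlen($nextToken)-1] === $firstChar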
                ? substr($nextToken, 1, -1)
                : substr($nextToken, 1) . ' ' . strtok($firstChar);
        }
        $tokens[] = $nextToken;
    }
    return $tokens;
}

Edit: Perhaps you should write your own parser:

$tokens = array();
$buffer = '';
$quote = null;
$len = strlen($str);
for ($i=0; $i<$len; $i++) {
    // String offsets use []; the {} form was removed in PHP 8.0.
    $char = $str[$i];
    if ($char === '"' || $char === "'") {
        if ($quote === null) {
            if ($buffer !== '') {
                $tokens[] = $buffer;
                $buffer = '';
            }
            $quote = $char;
            continue;
        }
        if ($quote == $char) {
            $tokens[] = $buffer;
            $buffer = '';
            $quote = null;
            continue;
        }
    } else if ($char === ',' || $char === ' ') {
        if ($quote === null) {
            if ($buffer !== '') {
                $tokens[] = $buffer;
                $buffer = '';
            }
            continue;
        }
    }
    $buffer .= $char;
}
if ($buffer !== '') {
    $tokens[] = $buffer;
}
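To try the hand-written parser on the edge cases raised in the comments on the question, it can be wrapped in a function. The sketch below follows the same logic as the loop above; the function name parseKeywords is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical wrapper; the state machine mirrors the parser in the answer.
function parseKeywords(string $str): array
{
    $tokens = [];
    $buffer = '';
    $quote = null; // the quote character currently open, or null
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $char = $str[$i];
        if ($char === '"' || $char === "'") {
            if ($quote === null) {
                // Opening quote: flush any pending unquoted token.
                if ($buffer !== '') { $tokens[] = $buffer; $buffer = ''; }
                $quote = $char;
                continue;
            }
            if ($quote === $char) {
                // Matching closing quote: emit the quoted token.
                $tokens[] = $buffer;
                $buffer = '';
                $quote = null;
                continue;
            }
            // A different quote character inside a quoted phrase is kept as text.
        } elseif (($char === ',' || $char === ' ') && $quote === null) {
            // Separator outside quotes: end the current token, if any.
            if ($buffer !== '') { $tokens[] = $buffer; $buffer = ''; }
            continue;
        }
        $buffer .= $char;
    }
    if ($buffer !== '') { $tokens[] = $buffer; }
    return $tokens;
}

print_r(parseKeywords('a,"b",c,d,"e"'));
print_r(parseKeywords('"b\'" \'"c\''));
```

The first call splits on commas and drops the double quotes; the second keeps the inner quote characters, giving b' and "c, because only the quote that opened a phrase can close it.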

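【Discussion】:

Not quite what I am after, since I want to accomplish this with preg_match_all, but thanks. (The function also does not work with single quotes.)
However, it does not account for commas the way my regex does, only spaces. I am convinced the best way to do this is with preg_match_all, but if that cannot be done, I will go with an alternative.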