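In this case I think str_replace is the simpler option (though it isn't perfect).
Assuming you have an array of terms you want to highlight (I'll call it $aSearchTerms for the sake of argument), and that it's acceptable to wrap the highlighted terms in HTML5 <mark> tags (you've said this is for a web page, and it's easy enough to run strip_tags() on your search terms first):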
$aSearchTerms = ['Jan', 'anu', 'Feb', '11'];
$sinContent = "My daughter was born on January 11, 2011.";
foreach ($aSearchTerms as $sinTerm) {
    $sinContent = str_replace($sinTerm, "<mark>{$sinTerm}</mark>", $sinContent);
}
echo $sinContent;
// outputs: My daughter was born on <mark>Jan</mark>uary <mark>11</mark>, 20<mark>11</mark>.
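This isn't perfect: with the data in that array, the first pass turns January into <mark>Jan</mark>uary, so anu no longer matches January. Still, something like this will cover most use cases.
Edit
OK, I'm not 100% sure this is sound, but after looking at the link @AlexAtNet posted I took a completely different approach: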
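To see exactly why anu drops out, it helps to print the string after each str_replace pass. A small sketch reusing the same terms and sentence as above; the only addition is the echo inside the loop:

```php
<?php
// Same data as the str_replace example above
$aSearchTerms = ['Jan', 'anu', 'Feb', '11'];
$sinContent = "My daughter was born on January 11, 2011.";

foreach ($aSearchTerms as $sinTerm) {
    $sinContent = str_replace($sinTerm, "<mark>{$sinTerm}</mark>", $sinContent);
    // Show the intermediate string after each term has been processed
    echo str_pad($sinTerm, 4) . "=> " . $sinContent . PHP_EOL;
}
```

After the Jan pass the text reads <mark>Jan</mark>uary, which no longer contains "anu", so the anu pass leaves the string unchanged. Each pass only sees the output of the previous one.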
https://stackoverflow.com/a/3631016/886824
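What I did was find the numeric positions (indexes) where each search term occurs in the string, and build an array of start and end indexes marking where the <mark> and </mark> tags will be inserted.
I then used the linked answer to merge overlapping start/end pairs, which covers your overlapping-matches problem.
Finally I loop over that array, cut the original string into substrings and glue them back together, inserting the <mark> and </mark> tags at the relevant points (based on the indexes). This should cover your second problem: you never run a replacement over text that has already been replaced.
The full code is below: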
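A minimal sketch of that first step as a standalone function (findTermRanges is a hypothetical name, not part of the code in this answer). It uses stripos() instead of strtolower() plus strpos(), and restarts each search one character after the previous match, so a term that overlaps itself (e.g. '11' inside '111') is recorded at every position; substr_count() only counts non-overlapping occurrences. Like the full code, it works on byte offsets, so it assumes single-byte text.

```php
<?php
/**
 * Hypothetical helper: returns [start, end) byte offsets of every
 * case-insensitive occurrence of each term in $sContent.
 */
function findTermRanges(string $sContent, array $aTerms): array
{
    $aRanges = [];
    foreach ($aTerms as $sTerm) {
        $iLength = strlen($sTerm);
        if ($iLength === 0) {
            continue; // an empty term would match at every position
        }
        $iOffset = 0;
        while (($iStart = stripos($sContent, $sTerm, $iOffset)) !== false) {
            $aRanges[] = [$iStart, $iStart + $iLength]; // end index is exclusive
            $iOffset = $iStart + 1; // step one char so overlapping hits are kept
        }
    }
    return $aRanges;
}

print_r(findTermRanges("Captain's log, January 11, 2711 - Uranus", ['Jan', 'asduih', 'anu', '11']));
```

For the Captain's log string this yields the pairs [15, 18], [16, 19], [36, 39], [23, 25] and [29, 31]: grouped by term, not yet sorted or merged.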
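The merge only needs the pairs sorted by start index plus a running end index. A minimal sketch, assuming half-open [start, end) pairs as described above (mergeRanges is a hypothetical name): a pair that overlaps or touches the previous range extends it, and a pair with a gap starts a new range.

```php
<?php
/**
 * Hypothetical helper: merges overlapping or touching [start, end) ranges.
 */
function mergeRanges(array $aRanges): array
{
    // Sort by start index so overlapping ranges end up next to each other
    usort($aRanges, fn(array $a, array $b): int => $a[0] <=> $b[0]);

    $aMerged = [];
    foreach ($aRanges as [$iStart, $iEnd]) {
        $iLast = count($aMerged) - 1;
        if ($iLast >= 0 && $iStart <= $aMerged[$iLast][1]) {
            // Overlaps or touches the previous range: extend its end if needed
            $aMerged[$iLast][1] = max($aMerged[$iLast][1], $iEnd);
        } else {
            $aMerged[] = [$iStart, $iEnd];
        }
    }
    return $aMerged;
}

print_r(mergeRanges([[15, 18], [16, 19], [36, 39], [23, 25], [29, 31]]));
```

With the Captain's log ranges this gives [15, 19], [23, 25], [29, 31], [36, 39], so 'Jan' and 'anu' collapse into a single 'Janu' range. Because the end index is exclusive, the test is `<=`: two matches separated by a space stay separate instead of swallowing the space.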
<?php
$sContent = "Captain's log, January 11, 2711 - Uranus";
$ainSearchTerms = array('Jan', 'asduih', 'anu', '11');
//lower-case it for substr_count
$sContentForSearching = strtolower($sContent);
//array of first and last positions of the terms within the string
$aTermPositions = array();
//loop through your search terms and build a multi-dimensional array
//of start and end indexes for each term
foreach($ainSearchTerms as $sinTerm) {
    //lower-case the search term
    $sinTermLower = strtolower($sinTerm);
    $iTermLength = strlen($sinTermLower);
    //skip empty terms: substr_count() rejects an empty needle
    if ($iTermLength === 0) {
        continue;
    }
    $iTermPosition = 0;
    $iTermOccursCount = substr_count($sContentForSearching, $sinTermLower);
    for($i=0; $i<$iTermOccursCount; $i++) {
        //find the start and end positions for this term
        $iStartIndex = strpos($sContentForSearching, $sinTermLower, $iTermPosition);
        $iEndIndex = $iStartIndex + $iTermLength;
        $aTermPositions[] = array($iStartIndex, $iEndIndex);
        //carry on searching from the end of this match; substr_count()
        //counts non-overlapping matches, so this stays in step with it
        $iTermPosition = $iEndIndex;
    }
}
//adapted from this answer https://stackoverflow.com/a/3631016/886824
//$data is replaced with $aTermPositions, and the "+ 1" in the overlap test is
//dropped: the end indexes here are exclusive, so with "+ 1" two matches one
//character apart would be merged and the character between them highlighted
//this sorts out the overlaps so that 'Jan' and 'anu' will merge into 'Janu'
//in January, whilst still matching 'anu' in Uranus
//
//usort() puts the [start, end] pairs in ascending order of start index
usort($aTermPositions, function($a, $b)
{
    return $a[0] - $b[0];
});
$n = 0; $len = count($aTermPositions);
for ($i = 1; $i < $len; ++$i)
{
    if ($aTermPositions[$i][0] > $aTermPositions[$n][1])
        $n = $i;
    else
    {
        if ($aTermPositions[$n][1] < $aTermPositions[$i][1])
            $aTermPositions[$n][1] = $aTermPositions[$i][1];
        unset($aTermPositions[$i]);
    }
}
$aTermPositions = array_values($aTermPositions);
//finally chop your original string into the bits
//where you want to insert <mark> and </mark>
//(no if() around this, so $soutContent is always defined, even with no matches)
$iLastContentChunkIndex = 0;
$soutContent = "";
foreach($aTermPositions as $aChunkIndex) {
    $soutContent .= substr($sContent, $iLastContentChunkIndex, $aChunkIndex[0] - $iLastContentChunkIndex)
        . "<mark>" . substr($sContent, $aChunkIndex[0], $aChunkIndex[1] - $aChunkIndex[0]) . "</mark>";
    $iLastContentChunkIndex = $aChunkIndex[1];
}
//... and the bit on the end (the whole string if nothing matched)
$soutContent .= substr($sContent, $iLastContentChunkIndex);
//this *should* output the following:
//Captain's log, <mark>Janu</mark>ary <mark>11</mark>, 27<mark>11</mark> - Ur<mark>anu</mark>s
echo $soutContent;
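The inevitable catch!
Using this on content that is already HTML is likely to break.
Take the string:
In <a href="#">January</a> this year...
Searching for/marking Jan will insert <mark>/</mark> around "Jan", which is fine. However, searching for something like In Jan will fail, because there's markup in the way :\
I'm afraid I can't think of a good way around that.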
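A quick way to see the failure: the phrase exists in the text the reader sees, but not in the raw markup, so any position-based search over the HTML string misses it. A small check, using strip_tags() only to show the visible text:

```php
<?php
$sHtml = 'In <a href="#">January</a> this year...';

// 'Jan' on its own is found: it sits inside a single run of text
var_dump(stripos($sHtml, 'Jan') !== false);

// 'In Jan' is not found: the <a> tag sits between "In " and "January"
var_dump(stripos($sHtml, 'In Jan') !== false);

// The visible text does contain it, but its offsets no longer line up with $sHtml
var_dump(stripos(strip_tags($sHtml), 'In Jan') !== false);
```

The last check shows why stripping the tags doesn't solve it either: the indexes found in the stripped text can't be used to insert <mark> tags into the original HTML.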