【Question title】: Improve my regex+php replacement
【Posted】: 2015-05-01 15:52:42
【Question】:

I am trying to replace part of a string using a regular expression. My code does the job, but is it the right way to do it?

$string = 'blabla <!-- s:D --><img src="{SMILIES_PATH}/icon_biggrin.gif" alt=":D" title="Very Happy" /><!-- s:D --> blabla <!-- scat --><img src="{SMILIES_PATH}/cat2.gif" alt="cat" title="Cat" /><!-- scat --> blabla';
$pattern = '(<!-- s(\S*) --><img src="\S*" alt="\S*" title="[^"]+" \/><!-- s\S* -->)';

preg_match_all($pattern, $string, $result);

$i = 0;
foreach ($result[0] as $match) {
    $string = str_replace($match, $result[1][$i], $string);
    $i++;
}

What I want: blabla :D blabla cat blabla

Regex test: https://regex101.com/r/fD0xI2/2

PHP test: http://ideone.com/mrS0BJ

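【Comments】:

  • You should never parse HTML with a regular expression. Use a PHP DOM parser instead.
  • I can't use an external library (the code has to stay in one file). How would I do this with your parser?
  • Please post a sample of the desired output.
  • Edited. You can see it at the PHP test link.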
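
As a partial answer to the question raised in the comments above, here is a minimal sketch of the DOM approach. It assumes PHP's bundled DOM extension (DOMDocument, DOMXPath) is enabled, which would keep the code in one file without an external library. The function name smiliesToTextWithDom and the wrapping div are illustrative choices, not part of the original post, and the sketch assumes ASCII input like the sample string.

```php
<?php
// Hypothetical helper: replace each <img> by its alt text and drop HTML comments.
function smiliesToTextWithDom(string $html): string
{
    $doc = new DOMDocument();

    // Silence parser warnings about the fragment, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>'); // wrapper div is an assumption of this sketch
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Swap every image for a text node holding its alt attribute.
    foreach (iterator_to_array($xpath->query('//img')) as $img) {
        $img->parentNode->replaceChild($doc->createTextNode($img->getAttribute('alt')), $img);
    }

    // Remove the <!-- s... --> marker comments.
    foreach (iterator_to_array($xpath->query('//comment()')) as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    return $doc->getElementsByTagName('body')->item(0)->textContent;
}

$string = 'blabla <!-- s:D --><img src="{SMILIES_PATH}/icon_biggrin.gif" alt=":D" title="Very Happy" /><!-- s:D --> blabla <!-- scat --><img src="{SMILIES_PATH}/cat2.gif" alt="cat" title="Cat" /><!-- scat --> blabla';
echo smiliesToTextWithDom($string), "\n";
```

This version reads the smiley code from the alt attribute rather than from the comment, so it would behave differently if the two ever disagreed.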

Tags: php regex html-parsing


【Solution 1】:

I guess you can reduce the size of the regex, i.e.:

$string = 'blabla <!-- s:D --><img src="{SMILIES_PATH}/icon_biggrin.gif" alt=":D" title="Very Happy" /><!-- s:D --> blabla <!-- scat --><img src="{SMILIES_PATH}/cat2.gif" alt="cat" title="Cat" /><!-- scat --> blabla';

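// Group 1 is the word before each comment, group 2 is the comment body.
preg_match_all('/(\S+) <!-- (.*?) -->/sm', $string, $matches, PREG_PATTERN_ORDER);
$newString = ''; // initialize before appending to avoid an undefined-variable notice
for ($i = 0; $i < count($matches[1]); $i++) {
    $newString .= $matches[1][$i] . " " . $matches[2][$i] . " ";
}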

echo $newString;

Output:

blabla s:D blabla scat 
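
Note that this output keeps the leading "s" of each comment and drops the trailing text, so it is not quite the "blabla :D blabla cat blabla" the question asks for. One possible alternative, offered as a sketch and not as the answerer's method, is a single preg_replace call that swaps each comment/image/comment group for the captured smiley code. The function name smiliesToText and the ~ delimiter are choices made here, and the pattern assumes every smiley has exactly the shape shown in the sample string.

```php
<?php
// Hypothetical helper: collapse "<!-- sX --><img ... /><!-- sX -->" to "X".
function smiliesToText(string $input): string
{
    // (\S+) captures the smiley code after the "s" prefix;
    // \1 requires the closing comment to carry the same code.
    $pattern = '~<!-- s(\S+) --><img [^>]*/><!-- s\1 -->~';

    return preg_replace($pattern, '$1', $input);
}

$string = 'blabla <!-- s:D --><img src="{SMILIES_PATH}/icon_biggrin.gif" alt=":D" title="Very Happy" /><!-- s:D --> blabla <!-- scat --><img src="{SMILIES_PATH}/cat2.gif" alt="cat" title="Cat" /><!-- scat --> blabla';
echo smiliesToText($string), "\n"; // prints: blabla :D blabla cat blabla
```

Compared with preg_match_all followed by str_replace in a loop, this does the work in one pass and never replaces text outside the matched groups.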

Demo:

http://ideone.com/9fons0


Regex explanation:

(\S+) <!-- (.*?) -->

Options: Dot matches line breaks; ^$ match at line breaks; Greedy quantifiers

Match the regex below and capture its match into backreference number 1 «(\S+)»
   Match a single character that is NOT a “whitespace character” «\S+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character string “ <!-- ” literally « <!-- »
Match the regex below and capture its match into backreference number 2 «(.*?)»
   Match any single character «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character string “ -->” literally « -->»
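
To see how the two backreferences described above are filled, the following sketch runs the same pattern with PREG_SET_ORDER, so that each match arrives as one array of [full match, group 1, group 2]. The helper name extractPairs is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical helper: return [word, comment body] pairs found by the pattern.
function extractPairs(string $input): array
{
    preg_match_all('/(\S+) <!-- (.*?) -->/s', $input, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as $m) {
        // $m[1] is backreference 1 (the word), $m[2] is backreference 2 (the comment body).
        $pairs[] = [$m[1], $m[2]];
    }

    return $pairs;
}

$string = 'blabla <!-- s:D --><img src="{SMILIES_PATH}/icon_biggrin.gif" alt=":D" title="Very Happy" /><!-- s:D --> blabla <!-- scat --><img src="{SMILIES_PATH}/cat2.gif" alt="cat" title="Cat" /><!-- scat --> blabla';
print_r(extractPairs($string));
```

Only the opening comment of each smiley is matched: the closing comment follows "/>" directly, with no space in between, so the literal " <!-- " part of the pattern cannot match there.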

【Comments】:
