How to optimize preg_match_all or other alternative?

【Question title】: How to optimize preg_match_all or other alternative?
【Posted】: 2012-12-27 13:52:28
【Question】:

I have this code:

function toDataUri( $html )
{
  # convert css URLs to data URIs
  $html = preg_replace_callback( "#(url\([\'\"]?)([^\"\'\)]+)([\"\']?\))#", 'create_data_uri', $html );
  return $html;
}

// callback function (no visibility keyword: it is passed by name as a plain function)
function create_data_uri( $matches )
{
  $filetype = explode( '.', $matches[ 2 ] );
  $filetype = trim(strtolower( $filetype[ count( $filetype ) - 1 ] ));

  // replace ?whatever=value from extensions
  $filetype = preg_replace('#\?.*#', '', $filetype);

  $datauri = $matches[ 2 ];
  // get_file_contents() is not shown in the question; presumably a helper that fetches the file
  $data =  get_file_contents( $datauri );

  if (! $data) return $matches[ 0 ];

  $data = base64_encode( $data );

  //compile and return a data: URI with the encoded image data
  return $matches[ 1 ] . "data:image/$filetype;base64,$data" . $matches[ 3 ];
}

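It basically searches an HTML file for URLs of the form url(path) and replaces them with base64 data URIs.

The problem is that if the input HTML is a few kilobytes, say 10 KB, it takes a long time to return the final response. Is there any optimization we can do in this case? Or do you have any other solution that, given the HTML, searches for url(path) matches and converts them to data URIs?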

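【Comments】:

The code in the question still references a lot of code that isn't in the question. You have get_file_contents in your code (or it looks like you might), and that will take far longer than the preg_match call. You should profile your code so that you know where the time goes, rather than assuming it is the preg_match call (unlikely).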
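One way to follow this advice is to time the regex pass and the file loading separately. The following is only a rough sketch: the temporary asset file, the generated markup, and the variable names are assumptions made so the example is self-contained, and file_get_contents stands in for whatever loader the real code uses.

```php
<?php
// The question's pattern, written as a single-quoted PHP string.
$pattern = '#(url\([\'"]?)([^"\'\)]+)(["\']?\))#';

// Stand-in asset so the example runs on its own (hypothetical file).
$asset = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($asset, str_repeat('x', 2048));

// Roughly 10 KB of markup containing 200 url(...) references to that asset.
$html = str_repeat("<div style=\"background: url('$asset')\">x</div>\n", 200);

// Step 1: time the regex pass alone.
$start = microtime(true);
$count = preg_match_all($pattern, $html, $matches);
$regexTime = microtime(true) - $start;

// Step 2: time reading and encoding every matched path.
$start = microtime(true);
foreach ($matches[2] as $path) {
    $data = base64_encode(file_get_contents($path));
}
$loadTime = microtime(true) - $start;

printf("matches: %d, regex: %.6f s, loading: %.6f s\n", $count, $regexTime, $loadTime);
unlink($asset);
```

If the paths point at remote URLs or a slow disk, the second number is the one likely to dominate; a dedicated profiler such as Xdebug would give a more detailed picture.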

Tags: php html performance optimization

【Solution 1】:

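The expression is already cheap: it starts with a fixed string and doesn't need backtracking.

PCRE has the S modifier, which enables some extra regex optimizations, but it only matters for patterns that don't have a fixed prefix.

It shouldn't be slow; 10 KB isn't much for a simple regex like this. Perhaps the bottleneck is elsewhere?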
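If the bottleneck does turn out to be loading the referenced files, one possible mitigation is to load each distinct path only once. This is a minimal sketch under that assumption, not something taken from the question: cached_data_uri_payload and the $loader callable are hypothetical names, and the loader here returns fake bytes instead of reading a real file.

```php
<?php
// Hypothetical memoizing wrapper: each distinct path is loaded and encoded at most once.
function cached_data_uri_payload(string $path, callable $loader): ?string
{
    static $cache = [];
    if (!array_key_exists($path, $cache)) {
        $data = $loader($path);
        // Remember failures too (as null), so a missing file is not retried on every match.
        $cache[$path] = ($data === false || $data === '') ? null : base64_encode($data);
    }
    return $cache[$path];
}

$reads = 0;
$loader = function (string $path) use (&$reads) {
    $reads++;                 // counts how often the slow step actually runs
    return "bytes-of-$path";  // stand-in for the real file or HTTP read
};

$first  = cached_data_uri_payload('img/a.png', $loader);
$second = cached_data_uri_payload('img/a.png', $loader); // served from the cache
$other  = cached_data_uri_payload('img/b.png', $loader);

printf("loader calls: %d\n", $reads);
```

This only helps when the same image is referenced more than once, which is common for sprites and repeated backgrounds.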

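If the file you are parsing contains an unclosed url( and there is no ) before the end of the file, the regex will scan a lot further. [^\"\'\)]{0,1000} would limit that. But this is a minor optimization that only makes a difference when the file contains pathological syntax errors.
You can remove the () around the whole expression. Match 0 always captures the whole matched string.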
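Applied to the question's pattern, the bounded quantifier could look like the sketch below. Two things here are assumptions rather than part of the answer: the lower bound is 1 instead of 0, to keep the original + requirement of at least one character, and the callback is a hypothetical stand-in that tags the path instead of reading a file.

```php
<?php
// Bounded variant of the question's pattern: at most 1000 characters inside url(...).
$pattern = '#(url\([\'"]?)([^"\'\)]{1,1000})(["\']?\))#';

// Hypothetical stand-in for create_data_uri(): marks the path instead of loading it.
$callback = function (array $m): string {
    return $m[1] . 'data:PLACEHOLDER;' . $m[2] . $m[3];
};

// Well-formed input is rewritten as before.
$ok = preg_replace_callback($pattern, $callback, "a { background: url('img/a.png'); }");

// An unclosed url( followed by a long run of text: the bounded class
// consumes at most 1000 characters before the attempt fails.
$broken = 'a { background: url(' . str_repeat('x', 5000);
$unchanged = preg_replace_callback($pattern, $callback, $broken);

var_dump($ok);
var_dump($unchanged === $broken);
```

As the answer notes, this only matters for malformed input; on well-formed CSS the bounded and unbounded patterns do the same amount of work.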

【Comments】:

Can you elaborate? What changes should be made to the code?