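Function to find marked-up text and replace it with code tags: answers

【Question title】: Function to find marked-up text and replace it with code tags
【Posted】: 2014-10-23 16:07:03
【Question description】:

I made this function to find specific sets of characters in a text string and convert them to HTML tags: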

function ccfc($content)
{
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

    // $code_block =  preg_replace($reg_exUrl, "<a href=".$url[0].">{$url[0]}</a> ", $content);
    if(preg_match($reg_exUrl, $content, $url)) {

           // make the urls hyper links
           $content = preg_replace($reg_exUrl, "<a href=".$url[0].">{$url[0]}</a> ", $content);

    } else {

           // if no urls in the text just return the text
           $content = $content;

        }



    $code_block = preg_replace_callback(
          '/([\`]{3})(.*?)([\`]{3})/s',
          function($matches) {
              $matches[2] = htmlentities($matches[2]);
              return '<pre><code>'. $matches[2] .'</code></pre>';
          },
          $content);

      $bold = preg_replace_callback(
                  '/([\*]{2})(.*?)([\*]{2})/s',
                  function($matches) {
                      $matches[2] = htmlentities($matches[2]);
                      return '<b>'. $matches[2] .'</b>';
                  },
                  $code_block);

      $italic = preg_replace_callback(
                  '/([\*]{1})(.*?)([\*]{1})/s',
                  function($matches) {
                      $matches[2] = htmlentities($matches[2]);
                      return '<i>'. $matches[2] .'</i>';
                  },
                  $bold);


    return $italic;

}

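The first part of the function finds URLs such as http://www.google.com and turns them into links.

The second finds ```code content``` and converts it to <pre><code> code content </code></pre>. The third finds **content** and converts it to <b>content</b>. The fourth finds *content* and converts it to <i>content</i>.

However, if code is written outside the ``` ``` markers, it gets executed. How can I make the remaining text go through htmlentities() as well?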

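【Question comments】:

The <b> and <i> tags are deprecated; you should put the text in a <span> or some other tag and give it a CSS class.
@ialarmedalien Or use <strong> and <em> if the OP is trying to convey emphasis, which seems to be what he is trying to do with the Markdown conversion.

Tags: php html function tags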

【Solution 1】:

Instead of calling htmlentities after running the text through the converter function, call it before doing the conversion:

function ccfc($content) {
    $content = htmlentities($content);

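This does not affect the characters used in the markup (* and `). You can also set the double_encode flag to false to ensure that content that is already encoded (for example, the & character in a link) is not encoded twice; see the PHP manual for the settings:

$content = htmlentities($content, ENT_QUOTES, 'UTF-8', false);

With these settings the text is treated as UTF-8 and all quotes are encoded, but links such as http://example.com?p=1&amp;q=2 are not double-encoded.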
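
To make the effect of the fourth argument concrete, here is a minimal sketch. The sample string is made up for illustration; it mixes a raw ampersand with one that is already encoded.

```php
<?php
// Sample input (hypothetical): one already-encoded "&amp;" and one raw "&".
$raw = 'http://example.com?p=1&amp;q=2 & <b>"hi"</b>';

// double_encode = false: entities that already exist are left alone.
$once = htmlentities($raw, ENT_QUOTES, 'UTF-8', false);

// double_encode = true (the default): "&amp;" is encoded again as "&amp;amp;".
$twice = htmlentities($raw, ENT_QUOTES, 'UTF-8', true);

echo $once, PHP_EOL;
echo $twice, PHP_EOL;
```

In both cases the raw & and the <b> tag are escaped; only the handling of the existing &amp; differs.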

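As an aside, you don't need preg_replace_callback for these replacements; you can refer to the captured text directly in the replacement expression. Here is an example for the code-formatting regular expression: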

$code_block = preg_replace(
      '/`{3}(.*?)`{3}/s',
      "<pre><code>$1</code></pre>",
      $content);

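As I pointed out in the comments, <b> and <i> are deprecated; if you are using them to emphasize text, you can replace them with <strong> and <em> respectively. If the markup is purely presentational, it is better to wrap the text in a <span> element and give it a class with bold or italic formatting.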
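
For the presentational case, the same replacement approach could be pointed at a span with a class. This is only a sketch: the class names fmt-bold and fmt-italic are hypothetical, and the CSS rules for them would have to be defined separately.

```php
<?php
// Hypothetical class names; matching CSS would live elsewhere, e.g.
// .fmt-bold { font-weight: bold; }  .fmt-italic { font-style: italic; }
$content = htmlentities('**loud** and *slanted*', ENT_QUOTES, 'UTF-8', false);

// Convert **text** first so the single-asterisk rule does not consume it.
$content = preg_replace('/\*{2}(.+?)\*{2}/s', '<span class="fmt-bold">$1</span>', $content);
$content = preg_replace('/\*(.+?)\*/s', '<span class="fmt-italic">$1</span>', $content);

echo $content, PHP_EOL;
```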

Here is the complete code, with htmlentities moved to the top and preg_replace_callback replaced by preg_replace:

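function ccfc($content)
{
    // escape HTML first; double_encode = false leaves existing entities alone
    $content = htmlentities($content, ENT_QUOTES, NULL, false);

    // debug output: show the escaped text before any markup is converted
    echo $content . PHP_EOL;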

    $reg_exUrl = "/((http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/?\S*)?)/";

    // $code_block =  preg_replace($reg_exUrl, "<a href=".$url[0].">{$url[0]}</a> ", $content);
    // make the urls hyperlinks
    $content = preg_replace($reg_exUrl, "<a href='$1'>$1</a>", $content);

    # replace ``` with code blocks
    $content = preg_replace(
        '/`{3}(.*?)`{3}/s',
        "<pre><code>$1</code></pre>",
        $content);

    # replace **text** with strong text
    $content = preg_replace(
            '/\*{2}([^\*].*?)\*{2}/s',
            "<strong>$1</strong>",
            $content);

    # replace *text* with em text
    $content = preg_replace(
              '/\*(.*?)\*/s',
             "<em>$1</em>",
              $content);

    return $content;
}
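
As a quick check that this addresses the original problem, the sketch below runs a condensed copy of the function on a made-up input. Two things differ from the listing above and are assumptions of this example: the debug echo is dropped, and the encoding is spelled out as 'UTF-8'.

```php
<?php
// Condensed copy of ccfc() from the answer, without the debug echo.
function ccfc($content)
{
    // Escape everything first so HTML outside the markup cannot run.
    $content = htmlentities($content, ENT_QUOTES, 'UTF-8', false);

    $reg_exUrl = "/((http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/?\S*)?)/";
    $content = preg_replace($reg_exUrl, "<a href='$1'>$1</a>", $content);
    $content = preg_replace('/`{3}(.*?)`{3}/s', "<pre><code>$1</code></pre>", $content);
    $content = preg_replace('/\*{2}([^\*].*?)\*{2}/s', "<strong>$1</strong>", $content);
    $content = preg_replace('/\*(.*?)\*/s', "<em>$1</em>", $content);

    return $content;
}

// Sample input (made up): raw HTML outside a code block, a link, and markup.
echo ccfc('<script>x</script> see http://example.com for **bold** and *em*'), PHP_EOL;
```

The script tag comes out as &lt;script&gt;x&lt;/script&gt;, so it is displayed rather than executed, while the URL and the asterisk markup are still converted.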

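A quick explanation of how preg_replace works: when you use parentheses in a regular expression, whatever matches inside them is captured into the special references $1, $2, $3, and so on. The contents of the first set of parentheses go into $1, the contents of the second into $2, and so forth. Take this regular expression, for example: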

/(\w+) and (\w+)/

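and the input string bread and butter. bread matches the expression in the first set of parentheses and butter matches the second, so $1 is set to bread and $2 to butter. This becomes useful with preg_replace, because $1 and $2 can be used in the replacement string: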

$str = preg_replace("/(\w+) and (\w+)/", "I love $2 on $1", "bread and butter");
echo $str;

Output:

I love butter on bread

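Anything in the matched string that is not captured disappears, such as the and in this example.

In the replacements in the code above, the text between the delimiters (* and `) needs to be kept, so it is captured in parentheses; the delimiters themselves are not wanted, so they are left outside the parentheses.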
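
A small sketch of that point, using a made-up input string. It also shows the pattern from the question, which wraps the delimiters in parentheses too; there the wanted text ends up in the second group rather than the first.

```php
<?php
$input = 'run ```ls -l``` now';

// Delimiters outside the parentheses: they are matched and dropped,
// and the text between them is available as $1.
$answerStyle = preg_replace('/`{3}(.*?)`{3}/s', '<pre><code>$1</code></pre>', $input);

// Delimiters captured as well (as in the question's pattern): the backticks
// land in groups 1 and 3, so the text between them is $2.
$questionStyle = preg_replace('/(`{3})(.*?)(`{3})/s', '<pre><code>$2</code></pre>', $input);

echo $answerStyle, PHP_EOL;
echo $questionStyle, PHP_EOL;
```

Both calls produce the same string; the first simply avoids two capture groups that are never used.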

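Explanation of the other characters in the regular expressions:

?, *, +, {2}: these are quantifiers; they determine how many times the preceding pattern should occur. ? means 0 or 1 times; * means 0 or more times; + means 1 or more times; {2} means exactly twice; {500} means 500 times.
\w stands for any digit, letter, or _
. matches any character (including newlines when the s modifier is used, as in the patterns above)
.*? matches a string of any length, including length 0, and takes as few characters as possible
\** matches 0 or more * characters; to match a literal *, you must escape it (i.e. \*) so that the regex engine does not interpret it as a quantifier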
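
The list above can be tried out with preg_match. The sketch below uses made-up subject strings; the ^ and $ anchors are an addition of this example (they tie each pattern to the whole string) and do not appear in the answer's patterns.

```php
<?php
// ^ and $ anchor each pattern to the whole subject string.
$checks = [
    'colou?r vs "color"' => preg_match('/^colou?r$/', 'color'), // ? : the "u" is optional
    'a* vs ""'           => preg_match('/^a*$/', ''),           // * : zero occurrences is enough
    'a+ vs ""'           => preg_match('/^a+$/', ''),           // + : needs at least one "a"
    '\*{2} vs "**"'      => preg_match('/^\*{2}$/', '**'),      // {2} : exactly two literal asterisks
    '\w+ vs "snake_1"'   => preg_match('/^\w+$/', 'snake_1'),   // \w : letters, digits, underscore
];
foreach ($checks as $label => $matched) {
    echo $label, ' => ', $matched ? 'match' : 'no match', PHP_EOL;
}

// .*? is lazy and stops at the first closing delimiter; .* is greedy
// and runs on to the last one.
preg_match('/\*(.*?)\*/', '*a* and *b*', $lazy);
preg_match('/\*(.*)\*/', '*a* and *b*', $greedy);
echo $lazy[1], ' | ', $greedy[1], PHP_EOL;
```

Only the a+ check fails to match, since + requires at least one occurrence. The last line shows why the answer uses .*? between the delimiters: the greedy form would swallow everything from the first asterisk to the last.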

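【Comments】:

As for the text, I'll create another function like this: $code_block = preg_replace( '/([\*]{3})(.*?)([\*]{3})/s', "$2", $content); and as for links, it does handle more than one link; I just tried it, and it's fine.
@DiarSelimi If you're clever with the regular expressions, you can actually handle *** with the existing code. Try running some text through your function and see what comes out. The output will have problems, but you can fix it with some regex changes.
I figured it out: I just put the italic function before the bold one and changed the variables so that bold is returned, and it works fine :)
@i alarmed alien No, if the thing to be changed in preg_replace is wrong then I don't know, because I don't understand this function very well and there is no good explanation of it either :(
@i alarmed alien Yes, I understand now, but what do those w+ / {} etc. mean?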