如何用段落标签包围所有文本片段？ [关闭]答案

【问题标题】：How do I surround all text pieces with paragraph tags? [closed]如何用段落标签包围所有文本片段？ [关闭]
【发布时间】：2011-08-23 02:30:15
【问题描述】：

我想在任何文本项周围放置段落标签。因此，它应该避免使用表格和其他元素。我怎么做？我想它可以用preg_replace制作吗？

【问题讨论】：

请更具体一些...我不确定你想要什么。
据我了解，您不想选择任何元素，而只想选择文本并在其周围放置一个段落标签。所以这个正则表达式^[\w\d]+$ 会找到你所有的文本，并且会忽略任何元素，比如表格或其他任何元素。
@Lukas Knuth 我想知道我如何使用 php 将段落标签添加到仅带有换行符的文本中。使用 html
标签将文本分成单独的段落。
@Abhishek Simon 很有趣。如何使用该正则表达式来实现我想要的功能？
也许 'nl2br' 功能适合您。但它创建的是
，而不是
。

标签： php html regex preg-replace paragraph

【解决方案1】：

由于使用正则表达式很难知道标签内的正则表达式，我建议使用 DOM 解析器并处理生成的 DOM 对象：

$doc = new DOMDocument();
$doc->loadHTML("<body>Test<br><p>Test 2</p>Test 3</body>");
$content = $doc->documentElement->getElementsByTagName('body')[0]->childNodes;
for($i = 0; $i < $content->length; $i++) {
    $node = $content->item($i);
    if ($node->nodeType === XML_TEXT_NODE) { // '#text'
        $element = $doc->createElement('p');
        $node->parentNode->replaceChild($element, $node);
        $element->appendChild($node);
    }
}

【讨论】：

【解决方案2】：

这里有几个函数可以帮助你做你想做的事：

// nl2p
// This function will convert newlines to HTML paragraphs
// without paying attention to HTML tags. Feed it a raw string and it will
// simply return that string sectioned into HTML paragraphs

function nl2p($str) {
    $arr=explode("\n",$str);
    $out='';

    for($i=0;$i<count($arr);$i++) {
        if(strlen(trim($arr[$i]))>0)
            $out.='<p>'.trim($arr[$i]).'</p>';
    }
    return $out;
}



// nl2p_html
// This function will add paragraph tags around textual content of an HTML file, leaving
// the HTML itself intact
// This function assumes that the HTML syntax is correct and that the '<' and '>' characters
// are not used in any of the values for any tag attributes. If these assumptions are not met,
// mass paragraph chaos may ensue. Be safe.

function nl2p_html($str) {

    // If we find the end of an HTML header, assume that this is part of a standard HTML file. Cut off everything including the
    // end of the head and save it in our output string, then trim the head off of the input. This is mostly because we don't
    // want to surrount anything like the HTML title tag or any style or script code in paragraph tags. 
    if(strpos($str,'</head>')!==false) {
        $out=substr($str,0,strpos($str,'</head>')+7);
        $str=substr($str,strpos($str,'</head>')+7);
    }

    // First, we explode the input string based on wherever we find HTML tags, which start with '<'
    $arr=explode('<',$str);

    // Next, we loop through the array that is broken into HTML tags and look for textual content, or
    // anything after the >
    for($i=0;$i<count($arr);$i++) {
        if(strlen(trim($arr[$i]))>0) {
            // Add the '<' back on since it became collateral damage in our explosion as well as the rest of the tag
            $html='<'.substr($arr[$i],0,strpos($arr[$i],'>')+1);

            // Take the portion of the string after the end of the tag and explode that by newline. Since this is after
            // the end of the HTML tag, this must be textual content.
            $sub_arr=explode("\n",substr($arr[$i],strpos($arr[$i],'>')+1));

            // Initialize the output string for this next loop
            $paragraph_text='';

            // Loop through this new array and add paragraph tags (<p>...</p>) around any element that isn't empty
            for($j=0;$j<count($sub_arr);$j++) {
                if(strlen(trim($sub_arr[$j]))>0)
                    $paragraph_text.='<p>'.trim($sub_arr[$j]).'</p>';
            }

            // Put the text back onto the end of the HTML tag and put it in our output string
            $out.=$html.$paragraph_text;
        }

    }

    // Throw it back into our program
    return $out;
}

其中的第一个，nl2p()，将字符串作为输入，并将其转换为数组，只要有换行符 ("\n") 字符。然后它遍历每个元素，如果找到一个不为空的元素，它将在其周围包裹 标签并将其添加到一个字符串中，该字符串在函数末尾返回。

第二个，nl2p_html()，是前者的更复杂的版本。将 HTML 文件的内容作为字符串传递给它，它会将  和  标签包裹在任何非 HTML 文本周围。它通过将一个字符串分解成一个数组，其中分隔符是< 字符，它是任何HTML 标记的开始。然后，遍历这些元素中的每一个，代码将查找 HTML 标记的结尾，并将其后面的任何内容放入一个新字符串中。这个新字符串本身将被分解成一个数组，其中分隔符是换行符 ("\n")。循环遍历这个新数组，代码查找非空元素。当它找到一些数据时，它会将其包装在段落标签中并将其添加到输出字符串中。当这个循环完成时，这个字符串将被添加回 HTML 代码中，并且这将一起被修改为一个输出缓冲区字符串，一旦函数完成就会返回。

tl;dr：nl2p() 会将字符串转换为 HTML 段落，而不会留下任何空段落，并且 nl2p_html() 会将段落标签包裹在 HTML 文档正文的内容周围。

我在几个小的示例 HTML 文件上对此进行了测试，以确保间距和其他内容不会破坏输出。由 nl2p_html() 生成的代码也可能不符合 W3C 标准，因为它会将锚点包裹在段落等周围，而不是相反。

希望这会有所帮助。

【讨论】：

这看起来很棒，但出于某种原因，有时它会跳过段落的第一个字符并添加开始标记。示例：<The weekend is nearly here, let's celebrate with some brand new music 关于为什么会发生这种情况和/或如何解决它的任何想法？