【Question title】: Keep carriage returns in text from <pre> after parsing
【Posted】: 2014-08-12 02:23:39
【Question】:

I'm using the Simple HTML DOM library (http://simplehtmldom.sourceforge.net/) for this.

I want to parse the contents of a website's <pre> tag, and I'm using this code:

    <?php
    include '/libraries/simple_html_dom.php';

    // Create DOM from URL or file
    $html = file_get_html('testing.html');

    // Print every <pre> element wrapped in a paragraph
    foreach ($html->find('pre') as $element) {
        echo '<p>' . $element . '</p>';
    }
    ?>

This is the content of the file "testing.html":

    <html>
    <head>
    </head>
    <body bgcolor="#FFFFFF">
    <pre>
    am.o                 V      1 1 PRES ACTIVE  IND 1 S    
    amo, amare, amavi, amatus  V   [XXXAO]  
    love, like; fall in love with; be fond of; have a tendency to;
    am.as                N      1 1 ACC P F                 
    ama, amae  N  F   [XXXDO]    lesser
    bucket; water bucket; (esp. fireman's bucket);
    am.as                V      1 1 PRES ACTIVE  IND 2 S    
    amo, amare, amavi, amatus  V   [XXXAO]  
    love, like; fall in love with; be fond of; have a tendency to;
    </pre>
    </body>
    </html>

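As you can see, the text in the <pre> has carriage returns, and I want to keep them in the output. Currently this is what the parser outputs: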

  am.o                 V      1 1 PRES ACTIVE  IND 1 S      amo, amare, amavi, amatus  V   [XXXAO]    love, like; fall in love with; be fond of; have a tendency to;  am.as                N      1 1 ACC P F                   ama, amae  N  F   [XXXDO]    lesser  bucket; water bucket; (esp. fireman's bucket);  am.as                V      1 1 PRES ACTIVE  IND 2 S      amo, amare, amavi, amatus  V   [XXXAO]    love, like; fall in love with; be fond of; have a tendency to;

How can I do this?

【Comments】:

Try nl2br().
You can't expect Simple HTML DOM to preserve whitespace. Use the preg functions if you need that.
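
The preg route mentioned in the comment above could look roughly like this. It is only a sketch under stated assumptions: `extract_pre_blocks()` is a hypothetical helper name, and a regular expression is not a real HTML parser, so it assumes simple, non-nested `<pre>` tags. Because it works on the raw string, the newlines inside each block are never touched.

```php
<?php
// Hypothetical helper: return the raw contents of every <pre> element.
// Assumes simple, non-nested <pre> tags; a regex is not a full HTML parser.
function extract_pre_blocks(string $html): array
{
    // \b stops "<prefix>" from matching, [^>]* allows attributes,
    // the "s" flag lets "." match newlines, and ".*?" stops at the first </pre>.
    preg_match_all('#<pre\b[^>]*>(.*?)</pre>#is', $html, $matches);
    return $matches[1];
}

// In the question this string would come from file_get_contents('testing.html').
$html = "<html><body>\n<pre>\nam.o   V\namo, amare, amavi, amatus\n</pre>\n</body></html>";

foreach (extract_pre_blocks($html) as $text) {
    // Re-wrap in <pre> so the browser keeps showing the line breaks
    echo '<pre>' . $text . '</pre>';
}
```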

Tags: php html-parsing simple-html-dom

【Solution 1】:

Use echo '' . $element->innerHTML . '';

【Comments】:

【Solution 2】:

Replace the newlines with <br> tags. You can use nl2br() for this.
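
A minimal sketch of the nl2br() suggestion. `nl2br()` is a built-in PHP function that inserts `<br />` before each line break and keeps the line break itself. The sample string is an assumption standing in for the `<pre>` text. This only helps while the string still contains its newlines: if the parser has already collapsed them (as in the question's output), there is nothing left for `nl2br()` to convert.

```php
<?php
// Sample text standing in for the contents of the <pre> element
$preText = "am.o   V   1 1 PRES ACTIVE  IND 1 S\namo, amare, amavi, amatus  V   [XXXAO]";

// nl2br() inserts "<br />" before every line break and keeps the break itself
$withBreaks = nl2br($preText);

echo '<p>' . $withBreaks . '</p>';
```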

【Comments】:

Is there an easy way to do this?
Yes: echo '' . nl2br($element) . '';
Try echo '' . str_replace(array("\r\n", "\r", "\n"), ' ', $element) . '';
Sorry, no luck! @LukePeterson

【Solution 3】:

You have to ask for the inner text of the node:

    foreach ($html->find('pre') as $element) {
        // innertext is the HTML inside the tag, without the <pre> wrapper itself
        echo '<p>' . $element->innertext . '</p>';
    }
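
For comparison, here is a sketch that swaps Simple HTML DOM for PHP's built-in DOM extension (`DOMDocument`). That parser keeps whitespace inside text nodes, so the line breaks inside `<pre>` come back in `textContent`. `pre_texts()` is a hypothetical helper name and the sample HTML is a shortened stand-in for testing.html; `htmlspecialchars()` is applied because `textContent` is decoded plain text.

```php
<?php
// Hypothetical helper using the built-in DOM extension (not Simple HTML DOM)
function pre_texts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('pre') as $pre) {
        // textContent keeps the newlines that are inside the <pre> text
        $texts[] = $pre->textContent;
    }
    return $texts;
}

$html = "<html><body bgcolor=\"#FFFFFF\">\n<pre>\nam.o   V\nama, amae  N  F\n</pre>\n</body></html>";
foreach (pre_texts($html) as $text) {
    echo '<pre>' . htmlspecialchars($text) . '</pre>';
}
```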

【Comments】:

@JBithell You could try print_r($element);
The output is over 30,000 characters: ivate] => simple_html_dom Object RECURSION ) ) [parent] => [_] => Array ( [0] => -1 [1] => 12 ) [tag_start] => 0 [dom:simple_html_dom_node:private] => simple_html_dom Object RECURSION ) [1] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => html [attr] => Array () [children] => Array ( [0] => simple_html_dom_node Object ( [nodetype] => 1 [tag] => he
@JBithell Can you find the target text somewhere in the output?
Yes, but it's still all on one line @Amir
@JBithell What do you mean by one line?

【Solution 4】:

It turns out it's really simple! Simple HTML DOM isn't needed at all, because this can be done without such a library:

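    <?php
    $file = file_get_contents('testing.html');

    // Find the text between '<html>' and the first '<pre>' (the <head> and <body> opening) and delete it
    $start = '<html>';
    $end   = '<pre>';
    $string = $file;
    $whatwearelookingfor = strstr(substr($string, strpos($string, $start) + strlen($start)), $end, true);
    $parsedresult = str_replace($whatwearelookingfor, "", $file);

    // Strip the remaining wrapper tags separately, since a newline may sit between </body> and </html>
    $parsedresult = str_replace("<html>", "", $parsedresult);
    $parsedresult = str_replace(array("</body>", "</html>"), "", $parsedresult);
    echo $parsedresult;
    ?>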

This returns the <pre> content with its line breaks preserved!
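
The same string-only idea can also be written more directly, by taking exactly the text between the first `<pre>` and the following `</pre>` instead of deleting everything around it. This is a sketch, not the answer's own code: `first_pre_contents()` is a hypothetical helper, and it assumes a plain `<pre>` tag without attributes, as in testing.html.

```php
<?php
// Hypothetical helper: text between the first '<pre>' and the next '</pre>'
function first_pre_contents(string $html): ?string
{
    $open = strpos($html, '<pre>');
    if ($open === false) {
        return null; // no <pre> tag at all
    }
    $from = $open + strlen('<pre>');
    $close = strpos($html, '</pre>', $from);
    if ($close === false) {
        return null; // <pre> is never closed
    }
    return substr($html, $from, $close - $from);
}

// In the original setting this would be file_get_contents('testing.html')
$html = "<html>\n<body>\n<pre>\nline one\nline two\n</pre>\n</body>\n</html>";
$text = first_pre_contents($html);
if ($text !== null) {
    // Re-wrap in <pre> so the browser shows the original line breaks
    echo '<pre>' . $text . '</pre>';
}
```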

【Comments】: