【Question Title】: Convert Text within Tables to Plain text with linebreaks
【Posted】: 2012-09-28 01:51:34
【Question】:

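Given a chunk of HTML that displays data nicely inside <div> and <table> elements, how can I strip all HTML/CSS markup while keeping the text originally found in the individual cells and divs, now separated only by line breaks?

The current attempt shown here outputs one long continuous paragraph instead of keeping the separation the text had in its div or table form.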

Original HTML: http://pastebin.com/63N3Kg16

Output:

John Smith | SomeName Realty | (xxx) 939-4835 Allston St, Cambridge, MA Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1 4BR/1BA Apartment $3,400/month Bedrooms 4 Bathrooms 1 full, 0 partial Sq Footage Unspecified Parking None Pet Policy No pets Deposit $0 DESCRIPTION Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835 Posted: Sep 24, 2012, 6:55am PDT

PHP

nl2br(trim(strip_tags($html)));

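Expected output

Plain text with <br> or newlines and without the <div><table> HTML markup. Basically, the goal is to make the text more readable by preserving the original spacing/separation structure, with no CSS styling and no HTML markup other than <br>.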

John Smith | SomeName Realty | (xxx) 939-4835 

Allston St, Cambridge, MA 

Very spacious under renovation with SST/Granite, porch, minutes to MIT, redline, Nov/1 

4BR/1BA Apartment $3,400/month 

Bedrooms 4 
Bathrooms 1 full, 0 partial 
Sq Footage Unspecified 
Parking None 
Pet Policy No pets 
Deposit $0 

DESCRIPTION 
Triple decker building secondfloor apt aprox 2000 sqf with large bedrooms, kitchen, pantry, porch, d/w, all woodfloor and ZTilded in the kitchen, new bath. utilities extra,Nov/1 see additional photos below 

Contact info: Payman Ahmadifar Bayside Realty (xxx) 939-4835 
Posted: Sep 24, 2012, 6:55am PDT

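【Comments】:

  • Have you tried strip_tags($html, ' ')
  • Can you add the expected output?
  • Also, nl2br will not necessarily give the expected result, because the HTML may not contain any newlines
  • Thanks for the feedback. I don't want markup like <table> <p> <div> in the final output. If possible, use only newlines, <br> and <strong>
  • Where does John Smith come from? And where does the | come from? Do you want to view this in a browser, or save it to a file?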

Tags: php simple-html-dom


【Solution 1】:

You can play with some string manipulation.

Try

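// Strip all tags; the whitespace that surrounded them in the source is left behind.
$string = strip_tags($html);
// Treat every run of three spaces as a block boundary ("*****" is only a temporary placeholder).
$string = str_replace(chr(32).chr(32).chr(32), "*****", $string);
// Split on the placeholder and collapse the remaining whitespace inside each piece.
$newString = array_map(function ($var) { return trim(preg_replace('!\s+!', ' ', $var)); }, explode("*****", $string));
print(implode("\n", $newString));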

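The three-space heuristic above depends on how the source HTML happens to be indented. As an alternative, here is a minimal sketch (not part of the original answer) that turns block boundaries into newlines before the tags are stripped. The function name `htmlToPlainText` and the list of tags treated as block-level are assumptions made for illustration. A regex pass like this is only a rough approximation of HTML parsing and may need adjusting for the real markup.

```php
<?php
// Hypothetical helper: convert block boundaries to newlines, then strip tags.
function htmlToPlainText(string $html): string
{
    // Turn <br> and the closing tags of (assumed) block-level elements into newlines.
    $text = preg_replace('~<br\s*/?>|</(?:div|p|tr|table|h[1-6]|li)\s*>~i', "\n", $html);
    // Separate adjacent cells of the same row with a single space.
    $text = preg_replace('~</t[dh]\s*>~i', ' ', $text);
    // Remove the remaining markup and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES, 'UTF-8');
    // Collapse spaces and tabs within each line, then drop empty lines.
    $lines = array_map(function ($line) {
        return trim(preg_replace('/[ \t]+/', ' ', $line));
    }, explode("\n", $text));
    return implode("\n", array_filter($lines, 'strlen'));
}

$html = '<div>Allston St, Cambridge, MA</div>'
      . '<table><tr><td>Bedrooms</td><td>4</td></tr>'
      . '<tr><td>Parking</td><td>None</td></tr></table>';
echo htmlToPlainText($html), "\n";
// Prints:
// Allston St, Cambridge, MA
// Bedrooms 4
// Parking None
```

Note that this sketch drops empty lines entirely, so the blank lines between groups in the expected output would have to be reintroduced separately. Wrap the result in nl2br() if <br> tags are wanted instead of bare newlines.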

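【Comments】:

  • The output looks great! It probably needs one more function to replace multiple \n with a single \n. Is ***** just a temporary placeholder/delimiter?
  • @Nyxynyx I believe you have a better idea of what you want... I was just playing around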
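The extra step asked for in the first comment, replacing multiple \n with a single \n, could look roughly like this. It is a sketch only: the helper name `collapseNewlines` is hypothetical, and it assumes that whitespace-only lines should count as empty.

```php
<?php
// Hypothetical helper: reduce every run of line breaks (including
// whitespace-only lines in between) to a single "\n".
function collapseNewlines(string $text): string
{
    // \R matches any line-break sequence (\n, \r\n or \r); the surrounding
    // \s* absorbs neighbouring blank lines and stray spaces around the break.
    return trim(preg_replace('/\s*\R\s*/', "\n", $text));
}

$text = "Bedrooms 4\n\n \n\nBathrooms 1 full, 0 partial\n\n\nParking None\n";
echo collapseNewlines($text), "\n";
// Prints:
// Bedrooms 4
// Bathrooms 1 full, 0 partial
// Parking None
```

Applied to the answer's result, this would be called as collapseNewlines(implode("\n", $newString)).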