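【Posted】: 2017-12-01 23:56:37
【Question】:
I have an array from a database of products, and I want to split the large description text below into smaller chunks of product attribute names and values. Ultimately I am working toward database normalization, since I am currently building an import tool between 2 different database designs.
The array I get from the old products table: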
Array
(
[0] => Array
(
[product_id] => 219
[product_description] =>
<table style="color:; text-align: left;">
<tr>
<td>
Processor:
</td>
<td>
Intel Core 2 Duo - E8400
</td>
</tr>
<tr>
<td>
Clock speed:
</td>
<td>
3.0 GHz
</td>
</tr>
<tr>
<td>
Memory:
</td>
<td>
4 GB
</td>
</tr>
<tr>
<td>
Hard disk:
</td>
<td>
250 GB
</td>
</tr>
<tr>
<td>
Video-adapter:
</td>
<td>
VGA, Display
</td>
</tr>
<tr>
<td>
Netwerk card:
</td>
<td>
1000 Mbps LAN
</td>
</tr>
<tr>
<td>
Optical drive:
</td>
<td>
DVD-Rewriter
</td>
</tr>
<tr>
<td>
Operating system:
</td>
<td>
Windows 7 or 10 Pro
</td>
</tr>
<tr>
<td>
Warranty:
</td>
<td>
1 year
</td>
</tr>
</table>
)
)
My code so far:
$sth = $dbh->prepare("SELECT * from products WHERE product_status_id = '1' ORDER BY order_num ASC");
$sth->execute();
$result = $sth->fetchAll(PDO::FETCH_ASSOC);
$output = array();
$tdpattern = "!<td>(.*?)</td>!is";
foreach ($result as $key=>$val) {
    preg_match_all($tdpattern, $val['product_description'], $result);
    foreach ($result as $key => $arr) {
        foreach ($arr as $key2 => $description) {
            $output[] = preg_replace('/\n^[\x0a\x20]+|[\x0a\x20]+$/','',$description);
        }
    }
}
// return $output to controller
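As shown below, the output has multiple spaces in front of the words but no spaces between them, plus line breaks that should be removed. How can I strip all of these control characters, such as line breaks and spaces, while keeping exactly 1 space between the words of each array element, so that the result ideally looks like the layout at the bottom?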
Array
(
[0] => Processor
[1] => IntelCore2-E5500
[2] => Clockspeed
[3] => 2.93GHz
[4] => Memory
[5] => 4GB
[6] => Harddisk
[7] => 250GB
[8] => Video-adapter
[9] => VGA,Display
[10] => Netwerkcard
[11] => 1000mbpsLAN
[12] => Opticaldrive
[13] => DVD-Rewriter
[14] => Operatingsystem
[15] => Windows7or10Pro
[16] => Warranty
[17] => 2jaar
)
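The whitespace cleanup asked about here can be sketched roughly as follows. This is only a minimal illustration, not the asker's code: `normalize_whitespace` is a hypothetical helper name, and it assumes the cell text is plain ASCII, as in the sample data.

```php
<?php
// Hypothetical helper (not part of the original code): collapse every run of
// whitespace (spaces, tabs, line breaks) into a single space, then trim both ends.
function normalize_whitespace(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// A cell as it might come out of the <td> capture group, padded with
// line breaks and indentation.
$cell = "\n      Intel Core 2 Duo - E8400\n    ";
echo normalize_whitespace($cell), PHP_EOL; // prints: Intel Core 2 Duo - E8400
```

Collapsing first and trimming afterwards keeps one space between words, which the pattern in the question does not attempt.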
I would like to convert it into this layout:
[219] => array (
[product_description] => array (
[processor] => Intel Core 2 - E5500
[clock speed] => 2.93 GHz
[memory] => 2.93 GHz
[hard disk] => 2.93 GHz
[video adapter] => 2.93 GHz
[network card] => DVD Rewriter
[optical drive] => DVD Rewriter
[operating system] => Windows 7 or 10 Pro
[warranty] => 2 years
)
)
Some direction would be great, especially on how to improve the regular expression.
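One possible direction, staying with regular expressions, is to match each table row as a whole so that the two cells arrive as a name/value pair. The sketch below assumes every row holds exactly two plain `<td>` cells, as in the sample; `description_to_attributes` and the hard-coded `$result` array are illustrative stand-ins for the real query result.

```php
<?php
// Sketch only: turns one product_description table into name => value pairs.
function description_to_attributes(string $html): array
{
    $attributes = [];
    // Capture the two <td> cells of every <tr> as one name/value pair.
    $rowPattern = '!<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>!is';
    preg_match_all($rowPattern, $html, $rows, PREG_SET_ORDER);
    foreach ($rows as $row) {
        // Collapse whitespace runs to one space and trim the ends.
        $name  = trim(preg_replace('/\s+/', ' ', $row[1]));
        $value = trim(preg_replace('/\s+/', ' ', $row[2]));
        // "Processor:" becomes the key "processor".
        $attributes[strtolower(rtrim($name, ':'))] = $value;
    }
    return $attributes;
}

// Stand-in for $sth->fetchAll(PDO::FETCH_ASSOC), shortened to two rows.
$result = [[
    'product_id' => 219,
    'product_description' => "<table>\n<tr>\n<td>\nProcessor:\n</td>\n<td>\nIntel Core 2 Duo - E8400\n</td>\n</tr>\n"
        . "<tr>\n<td>\nClock speed:\n</td>\n<td>\n3.0 GHz\n</td>\n</tr>\n</table>",
]];

$output = [];
foreach ($result as $product) {
    // Matches live in their own variable inside the helper, so $result is not reused.
    $output[$product['product_id']] = [
        'product_description' => description_to_attributes($product['product_description']),
    ];
}
print_r($output);
```

Keying `$output` by `product_id` gives the nested layout requested above. Rows with attributes on the tags or nested markup would not match this pattern.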
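【Comments】:
- Regex is a poor choice for parsing "HTML". Do you have to use preg_match_all? I would write a tokenizer for it or use something like PHPQuery.
- Here is a tokenizer/lexer I wrote for JSON; you could do something similar for this: github.com/ArtisticPhoenix/MISC/blob/master/JasonDecoder.php. Maybe I'll modify it for you...
- This is more complicated than just removing spaces.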
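As a rough illustration of the parser-based route suggested in the comments, the sketch below swaps in PHP's bundled DOM extension (`DOMDocument`) rather than PHPQuery or a hand-written tokenizer. It assumes the DOM extension is enabled and that each row has a name cell followed by a value cell; `description_to_attributes_dom` is a hypothetical name.

```php
<?php
// Sketch only: reads name/value pairs from the description table with DOMDocument.
function description_to_attributes_dom(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $attributes = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');
        if ($cells->length < 2) {
            continue; // skip rows that are not name/value pairs
        }
        // textContent drops the tags; whitespace runs are collapsed to one space.
        $name  = trim(preg_replace('/\s+/', ' ', $cells->item(0)->textContent));
        $value = trim(preg_replace('/\s+/', ' ', $cells->item(1)->textContent));
        $attributes[strtolower(rtrim($name, ':'))] = $value;
    }
    return $attributes;
}

$html = "<table style=\"color:; text-align: left;\">\n<tr>\n<td>\nHard disk:\n</td>\n<td>\n250 GB\n</td>\n</tr>\n"
    . "<tr>\n<td>\nWarranty:\n</td>\n<td>\n1 year\n</td>\n</tr>\n</table>";
print_r(description_to_attributes_dom($html));
```

Unlike the regex version, this tolerates attributes on the tags and uneven whitespace in the markup, at the cost of loading each description into a document object.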