Answers to: Split_on_title() and preg_replace() on data pulled from another website with file_get_contents()

【Question title】: Split_on_title() and preg_replace() on data pulled from another website with file_get_contents()
【Posted】: 2015-12-17 01:37:40
【Question】:

I have pulled data from another website using file_get_contents().

This is part of the source code:

<font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>

I use the following split_on_title code to extract 1,22 € from the string:

$split_on_title = preg_split("<font style=\"font-size:10px;color:#123333;font-weight:BOLD;\">", $source);
$split_on_endtitle = preg_split("</font>", $split_on_title[1]);
$title = $split_on_endtitle[0];

When I echo $title, Firefox shows:

>1,22 â‚¬<

I then used preg_replace on the string:

preg_replace('> â‚¬<', '', $title);

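PHP then reports this error: Warning: preg_replace(): No ending delimiter '>' found in ....

How can I get a clean 1,22 €? Or at least just 1,22. Thanks in advance.

Edit:

I realize the data I provided is hard to follow, so here is a larger excerpt: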

<tr>
    <td width="80" align="left" valign="top">
        <b> Price:</b>
    </td>
    <td align="left"  valign="top">
        <font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>
    </td>
</tr>

I need help extracting 1,22 € from this source.

【Question comments】:

Tags: php

【Solution 1】:

Please add the necessary UTF-8 support to the <head> section of your HTML page:

<meta charset="UTF-8" />

It is missing, which is why the euro symbol is not rendered correctly.

For more details on how to add this and other meta tags: http://www.w3schools.com/tags/tag_meta.asp
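
A short sketch of why the symbol comes out garbled, assuming the fetched page is UTF-8 and the mbstring extension is enabled (the variable names are illustrative only). The euro sign is three bytes in UTF-8, and a browser that falls back to Windows-1252 displays each byte as a separate character:

```php
<?php
// The euro sign written as its raw UTF-8 bytes, so the example does not
// depend on how this file itself is saved.
$euro = "\xE2\x82\xAC";

// Hex dump of the three bytes that make up the symbol.
$hex = strtoupper(bin2hex($euro));

// Reinterpret those bytes as Windows-1252 and convert the result to UTF-8.
// This reproduces the garbled text shown when no charset is declared.
$garbled = mb_convert_encoding($euro, 'UTF-8', 'Windows-1252');

echo $hex, "\n";      // E282AC
echo $garbled, "\n";  // the three-character mojibake from the question
```

Besides the meta tag, the charset can also be declared from PHP by calling header('Content-Type: text/html; charset=UTF-8'); before any output is sent.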

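【Comments】:

Good suggestion, but the output is now >1,22 €
As for the preg_match fix, I think @slapyo got it right in his answer ;)

【Solution 2】:

Why not use preg_match and grab everything between the font tags?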

$re = "/<font.*>(.*)<\\/font>/i"; 
$str = "<font style=\"font-size:10px;color:#123333;font-weight:BOLD;\">1,22 €</font>"; 

preg_match($re, $str, $matches);
echo $matches[1];

Here is how the pattern breaks down.

<font matches the characters <font literally (case insensitive)
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
> matches the characters > literally
1st Capturing group (.*)
.* matches any character (except newline)
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
< matches the characters < literally
\/ matches the character / literally
font> matches the characters font> literally (case insensitive)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
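
Because both `.*` parts are greedy, the pattern can swallow more than one tag when it runs against a whole page. One possible tightening, offered as a sketch rather than a definitive fix: `[^>]*` keeps the match inside the opening tag, `(.*?)` stops at the first closing tag, and the `u` modifier treats the input as UTF-8. The `$html` value copies the table row from the question; the variable names and the final clean-up step are assumptions. Note also that PHP's preg_* functions expect a delimiter around the pattern (here `~`). The preg_split calls in the question had none, so `<` and `>` were taken as the delimiters, which would explain the stray `>` and `<` left in the output.

```php
<?php
// Sample input copied from the question; in practice this would be the
// string returned by file_get_contents().
$html = <<<'HTML'
<tr>
    <td width="80" align="left" valign="top">
        <b> Price:</b>
    </td>
    <td align="left"  valign="top">
        <font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>
    </td>
</tr>
HTML;

$price = null;
$amount = null;

// Non-greedy capture of the text inside the first <font ...>...</font> pair.
if (preg_match('~<font[^>]*>(.*?)</font>~isu', $html, $matches)) {
    $price = trim($matches[1]);
    // Keep only digits and separators to get the bare number.
    $amount = preg_replace('~[^0-9,.]~', '', $price);
}

echo $price, "\n";
echo $amount, "\n";   // 1,22
```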

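【Comments】:

I tried it hard-coded (the code you wrote) and got this error: Notice: Array to string conversion in ... What should I do with $matches?
@OzanAtmar Output it like this: echo $matches[1];
@pavlovich Thanks... I updated the example to show how to output the value.
Actually, it works when you write that string into the $str variable. But this is a page full of HTML tags, and the preceding tag is "". This repeats several times in the code. So it is not possible (or I don't know how) to pull "1,22 €" from the source using preg_split().
If this is the only place that has this particular styled font tag, you could do $re = "/.*<font.*>(.*)<\\/font>.*/is";. However, without knowing more, this is a less than ideal approach.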
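
If the styled font tag really is the distinguishing feature, another option is to anchor on that exact opening tag rather than on any `<font>`. A sketch, assuming the style string is identical wherever it appears: `$openTag`, `$html`, and the unrelated first font tag are made up for illustration. `preg_quote()` escapes the literal tag for use inside a pattern, and `preg_match_all()` collects every occurrence.

```php
<?php
// Hypothetical page fragment: an unrelated font tag followed by the price tag.
$html = '<font color="red">Sale</font>'
      . '<font style="font-size:10px;color:#123333;font-weight:BOLD;">1,22 €</font>';

// The exact opening tag from the question, matched literally.
$openTag = '<font style="font-size:10px;color:#123333;font-weight:BOLD;">';

// Escape the tag (and the ~ delimiter) so it is safe inside the pattern,
// then capture everything up to the first closing </font>.
$pattern = '~' . preg_quote($openTag, '~') . '(.*?)</font>~isu';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured text of every matching tag.
$prices = array_map('trim', $matches[1]);

print_r($prices);
```

Only the tag with the matching style is captured, so the unrelated "Sale" tag is skipped.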