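Posted: 2014-10-27 07:04:18
Question:
I want to write a regex that ignores iframes whose URLs come from youtube, vimeo or soundcloud. The URLs appear in strings encoded with HTML entities.
This is what I tried, but it doesn't work. Some sample text is given below.
Regex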
<iframe(^?youtube|soundcloud|vimeo)*\/iframe
Sample text
<p><iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
29 <p>text daily to place domain staff as volunteers with charity partners, we know all too well that the "V" word can sometimes be misunderstood. Occasionally seen as a dusty, worthy word, it can conjure images of coffee mornings and bric-a-brac stalls. So its not always as easy as you might think to get people to embrace their inner-volunteer. That's why the <a href="http://www.domain.co.uk/sdfn/2010/11/connect-create-domain-volunteers.shtml">Conne
Sample output
<iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe>
Sample text
<p><iframe src="http://www.youtube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>
29 <p>text daily to place domain staff as volunteers with charity partners, we know all too well that the "V" word can sometimes be misunderstood. Occasionally seen as a dusty, worthy word, it can conjure images of coffee mornings and bric-a-brac stalls. So its not always as easy as you might think to get people to embrace their inner-volunteer. That's why the <a href="http://www.domain.co.uk/sdfn/2010/11/connect-create-domain-volunteers.shtml">Conne
Sample output
nil
To be clear:
I want to ignore iframes that contain youtube, vimeo or soundcloud.
I'm testing it with Rubular: http://rubular.com/r/F9x6SSkIfu
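Here is a minimal Ruby sketch of the requested regex. It assumes each iframe is a well-formed `<iframe ...>...</iframe>` pair. In the attempt above, `^` outside a character class is an anchor, not a negation, so `(^?youtube|soundcloud|vimeo)*` never excludes anything. A common way to express "does not contain" in a regex is a negative lookahead checked before every consumed character (a "tempered greedy token"). `ALLOWED_IFRAME` and `first_allowed_iframe` are illustrative names, not taken from the question.

```ruby
# Matches a complete <iframe ...>...</iframe> element, but only if nothing
# between "<iframe" and "</iframe>" contains youtube, vimeo or soundcloud.
# (?:(?!...).)* is a "tempered greedy token": a character is consumed only
# if neither the closing tag nor a blocked word starts at that position.
ALLOWED_IFRAME = %r{
  <iframe\b
  (?:(?!</iframe>|youtube|vimeo|soundcloud).)*
  </iframe>
}mix # m: "." also matches newlines, i: case-insensitive, x: allow spacing

# Hypothetical helper: returns the first allowed iframe, or nil if none.
def first_allowed_iframe(html)
  html[ALLOWED_IFRAME]
end

allowed = '<p><iframe src="http://www.3you3tube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>'
blocked = '<p><iframe src="http://www.youtube.com/embed/YoX1yc92MOU" width="500" height="300" frameborder="0" scrolling="auto"></iframe></p>'

puts first_allowed_iframe(allowed) # prints the iframe element without the surrounding <p>
p first_allowed_iframe(blocked)    # prints nil
```

The blocked words are checked anywhere inside the element, not only in `src`. An iframe that mentions one of those names in some other attribute is therefore also rejected. Restricting the check to the `src` value would need a stricter pattern.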
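Comments:
-
This is not a good use of regular expressions. HTML varies too much to be handled by a single pattern. Instead, decode the entities back into HTML, then use a parser such as Nokogiri. It normalizes the HTML, which makes it easy to ignore differences in attribute order, whitespace, capitalization and so on.
-
I tried the solution you suggested, but the data doesn't seem very consistent. Several broken tags prevent Nokogiri from parsing the HTML string correctly. One example is this question: stackoverflow.com/questions/25596881/…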
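The question says the URLs arrive HTML-entity-encoded. When a full parser isn't usable because of broken markup, one option is to decode the entities first and then filter with a regex. The sketch below does that with Ruby's `CGI.unescapeHTML`, followed by a negative-lookahead pattern and `String#scan`. `IFRAME_WITHOUT_EMBEDS` and `allowed_iframes` are hypothetical names. One limitation remains: an unclosed `<iframe>` runs on to the next `</iframe>`, so badly broken markup can still yield wrong matches.

```ruby
# cgi/escape defines CGI.unescapeHTML without loading the rest of the cgi library.
require 'cgi/escape'

# Match a full <iframe>...</iframe> only when no blocked host name
# (youtube, vimeo, soundcloud) appears before the closing tag.
IFRAME_WITHOUT_EMBEDS = %r{<iframe\b(?:(?!</iframe>|youtube|vimeo|soundcloud).)*</iframe>}mi

# Hypothetical helper: decode entities such as &lt; &gt; &quot; first,
# then collect every iframe that is not from youtube, vimeo or soundcloud.
def allowed_iframes(encoded_html)
  decoded = CGI.unescapeHTML(encoded_html)
  decoded.scan(IFRAME_WITHOUT_EMBEDS) # no capture groups, so scan returns whole matches
end

encoded = '&lt;p&gt;&lt;iframe src=&quot;http://www.3you3tube.com/embed/YoX1yc92MOU&quot;&gt;&lt;/iframe&gt;&lt;/p&gt;' \
          '&lt;iframe src=&quot;http://www.youtube.com/embed/YoX1yc92MOU&quot;&gt;&lt;/iframe&gt;'

puts allowed_iframes(encoded) # only the 3you3tube iframe is printed
```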