nokogiri: select paragraph with text match (answers)

【Question Title】: nokogiri select paragraph with text match
【Posted】: 2016-05-16 19:19:07
【Question Description】:

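I'm writing a scraper and I want to grab only the text of the paragraph that contains "On Snow Feel".

I'm trying to pull it out, but I'm not sure how to get nokogiri to select the paragraph with the matching text.

Currently I have boards[:onthesnowfeel] = html.css(".reviewfold p").text, but that returns the text of every paragraph. Don't assume the paragraphs will always appear in the same order either, so I can't just index with [2] or similar.

What approach would you use to scrape the paragraph that matches the text "On Snow Feel"?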

<div id="review" class="reviewfold">
<p>The <strong>Salomon A</strong><b>assassin</b>&nbsp;Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. </p>
<p><b>Approximate Weight</b>: Moew mix is pretty normal</p>
<p><strong>On Snow Feel:&nbsp;</strong>At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum.</p>
<p><strong>Powder:&nbsp;</strong>It is a long established fact that a reader will be distracted by the readable content of a page when looking at its layout. </p>
</div>

【Comments】:

Try html.css(".reviewfold p").find { |e| e.text =~ /On Snow Feel/ }.text.
That worked! @sschmeck
See stackoverflow.com/questions/1474688/…. Note that if you want to match the text at the start of the paragraph, you have to use XPath: doc.xpath("//*[@class='reviewfold']//p[starts-with(.,'On Snow Feel')]")

Tags: ruby nokogiri scraper open-uri

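【Solution 1】:

You can combine Enumerable#find with the regular-expression match operator =~ to get the content of the element you want. find returns the first paragraph whose text matches, regardless of where it sits in the list.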

html.css(".reviewfold p").find { |e| e.text =~ /On Snow Feel/ }.text

【Discussion】: