Using Ruby/Mechanize to select the next element after a selected element: answers

【Question title】: Using Ruby/Mechanize to select next element after selected element
【Posted】: 2011-11-21 03:48:23
【Question】:

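I couldn't find this exact question, so I hope it is a new variation on an old problem and that I'm not mistaken.

I want to select the table that follows an (inconsistent) p.red element, where the text() of that 'p' does not contain "Alphabetical" but does contain "OVERALL".

The DOM looks like this: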

<p class=red>Some Text</p>
  <table class="newclass">
  <tr></tr>
  <tr></tr>
</table>

<p class=red>Some Text</p>
<table class="newclass">
  <tr></tr>
  <tr></tr>
</table>

<p class=red>OVERALL</p>
<table class="newclass">
  <tr></tr>
  <tr></tr>
</table>

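The number of tables differs on every page.

I want to get the text() of that p tag, but also the table that follows it. Again, the text() contains "OVERALL" but not "ALPHABETICAL". Should I build an array and .reject() the elements that don't match? I'm not sure at the moment, and I'm still quite new to Ruby and Mechanize, so thanks in advance for any help!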

【Discussion】:

Tags: ruby dom mechanize scraper

【Solution 1】:

Using Nokogiri's CSS evaluation is nice and clean:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<p class=red>Some Text</p>
  <table class="newclass">
  <tr></tr>
  <tr></tr>
</table>

<p class=red>Some Text</p>
<table class="newclass">
  <tr></tr>
  <tr></tr>
</table>

<p class=red>OVERALL</p>
<table class="newclass">
  <tr></tr>
  <tr></tr>
</table>
EOT

puts doc.at('p:contains("OVERALL")').to_html
# >> <p class="red">OVERALL</p>

puts doc.at('p:contains("OVERALL") ~ table').to_html
# >> <table class="newclass">
# >> <tr></tr>
# >> <tr></tr>
# >> </table>

【Discussion】:

【Solution 2】:

The p tag:

agent.parser.xpath('//p[.="OVERALL"]')[0]

The table after it:

agent.parser.xpath('//p[.="OVERALL"]')[0].next.next

Or:

agent.parser.xpath('//p[.="OVERALL"]/following-sibling::table[1]')[0]

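【Discussion】:

Just a heads-up for those who want to find the next tag in a Mechanize object. parser.xpath when the agent is created like agent = Mechanize.new. You need to add
I submitted the previous comment by accident and could not edit it after 5 minutes. Just a heads-up for those who want to find the next tag in a Mechanize object: parser is a Nokogiri method, so when checking the class you must make sure your object is a Nokogiri::XML::Element. If your agent was created with agent = Mechanize.new, agent.parser.xpath will not work (at least in Mechanize 2.7.3) and raises the error NameError: undefined local variable or method `parser' for main:Object. agent.page.parser.xpath, however, will work.
A link to a useful post related to the previous comment: stackoverflow.com/questions/23064821/…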