【Question title】: Nokogiri parsing with xpath returns empty string
【Posted】: 2014-10-01 12:07:22
【Question】:

I have the following HTML:

<div>
 <table>
  <tr>
   <td>

    <div class="w135">

     <div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
      <a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
       <img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
            </a>
     </div>


    </div>
   </td>
  </tr>
 </table>
</div>

When I run my rake task, I get this error:

NoMethodError: undefined method `at_css' for ["id","ctl00_cphBody_ctl01_DataList1_ctl00_Thumbnail1_Layout17"]:Array

Here is the code:

@request = HTTParty.get(url)

@html = Nokogiri::HTML(@request.body)

@html.css(".w135")[0].map do |item|

    url = item.at_css("div.playerDiv a")

    puts url.inspect
end   

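I'm really not sure what the problem is, and I've been trying to solve it for a while. The error is raised on this line: url = item.at_css("div.playerDiv a")

Any suggestions are welcome!

Thanks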

【Comments】:

  • Annoyingly, right after posting I solved this by changing the index [0] to an actual range, e.g. [0..1]. Silly mistake. Thanks.

Tags: ruby-on-rails-4 xpath nokogiri httparty


【Solution 1】:

I would use something like this:

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<div>
 <table>
  <tr>
   <td>

    <div class="w135">

     <div style="float: left; padding-right: 10px;" class="imageThumbnail playerDiv">
      <a href="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10" target="_parent">
       <img src="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg" id="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10" border="0" class="imageThumbnail">
            </a>
     </div>


    </div>
   </td>
  </tr>
 </table>
</div>
EOT

puts doc.search('.w135 div.playerDiv a').map(&:inspect)

Which outputs:

# >> #<Nokogiri::XML::Element:0x3ff0918b132c name="a" attributes=[#<Nokogiri::XML::Attr:0x3ff0918b1250 name="href" value="/sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html">, #<Nokogiri::XML::Attr:0x3ff0918b123c name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_lnkImage10">, #<Nokogiri::XML::Attr:0x3ff0918b1228 name="target" value="_parent">] children=[#<Nokogiri::XML::Text:0x3ff0918a5b6c "\n       ">, #<Nokogiri::XML::Element:0x3ff0918a5360 name="img" attributes=[#<Nokogiri::XML::Attr:0x3ff0918a4d20 name="src" value="/mritems/imagecache/89/135/mritems/images/2014/10/1/2014101114447491734_20.jpg">, #<Nokogiri::XML::Attr:0x3ff0918a4cbc name="id" value="ctl00_ctl00_DataList1_ctl00_Thumbnail1_imgSmall10">, #<Nokogiri::XML::Attr:0x3ff0918a4b90 name="border" value="0">, #<Nokogiri::XML::Attr:0x3ff0918a4a28 name="class" value="imageThumbnail">]>, #<Nokogiri::XML::Text:0x3ff091871920 "\n            ">]>

If you are trying to get the value of the "href" attribute, then instead of inspect use:

puts doc.search('.w135 div.playerDiv a').map{ |n| n['href'] }
# >> /sport/tennis/2014/10/djokovic-through-wozniacki-out-china-open-2014101114115427766.html

【Comments】:
