【Question title】: How do I extract data from Nokogiri XML Document? Trying to use XPath unsuccessfully
【Posted】: 2013-12-27 06:59:18
【Question】:

I'm using the Vacuum gem in a Rails application to pull data from Amazon's Product API. I get back an Excon response. For a book search with the keyword Ruby, calling res.body gives me the following string:

<?xml version="1.0" ?>
<ItemSearchResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2011-08-01">
  <OperationRequest>
    <HTTPHeaders>
      <Header Name="UserAgent" Value="Jeff/1.0.1 (Language=Ruby; new-host-2.home)"></Header>
    </HTTPHeaders>
    <RequestId>fa6e6962-15b0-4da6-abf2-12a688820dd3</RequestId>
    <Arguments>
      <Argument Name="Operation" Value="ItemSearch"></Argument>
      <Argument Name="Service" Value="AWSECommerceService"></Argument>
      <Argument Name="ItemPage" Value="1"></Argument>
      <Argument Name="AssociateTag" Value="thestu0f-20"></Argument>
      <Argument Name="Version" Value="2011-08-01"></Argument>
      <Argument Name="Keywords" Value="Ruby"></Argument>
      <Argument Name="SignatureMethod" Value="HmacSHA256"></Argument>
      <Argument Name="SearchIndex" Value="Books"></Argument>
      <Argument Name="SignatureVersion" Value="2"></Argument>
      <Argument Name="Signature" Value="05pqRqRK6DBFuOcXRhQvMO0XOj2b8a1bnMi5eB07fjs="></Argument>
      <Argument Name="AWSAccessKeyId" Value="AKIAI25J7QK5VYQ7HTJQ"></Argument>
      <Argument Name="Timestamp" Value="2013-12-27T06:37:09Z"></Argument>
    </Arguments>
    <RequestProcessingTime>0.2768830000000000</RequestProcessingTime>
  </OperationRequest>
  <Items>
    <Request>
      <IsValid>True</IsValid>
      <ItemSearchRequest>
        <ItemPage>1</ItemPage>
        <Keywords>Ruby</Keywords>
        <ResponseGroup>Small</ResponseGroup>
        <SearchIndex>Books</SearchIndex>
      </ItemSearchRequest>
    </Request>
    <TotalResults>19360</TotalResults>
    <TotalPages>1936</TotalPages>
    <MoreSearchResultsUrl>http://www.amazon.com/gp/redirect.html?camp=2025&amp;creative=386001&amp;location=http%3A%2F%2Fwww.amazon.com%2Fgp%2Fsearch%3Fkeywords%3DRuby%26url%3Dsearch-alias%253Dstripbooks&amp;linkCode=xm2&amp;tag=thestu0f-20&amp;SubscriptionId=AKIAI25J7QK5VYQ7HTJQ</MoreSearchResultsUrl>
    <Item>
      <ASIN>0596516177</ASIN>
      <DetailPageURL>http://www.amazon.com/Ruby-Programming-Language-David-Flanagan/dp/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D0596516177</DetailPageURL>
      <ItemLinks>
        <ItemLink>
          <Description>Technical Details</Description>
          <URL>http://www.amazon.com/Ruby-Programming-Language-David-Flanagan/dp/tech-data/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Baby Registry</Description>
          <URL>http://www.amazon.com/gp/registry/baby/add-item.html%3Fasin.0%3D0596516177%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wedding Registry</Description>
          <URL>http://www.amazon.com/gp/registry/wedding/add-item.html%3Fasin.0%3D0596516177%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wishlist</Description>
          <URL>http://www.amazon.com/gp/registry/wishlist/add-item.html%3Fasin.0%3D0596516177%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>Tell A Friend</Description>
          <URL>http://www.amazon.com/gp/pdp/taf/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Customer Reviews</Description>
          <URL>http://www.amazon.com/review/product/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Offers</Description>
          <URL>http://www.amazon.com/gp/offer-listing/0596516177%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D0596516177</URL>
        </ItemLink>
      </ItemLinks>
      <ItemAttributes>
        <Author>David Flanagan</Author>
        <Author>Yukihiro Matsumoto</Author>
        <Manufacturer>O'Reilly Media</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>The Ruby Programming Language</Title>
      </ItemAttributes>
    </Item>
    <Item>
      <ASIN>1937785491</ASIN>
      <DetailPageURL>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D1937785491</DetailPageURL>
      <ItemLinks>
        <ItemLink>
          <Description>Technical Details</Description>
          <URL>http://www.amazon.com/Programming-Ruby-1-9-2-0-Programmers/dp/tech-data/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Baby Registry</Description>
          <URL>http://www.amazon.com/gp/registry/baby/add-item.html%3Fasin.0%3D1937785491%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wedding Registry</Description>
          <URL>http://www.amazon.com/gp/registry/wedding/add-item.html%3Fasin.0%3D1937785491%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wishlist</Description>
          <URL>http://www.amazon.com/gp/registry/wishlist/add-item.html%3Fasin.0%3D1937785491%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>Tell A Friend</Description>
          <URL>http://www.amazon.com/gp/pdp/taf/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Customer Reviews</Description>
          <URL>http://www.amazon.com/review/product/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Offers</Description>
          <URL>http://www.amazon.com/gp/offer-listing/1937785491%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785491</URL>
        </ItemLink>
      </ItemLinks>
      <ItemAttributes>
        <Author>Dave Thomas</Author>
        <Author>Andy Hunt</Author>
        <Author>Chad Fowler</Author>
        <Manufacturer>Pragmatic Bookshelf</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>Programming Ruby 1.9 &amp; 2.0: The Pragmatic Programmers' Guide (The Facets of Ruby)</Title>
      </ItemAttributes>
    </Item>
    ...
    <Item>
      <ASIN>1937785564</ASIN>
      <DetailPageURL>http://www.amazon.com/Agile-Development-Rails-Facets-Ruby/dp/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D1937785564</DetailPageURL>
      <ItemLinks>
        <ItemLink>
          <Description>Technical Details</Description>
          <URL>http://www.amazon.com/Agile-Development-Rails-Facets-Ruby/dp/tech-data/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Baby Registry</Description>
          <URL>http://www.amazon.com/gp/registry/baby/add-item.html%3Fasin.0%3D1937785564%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wedding Registry</Description>
          <URL>http://www.amazon.com/gp/registry/wedding/add-item.html%3Fasin.0%3D1937785564%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Add To Wishlist</Description>
          <URL>http://www.amazon.com/gp/registry/wishlist/add-item.html%3Fasin.0%3D1937785564%26SubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>Tell A Friend</Description>
          <URL>http://www.amazon.com/gp/pdp/taf/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Customer Reviews</Description>
          <URL>http://www.amazon.com/review/product/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
        <ItemLink>
          <Description>All Offers</Description>
          <URL>http://www.amazon.com/gp/offer-listing/1937785564%3FSubscriptionId%3DAKIAI25J7QK5VYQ7HTJQ%26tag%3Dthestu0f-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D386001%26creativeASIN%3D1937785564</URL>
        </ItemLink>
      </ItemLinks>
      <ItemAttributes>
        <Author>Sam Ruby</Author>
        <Author>Dave Thomas</Author>
        <Author>David Heinemeier Hansson</Author>
        <Manufacturer>Pragmatic Bookshelf</Manufacturer>
        <ProductGroup>Book</ProductGroup>
        <Title>Agile Web Development with Rails 4 (Facets of Ruby)</Title>
      </ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>

Next, I tried to create an XML document:

xml_doc = Nokogiri::XML(res.body)

and got the following:

#<Nokogiri::XML::Document:0x3fcc4b3e8f94 name="document" children=[#<Nokogiri::XML::Element:0x3fcc4b3e8ae4 name="ItemSearchResponse" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4b043074 name="OperationRequest" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4b042c50 name="HTTPHeaders" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4b04282c name="Header" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> attributes=[#<Nokogiri::XML::Attr:0x3fcc4b0427c8 name="Name" value="UserAgent">, #<Nokogiri::XML::Attr:0x3fcc4b0427b4 name="Value" value="Jeff/1.0.1 (Language=Ruby; new-host-2.home)">]>]>, #<Nokogiri::XML::Element:0x3fcc4b041bfc name="RequestId" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Text:0x3fcc4ade5c64 "fa6e6962-15b0-4da6-abf2-12a688820dd3">]>, #<Nokogiri::XML::Element:0x3fcc4ade5944 name="Arguments" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> children=[#<Nokogiri::XML::Element:0x3fcc4ade5264 name="Argument" namespace=#<Nokogiri::XML::Namespace:0x3fcc4b3e8a58 href="http://webservices.amazon.com/AWSECommerceService/2011-08-01"> attributes=[#<Nokogiri::XML::Attr:0x3fcc4ade5200 name="Name" value="Operation">, #<Nokogiri::XML::Attr:0x3fcc4ade51ec name="Value" value="ItemSearch">]>

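I had to truncate the document to fit it into this question. I keep running different XPath queries against this document and keep getting empty arrays back. I've read the tutorials on Zeno and W3, but I'm still very confused about what I should be doing. All I want are the book titles and authors.

Any help on where to start, or an example of how to parse this data correctly, would be much appreciated. Also, is it best practice to parse a Nokogiri XML document with XPath or with CSS? There is an option to convert the response to a hash; would parsing be easier if I went that route? Is there a hash parser available? Thanks!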

Note

I'm currently using this approach to extract results from the request:

# Build and configure the request; credentials are read from the environment.
req = Vacuum.new

req.configure(
  aws_access_key_id: ENV["S3_ACCESS_KEY"],
  aws_secret_access_key: ENV["S3_SECRET_KEY"],
  associate_tag: ENV["AMAZON_ASSOCIATE_TAG"]
)

# Query parameters for the ItemSearch operation.
params = {
  'SearchIndex' => 'Books',
  'Keywords'    => 'Keywords',
  'ItemPage'    => 1
}

item_search_res = req.item_search(params)

xml_doc = Nokogiri::XML(item_search_res.body)

# Each call collects the text of every matching element in the whole document.
asins   = xml_doc.search('ASIN').map   { |n| n.children.text }
authors = xml_doc.search('Author').map { |n| n.children.text }
titles  = xml_doc.search('Title').map  { |n| n.children.text }

【Comments】:

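  • There is no "best practice" for choosing between CSS and XPath, whether for HTML or XML. CSS has a lot going for it, such as readability and simplicity. XPath is more expressive, but that comes at the cost of readability. Use whichever you prefer, or mix them; it doesn't matter.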

Tags: ruby-on-rails xml xpath xml-parsing nokogiri


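【Solution 1】:

Which XPath expressions have you tried?

Because the source document uses a namespace, you need to declare a namespace prefix and use it in the query: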

doc.xpath("/az:ItemSearchResponse/az:Items/az:Item", "az" => "http://webservices.amazon.com/AWSECommerceService/2011-08-01")

Alternatively, you can remove the namespaces before querying the document:

doc.remove_namespaces!
doc.xpath("/ItemSearchResponse/Items/Item")

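【Comments】:

  • Thanks JLRishe, I'll try removing the namespaces. I just made an edit showing how I'm currently extracting the information. Do you think I should use XPath or the search method I'm currently using?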