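【Question title】: Grabbing the text value of an XML node + Nokogiri and xpath
【Posted】: 2014-01-14 08:45:44
【Question】:

I've built a rake file that inserts all of the information I grab about a particular property into my database. This all works fine, but the values of my keys aren't being populated with any data. Am I perhaps calling at_xpath incorrectly? I'll post a sample below--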

information = {
            "street_address" => property.at_xpath("/Address/AddressLine1/text()"),
            "city" => property.at_xpath("/Address/City/text()"),
            "zipcode" => property.at_xpath("/Address/PostalCode/text()"),
            "short_description" => property.at_xpath("/Information/ShortDescription/text()"),
            "long_description" => property.at_xpath("Information/LongDescription/text()"),
            "rent" => property.at_xpath("/Information/Rents/StandardRent/text()"),
            "application_fee" => property.at_xpath("/Fee/ApplicationFee/text()"),
            "bedrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bedroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bathroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/ILS_Unit/Availability/VacancyClass/text()")
        }

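I know everything is working fine except for getting the data into the value slots of the hash listed above. I also know nokogiri and xpath are working correctly, because I've narrowed the number of matching nodes down from 33,000+ to 1,068.

Any guidance is much appreciated! Thank you :)

========================== UPDATE ==========================

I thought seeing the entire loop might help add clarity --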

doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").each do |property|

        # GATHER EACH PROPERTY'S INFORMATION
        information = {
            "street_address" => property.at_xpath("/Address/AddressLine1/text()"),
            "city" => property.at_xpath("/Address/City/text()"),
            "zipcode" => property.at_xpath("/Address/PostalCode/text()"),
            "short_description" => property.at_xpath("/Information/ShortDescription/text()"),
            "long_description" => property.at_xpath("Information/LongDescription/text()"),
            "rent" => property.at_xpath("/Information/Rents/StandardRent/text()"),
            "application_fee" => property.at_xpath("/Fee/ApplicationFee/text()"),
            "bedrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bedroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/Floorplan/Room[@RoomType='Bathroom']/Count/text()"),
            "bathrooms" => property.at_xpath("/ILS_Unit/Availability/VacancyClass/text()")
        }


        # CREATE NEW PROPERTY WITH INFORMATION HASH CREATED ABOVE
        if Property.create!(information)
            puts "yay!"
        else
            puts "oh no! this sucks!"
        end

    end # ENDS XPATH EACH LOOP

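========================== ANOTHER UPDATE ==========================

So I tried replacing the "/text()" at the end of each at_xpath path with "/inner_text()" and received the following error--

rake aborted! Invalid expression: /Address/AddressLine1/inner_text()

I then tried switching the "at_xpath" calls to "at_css" calls and doing something like --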

"street_address" => property.at_css(".AddressLine1").text

But received the following error--

rake aborted! undefined method `text' for nil:NilClass

========================== UPDATE TO SHOW XML ==========================

<Property IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
  <PropertyID>
    <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/>
    <Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>
    <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
    <WebSite>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</WebSite>
    <Address AddressType="property">
      <Description>Address of Available Listing</Description>
      <AddressLine1>1689 N 4th St </AddressLine1>
      <City>Columbus</City>
      <State>OH</State>
      <PostalCode>43201</PostalCode>
      <Country>US</Country>
    </Address>
    <Phone PhoneType="office">
      <PhoneNumber>(614) 299-4110</PhoneNumber>
    </Phone>
    <Email>northsteppe.nsr@gmail.com</Email>
  </PropertyID>
  <ILS_Identification ILS_IdentificationType="Apartment" RentalType="Market Rate">
    <Latitude>39.997694</Latitude>
    <Longitude>-82.99903</Longitude>
    <LastUpdate Month="11" Day="11" Year="2013"/>
  </ILS_Identification>
  <Information>
    <StructureType>Standard</StructureType>
    <UnitCount>1</UnitCount>
    <ShortDescription>Spacious House Central Campus OSU, available fall</ShortDescription>
    <LongDescription>One of our favorites! This great house is perfect for students or a single family. With huge living and sleeping rooms, there is plenty of space. The kitchen is totally modernized with new appliances, and the bathroom has been updated. Natural woodwork and brick accents are seen within the house, and the decorative mantles. Ceiling fans and mini-blinds are included, as well as a FREE stack washer and dryer. The front and side deck. On site parking available.</LongDescription>
    <Rents>
      <StandardRent>2000.00</StandardRent>
    </Rents>
    <PropertyAvailabilityURL>http://northsteppe.appfolio.com/listings/listings/642da00e-9be3-4a7c-bd50-66a4f0d70af8</PropertyAvailabilityURL>
  </Information>
  <Fee>
    <ProrateType>Standard</ProrateType>
    <LateType>Standard</LateType>
    <LatePercent>0</LatePercent>
    <LateMinFee>0</LateMinFee>
    <LateFeePerDay>0</LateFeePerDay>
    <NonRefundableHoldFee>0</NonRefundableHoldFee>
    <AdminFee>0</AdminFee>
    <ApplicationFee>30.00</ApplicationFee>
    <BrokerFee>0</BrokerFee>
  </Fee>
  <Deposit DepositType="Security Deposit">
    <Amount AmountType="Actual">
      <ValueRange Exact="2000.00" Currency="USD"/>
    </Amount>
  </Deposit>
  <Policy>
    <Pet Allowed="false"/>
  </Policy>
  <Phase IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <RentableUnits>1</RentableUnits>
    <TotalSquareFeet>0</TotalSquareFeet>
    <RentableSquareFeet>0</RentableSquareFeet>
  </Phase>
  <Building IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <Description/>
    <UnitCount>1</UnitCount>
    <SquareFeet>0</SquareFeet>
  </Building>
  <Floorplan IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Name/>
    <UnitCount>1</UnitCount>
    <Room RoomType="Bedroom">
      <Count>4</Count>
      <Comment/>
    </Room>
    <Room RoomType="Bathroom">
      <Count>1</Count>
      <Comment/>
    </Room>
    <SquareFeet Min="0" Max="0"/>
    <MarketRent Min="2000" Max="2000"/>
    <EffectiveRent Min="2000" Max="2000"/>
  </Floorplan>
  <ILS_Unit IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8">
    <Units>
      <Unit>
        <Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="UL Portfolio"/>
        <MarketingName>Spacious House Central Campus OSU, available fall</MarketingName>
        <UnitBedrooms>4</UnitBedrooms>
        <UnitBathrooms>1.0</UnitBathrooms>
        <MinSquareFeet>0</MinSquareFeet>
        <MaxSquareFeet>0</MaxSquareFeet>
        <SquareFootType>internal</SquareFootType>
        <UnitRent>2000.00</UnitRent>
        <MarketRent>2000.00</MarketRent>
        <Address AddressType="property">
          <AddressLine1>1689 N 4th St </AddressLine1>
          <City>Columbus</City>
          <PostalCode>43201</PostalCode>
          <Country>US</Country>
        </Address>
      </Unit>
    </Units>
    <Availability>
      <VacateDate Month="7" Day="23" Year="2014"/>
      <VacancyClass>Unoccupied</VacancyClass>
      <MadeReadyDate Month="7" Day="23" Year="2014"/>
    </Availability>
    <Amenity AmenityType="Other">
      <Description>All new stainless steel appliances!  Refinished hardwood floors</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceramic tile</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Ceiling fans</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>Wrap-around porch</Description>
    </Amenity>
    <Amenity AmenityType="Dryer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Washer">
      <Description>Free Washer and Dryer</Description>
    </Amenity>
    <Amenity AmenityType="Other">
      <Description>off-street parking available</Description>
    </Amenity>
  </ILS_Unit>
  <File Active="true" FileID="820982141">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/31077069-6e81-4373-8a89-508c57585543/medium.jpg</Src>
    <Width>360</Width>
    <Height>300</Height>
    <Rank>1</Rank>
  </File>
  <File Active="true" FileID="820982145">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/84e1be40-96fd-4717-b75d-09b39231a762/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>2</Rank>
  </File>
  <File Active="true" FileID="820982149">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/cd419635-c37f-4676-a43e-c72671a2a748/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>3</Rank>
  </File>
  <File Active="true" FileID="820982152">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/6b68dbd5-2cde-477c-99d7-3ca33f03cce8/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>4</Rank>
  </File>
  <File Active="true" FileID="820982155">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/17b6c7c0-686c-4e46-865b-11d80744354a/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>5</Rank>
  </File>
  <File Active="true" FileID="820982157">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/3545ac8b-471f-404a-94b2-fcd00dd16e25/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>6</Rank>
  </File>
  <File Active="true" FileID="820982160">
    <FileType>Photo</FileType>
    <Description>Unit Photo</Description>
    <Name/>
    <Caption/>
    <Format>image/jpeg</Format>
    <Src>http://pa.cdn.appfolio.com/northsteppe/images/02471172-2183-4bf1-a3d7-33415f902c1c/medium.jpg</Src>
    <Width>350</Width>
    <Height>265</Height>
    <Rank>7</Rank>
  </File>
</Property>

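【Comments】:

  • Show your automated tests. Remove the leading / from at_xpath("/ILS_Unit, because it means "go back to the root of the document". Does at_xpath() need a .to_s to turn the XML text node back into a string?
  • @Philip - I've removed the /s, added .to_s to the end of each at_xpath query and re-run my rake task, but still no data is being committed. Any other suggestions?
  • If you add p information, does your hash have values in it? If create! failed, what was its exception?
  • @philip - When I add "puts information" it outputs the hash with the keys but no values.
  • We can't tell whether your XPath is correct because you haven't shown us the XML you're parsing. Your approach is wrong, though, and even if it worked it would be inefficient.

Tags: ruby-on-rails ruby xml xpath nokogiri


【Solution 1】:

Your first XPath goes too deep. It returns an Identification where you need the PropertyID. Try this: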

doc.xpath("//Property/PropertyID[ Identification/@OrganizationName = 'northsteppe' ]").each do |property|
    # GATHER EACH PROPERTY'S INFORMATION
    information = {
        "street_address" => property.at_xpath("Address/AddressLine1/text()").to_s,
        "city" => property.at_xpath("Address/City/text()").to_s,
        "zipcode" => property.at_xpath("Address/PostalCode/text()").to_s
        }
    p information
end

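【Comments】:

  • @philip - I love you!
  • @philip, one more question. I was able to fill in the values for street_address, city, zipcode and short_description accurately (by grabbing the values). I now need to get information from other sections that hang outside that node. Can I make my first call go deeper so that my xpath queries can reach those nodes?
  • Now that you have that solved, look up the parent and ancestor axes in XPath. It is a complete relational query language, rivalling SQL or Prolog.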
【Solution 2】:

In your loop you do this:

doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").each do |property|

Then, for your values, you do this:

property.at_xpath("/Address/AddressLine1/text()")

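You can't use /Address/AddressLine1/text() relative to property with XPath.

Nokogiri will search for /Address/AddressLine1/text(), which is an absolute path: start at the top of the document (/), find the Address node directly beneath it, then find the AddressLine1 node beneath that....

Use this instead:

Address/AddressLine1/text()

This means "search relative to property", which gives the full XPath: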

//Property/PropertyID/Identification[@OrganizationName='northsteppe']/Address/AddressLine1/text()

Looking at the XML you added...

The path you want doesn't exist. Look at it in PRY:

[16] (pry) main: 0> puts doc.xpath("//Property/PropertyID/Identification[@OrganizationName='northsteppe']").to_xml
<Identification IDValue="642da00e-9be3-4a7c-bd50-66a4f0d70af8" OrganizationName="northsteppe" IDType="property"/><Identification IDValue="6e1e61523972d5f0e260e3d38eb488337424f21e" OrganizationName="northsteppe" IDType="Company"/>

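Neither of the nodes assigned to property has any children. Only the matched nodes themselves exist, so all the values you're looking for, which would have to be child nodes, aren't there.

Instead, it looks like you want to find the Property node and work downward: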

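【Comments】:

  • I did add a sample of the XML I'm parsing at the end of the question. When I strip the /s from the front of each at_xpath query, it just puts "nil" as the value for every key in my hash. Thanks for pointing me in a positive direction :)
  • So if I'm trying to grab all Property nodes that have a node with OrganizationName = 'northsteppe', and then iterate over those Property nodes to make my at_xpath calls, what would be the correct way to do this?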