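【Posted】: 2013-10-01 18:25:03
【Question】:
I'm trying to parse an HTML string with Nokogiri, but I'm running into what looks like a recursion problem, and I don't know why.
Given these commands: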
require 'nokogiri'

string = '<h3>Lancers were arranged. </h3>' \
         '<div>Gabriel found himself partnered with Miss Ivors.</div>' \
         '<br>She leaned. He lit a <b>candle</b>. ' \
         'They followed him in silence, their feet falling in soft thuds on the thickly carpeted stairs.<br>'
body = Nokogiri::HTML(string)
result = []
# traverse yields every node in the parsed document, children before their parent
body.traverse { |node| result << node }
I expected an array of the elements above. Instead, I got this:
[#<Nokogiri::XML::DTD:0x3fde1f3d5274 name="html">
#<Nokogiri::XML::Text:0x3fde1e88d330 "Lancers were arranged. ">
#<Nokogiri::XML::Element:0x3fde1ea56a68 name="h3" children=[#<Nokogiri::XML::Text:0x3fde1e88d330 "Lancers were arranged. ">]>
#<Nokogiri::XML::Text:0x3fde1e88c764 "Gabriel found himself partnered with Miss Ivors.">
#<Nokogiri::XML::Element:0x3fde1e88cd04 name="div" children=[#<Nokogiri::XML::Text:0x3fde1e88c764 "Gabriel found himself partnered with Miss Ivors.">]>
#<Nokogiri::XML::Element:0x3fde1e88c0fc name="br">
#<Nokogiri::XML::Text:0x3fde1e88b9e0 "She leaned. He lit a ">
#<Nokogiri::XML::Text:0x3fde1eba6c60 "candle">
#<Nokogiri::XML::Element:0x3fde1e88b5f8 name="b" children=[#<Nokogiri::XML::Text:0x3fde1eba6c60 "candle">]>
#<Nokogiri::XML::Text:0x3fde1eba6454 ". They followed him in silence, their feet falling in soft thuds on the thickly carpeted stairs.">
#<Nokogiri::XML::Element:0x3fde1eba5f54 name="br">
#<Nokogiri::XML::Element:0x3fde1ea56f7c name="body" children=[#<Nokogiri::XML::Element:0x3fde1ea56a68 name="h3" children=[#<Nokogiri::XML::Text:0x3fde1e88d330 "Lancers were arranged. ">]>
#<Nokogiri::XML::Element:0x3fde1e88cd04 name="div" children=[#<Nokogiri::XML::Text:0x3fde1e88c764 "Gabriel found himself partnered with Miss Ivors.">]>
#<Nokogiri::XML::Element:0x3fde1e88c0fc name="br">
#<Nokogiri::XML::Text:0x3fde1e88b9e0 "She leaned. He lit a ">
#<Nokogiri::XML::Element:0x3fde1e88b5f8 name="b" children=[#<Nokogiri::XML::Text:0x3fde1eba6c60 "candle">]>
#<Nokogiri::XML::Text:0x3fde1eba6454 ". They followed him in silence, their feet falling in soft thuds on the thickly carpeted stairs.">
#<Nokogiri::XML::Element:0x3fde1eba5f54 name="br">]>
#<Nokogiri::XML::Element:0x3fde1ea575e4 name="html" children=[#<Nokogiri::XML::Element:0x3fde1ea56f7c name="body" children=[#<Nokogiri::XML::Element:0x3fde1ea56a68 name="h3" children=[#<Nokogiri::XML::Text:0x3fde1e88d330 "Lancers were arranged. ">]>
#<Nokogiri::XML::Element:0x3fde1e88cd04 name="div" children=[#<Nokogiri::XML::Text:0x3fde1e88c764 "Gabriel found himself partnered with Miss Ivors.">]>
#<Nokogiri::XML::Element:0x3fde1e88c0fc name="br">
#<Nokogiri::XML::Text:0x3fde1e88b9e0 "She leaned. He lit a ">
#<Nokogiri::XML::Element:0x3fde1e88b5f8 name="b" children=[#<Nokogiri::XML::Text:0x3fde1eba6c60 "candle">]>
#<Nokogiri::XML::Text:0x3fde1eba6454 ". They followed him in silence, their feet falling in soft thuds on the thickly carpeted stairs.">
#<Nokogiri::XML::Element:0x3fde1eba5f54 name="br">]>]>
#<Nokogiri::HTML::Document:0x3fde1f3d6084 name="document" children=[#<Nokogiri::XML::DTD:0x3fde1f3d5274 name="html">
#<Nokogiri::XML::Element:0x3fde1ea575e4 name="html" children=[#<Nokogiri::XML::Element:0x3fde1ea56f7c name="body" children=[#<Nokogiri::XML::Element:0x3fde1ea56a68 name="h3" children=[#<Nokogiri::XML::Text:0x3fde1e88d330 "Lancers were arranged. ">]>
#<Nokogiri::XML::Element:0x3fde1e88cd04 name="div" children=[#<Nokogiri::XML::Text:0x3fde1e88c764 "Gabriel found himself partnered with Miss Ivors.">]>
#<Nokogiri::XML::Element:0x3fde1e88c0fc name="br">
#<Nokogiri::XML::Text:0x3fde1e88b9e0 "She leaned. He lit a ">
#<Nokogiri::XML::Element:0x3fde1e88b5f8 name="b" children=[#<Nokogiri::XML::Text:0x3fde1eba6c60 "candle">]>
#<Nokogiri::XML::Text:0x3fde1eba6454 ". They followed him in silence, their feet falling in soft thuds on the thickly carpeted stairs.">
#<Nokogiri::XML::Element:0x3fde1eba5f54 name="br">]>]>]>]
Sorry for the length. Can anyone help me figure out why this happens, and/or how to prevent it?
【Comments】:
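- Please paste the input and the output HTML separately. Don't use a single-line string or stringified objects; they are hard to read. If you do that, I think you'll spot the problem (Element objects include their children, even when stringified).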
Tags: ruby-on-rails ruby ruby-on-rails-3 nokogiri