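【Posted】: 2021-10-26 02:41:18
【Question】:
I'm working on a project that scrapes items from https://platinumgod.co.uk/, and I'm having trouble selecting all the <p> tags that sit between two elements.
Here is the HTML: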
<li class="textbox" data-tid="42.5" data-cid="42" data-sid="263" style="display: inline-block;">
<a>
<div onclick="" class="item reb-itm-new re-itm263"></div>
<span>
<p class="item-title">Clear Rune</p>
<p class="r-itemid">ItemID: 263</p>
<p class="pickup">"Rune mimic"</p>
<p class="quality">Quality: 2</p>
<p>When used, copies the effect of the Rune or Soul stone you are holding (like the Blank Card)</p>
<p>Drops a random rune on the floor when picked up</p>
<p>The recharge time of this item depends on the Rune/Soul Stone held:</p>
<p>1 room: Soul of Lazarus</p>
<p>2 rooms: Rune of Ansuz, Rune of Berkano, Rune of Hagalaz, Soul of Cain</p>
<p>3 rooms: Rune of Algiz, Blank Rune, Soul of Magdalene, Soul of Judas, Soul of ???, Soul of the Lost</p>
<p>4 rooms: Rune of Ehwaz, Rune of Perthro, Black Rune, Soul of Isaac, Soul of Eve, Soul of Eden, Soul of the Forgotten, Soul of Jacob and Esau</p>
<p>6 rooms: Rune of Dagaz, Soul of Samson, Soul of Azazel, Soul of Apollyon, Soul of Bethany</p>
<p>12 rooms: Rune of Jera, Soul of Lilith, Soul of the Keeper</p>
<ul>
<p>Type: Active</p>
<p>Recharge time: Varies</p>
<p>Item Pool: Secret Room, Crane Game</p>
</ul>
<p class="tags">* Secret Room</p>
</span>
</a>
</li>
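What I want is to return all the <p> tags between <p class="quality"> (excluding that tag itself) and the first <ul>.
I've tried several solutions I found on forums and had only partial success with the following code, which I took from one of the answers (not going to lie, I struggle to understand what is happening in it). I'm iterating because the HTML contains several items that need to be scraped: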
items = html.at(".repentanceitems-container").css("li.textbox").each do |item|
use = item.xpath(".//a/span/p[5]/following-sibling::p[count(.//a/span/p[6]/preceding-sibling::p)=
count(.//a/span/p[6]/preceding-sibling::p)]")
end
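However, this only returns the first <p> tag after <p class="quality">. Since I don't understand the code, I'm sure the fix is probably simple. I can also select the first <p> element I want to include and the <ul> where the selection should end, but I'm not sure how to use this information: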
# First line of item use
start = item.xpath('.//a/span/p[5]')
# ul tag
ending = item.xpath('.//a/span/ul[1]')
Any help with this would be greatly appreciated!
【Comments】:
Tags: ruby web-scraping nokogiri