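【Posted】: 2018-04-08 09:03:17
【Question】:
I am practising web crawling on a web page and want to display some of its content on my own page, but I am stuck on the code that finds the descendants. Below is the page's HTML: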
<ul class="results">
<li class="gts" data-webm-section="OAG-AD-14960184">
<a class="item-link-container" href="/bikes/details/2016-Indian-Chieftain-Dark-Horse-MY17/OAG-AD-14960184/?cr=0&gts=OAG-AD-14960184&gtsviewtype=TopSpot&gtssaleid=OAG-AD-14960184&psq=%28%28Service%3D%5BBikesales%5D%26State%3D%5BNSW%5D%29%26%28%28%28%28SiloType%3D%5BBrand%20new%20bikes%20available%5D%7CSiloType%3D%5BBrand%20new%20bikes%20in%20stock%5D%29%7CSiloType%3D%5BDealer%20used%20bikes%5D%29%7CSiloType%3D%5BDemo%20%26%20near%20new%20bikes%5D%29%7CSiloType%3D%5BPrivate%20used%20bikes%5D%29%29&pso=0&pss=Premium">
<header>
<h3><span class=></span>Heading</h3>
<div class="spotlight flag non-textual">Spotlight</div>
</header>
<div class="primary panel">
<ul class="photos" data-js-lazy-load-length="3" style="width:8350px">
<li>
<img src="http//" height="221" width="334" alt="2016 Indian Chieftain Dark Horse MY17" />
</li>
</ul>
<div class="image-nav previous" data-webm-clickvalue="previous-image">
<span class="arrow"></span>
<span class="background"></span>
</div>
<div class="image-nav next" data-webm-clickvalue="next-image">
<span class="arrow"></span>
<span class="background"></span>
</div>
<div class="image-nav-count">
<span class="current">1</span> of 24
</div>
</div>
<div class="secondary panel">
<span class="price">$29,995*</span>
<div data-fancybox-href="/mvcajax/bikes/PriceGuide/" class="pricing-message light-box-iframe">
Ride Away No More To Pay
</div>
<div class="features">
<ul>
<li class="ui-category">
<i></i>Cruiser
</li>
<li class="engine-size">
<i></i>1,811 cc
</li>
<li class="odometer">
<i></i>2,552 km
</li>
</ul>
<div class="bike-facts non-textual"></div>
</div>
</div>
<p class="description">**NO REGRETS - 7 Day Money Back guarantee** PLUS 12 months Warranty & Roadside Assist.Conditions App...</p>
</a>
</li>
</ul>
From the HTML above, I want each bike's image, price, and category.
Below is my code:
public async Task<ActionResult> Webcrawl()
{
    string URL = "https://www.bikesales.com.au/bikes/new-south-wales/";
    List<bikes> bikelist = new List<bikes>();
    using (var client = new HttpClient())
    {
        var html = await client.GetStringAsync(URL);
        HtmlDocument Doc = new HtmlDocument();
        Doc.LoadHtml(html);
        var ProductsHtml = Doc.DocumentNode.Descendants("ul").Where(node => node.GetAttributeValue("class", "").Equals("results")).ToList();
        var ProductsList = ProductsHtml[0].Descendants("li").Where(node => node.GetAttributeValue("class", "").Equals("gts")).ToList();
        foreach (var list in ProductsList)
        {
            var PriceNode = list.SelectSingleNode("//div[@class='secondary panel']");
            var bike = new bikes
            {
                Name = list.Descendants("h3").FirstOrDefault().InnerText,
                Title = list.Descendants("p").FirstOrDefault().InnerText,
                Price = PriceNode.Descendants("span").FirstOrDefault().InnerText,
                Image = list.SelectNodes("//div[@class='primary panel']/ul[1]/li[1]/img").FirstOrDefault().ChildAttributes("src").FirstOrDefault().Value,
                Type = list.SelectNodes("//div[@class='secondary panel']/div[2]/ul[1]/li[1]").FirstOrDefault().InnerText.Trim('\r', '\n', '\t'),
                Engine = list.SelectNodes("//div[@class='secondary panel']/div[2]/ul[1]/li[2]").FirstOrDefault().InnerText.Trim('\r', '\n', '\t'),
                Odometer = list.SelectNodes("//div[@class='secondary panel']/div[2]/ul[1]/li[3]").FirstOrDefault().InnerText.Trim('\r', '\n', '\t'),
            };
            bikelist.Add(bike);
        }
        return View(bikelist);
    }
}
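When I run the code above, every item except the name repeats the values of the first list element: the same image, the same type, and the same price.
Please point out the mistake in my code. Thanks in advance.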
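【Comments】:
-
Basically, for each node the code is fetching elements starting from the document root (an XPath that begins with // is evaluated against the whole document), which is why you get duplicates.
-
So what is the solution?
-
Once you have an HTML node, have you tried putting a dot in front of the XPath to say that you want to search from the current element?
-
Yes, I tried that too.
-
I noticed that some of your XPath expressions return null; see the solution below.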
Tags: c# xpath html-agility-pack
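The point raised in the comments, that an XPath starting with // is resolved against the whole document even when it is called on a child node, can be sketched as follows. This is a minimal illustration rather than the asker's final code: it assumes the HtmlAgilityPack NuGet package is installed, and the class name RelativeXPathSketch, the method ReadPrices, and the two-listing SampleHtml (including the second price) are hypothetical names and data introduced only for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class RelativeXPathSketch
{
    // Hypothetical two-listing sample, trimmed down from the markup in the question.
    public const string SampleHtml = @"
<ul class='results'>
  <li class='gts'>
    <div class='secondary panel'>
      <span class='price'>$29,995*</span>
      <div class='features'><ul><li class='ui-category'><i></i>Cruiser</li></ul></div>
    </div>
  </li>
  <li class='gts'>
    <div class='secondary panel'>
      <span class='price'>$8,500</span>
      <div class='features'><ul><li class='ui-category'><i></i>Naked</li></ul></div>
    </div>
  </li>
</ul>";

    // Evaluates priceXPath once per listing and collects the text that was found.
    public static List<string> ReadPrices(string html, string priceXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var prices = new List<string>();
        var listings = doc.DocumentNode.SelectNodes("//ul[@class='results']/li[@class='gts']");
        if (listings == null)
        {
            return prices; // SelectNodes returns null when nothing matches
        }

        foreach (var listing in listings)
        {
            // SelectSingleNode also returns null on no match, so guard before reading InnerText.
            var priceNode = listing.SelectSingleNode(priceXPath);
            prices.Add(priceNode?.InnerText.Trim() ?? "(missing)");
        }
        return prices;
    }

    public static void Main()
    {
        // Absolute path: each listing resolves to the first price in the whole document.
        Console.WriteLine(string.Join(", ", ReadPrices(SampleHtml, "//span[@class='price']")));

        // Relative path (leading dot): each listing resolves to its own price.
        Console.WriteLine(string.Join(", ", ReadPrices(SampleHtml, ".//span[@class='price']")));
    }
}
```

The same leading dot should apply to the other fields, for example .//img for the image (read with GetAttributeValue("src", "")) and .//li[@class='ui-category'] for the category. The null checks matter because an expression that matches nothing returns null instead of an empty result.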