[Posted]: 2013-12-24 04:50:31
[Question]:
Given this HTML code:
<div class="item">
<div class="thumb">
<a href="http://www.mp3crank.com/wolf-eyes/lower-demos-121866" rel="bookmark" lang="en" title="Wolf Eyes - Lower Demos album downloads">
<img width="100" height="100" alt="Mp3 downloads Wolf Eyes - Lower Demos" title="Free mp3 downloads Wolf Eyes - Lower Demos" src="http://www.mp3crank.com/cover-album/Wolf-Eyes-–-Lower-Demos.jpg" /></a>
</div>
<div class="release">
<h3>Wolf Eyes</h3>
<h4>
<a href="http://www.mp3crank.com/wolf-eyes/lower-demos-121866" title="Wolf Eyes - Lower Demos">Lower Demos</a>
</h4>
<script src="/ads/button.js"></script>
</div>
<div class="release-year">
<p>Year</p>
<span>2013</span>
</div>
<div class="genre">
<p>Genre</p>
<a href="http://www.mp3crank.com/genre/rock" rel="tag">Rock</a>
<a href="http://www.mp3crank.com/genre/pop" rel="tag">Pop</a>
</div>
</div>
I know how to parse it in other ways, but I would like to use the HTMLAgilityPack library to retrieve this information:
Title : Wolf Eyes - Lower Demos
Cover : http://www.mp3crank.com/cover-album/Wolf-Eyes-–-Lower-Demos.jpg
Year : 2013
Genres: Rock, Pop
URL : http://www.mp3crank.com/wolf-eyes/lower-demos-121866
These values come from the following parts of the HTML:
Title : title="Wolf Eyes - Lower Demos"
Cover : src="http://www.mp3crank.com/cover-album/Wolf-Eyes-–-Lower-Demos.jpg"
Year : <span>2013</span>
Genre1: <a href="http://www.mp3crank.com/genre/rock" rel="tag">Rock</a>
Genre2: <a href="http://www.mp3crank.com/genre/pop" rel="tag">Pop</a>
URL : href="http://www.mp3crank.com/wolf-eyes/lower-demos-121866"
This is what I'm trying, but I always get an "object reference not set" exception when I try to select a single node.
Sorry, I'm new to HTML. I tried to follow the steps in this question: HtmlAgilityPack basic how to get title and link?
Public Class Form1
Private htmldoc As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
Private htmlnodes As HtmlAgilityPack.HtmlNodeCollection = Nothing
Private Title As String = String.Empty
Private Cover As String = String.Empty
Private Genres As String() = {String.Empty}
Private Year As Integer = -0
Private URL As String = String.Empty
Private Sub Test() Handles MyBase.Shown
' Load the html document.
htmldoc.LoadHtml(IO.File.ReadAllText("C:\source.html"))
' Select the (10 items) nodes.
htmlnodes = htmldoc.DocumentNode.SelectNodes("//div[@class='item']")
' Loop through the nodes.
For Each node As HtmlAgilityPack.HtmlNode In htmlnodes
Title = node.SelectSingleNode("//div[@class='release']").Attributes("title").Value
Cover = node.SelectSingleNode("//div[@class='thumb']").Attributes("src").Value
Year = CInt(node.SelectSingleNode("//div[@class='release-year']").Attributes("span").Value)
Genres = ¿select multiple nodes?
URL = node.SelectSingleNode("//div[@class='release']").Attributes("href").Value
Next
End Sub
End Class
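A possible explanation, offered as a sketch and not a definitive answer: an XPath that starts with `//` is evaluated from the document root even when it is called on a child node, so a relative query needs `.//`. In addition, `title`, `href` and `src` are attributes of the nested `<a>` and `<img>` elements, not of the `<div>` elements, so `Attributes("title")` on the `<div>` returns Nothing and reading `.Value` throws. `<span>` is a child element, so its text is read through `InnerText`. The code below assumes the markup shown above and the same `C:\source.html` path; the module name `AlbumScraper` and the local variable names are illustrative only.

```vb
Imports System.Linq
Imports HtmlAgilityPack

Module AlbumScraper

    Sub Main()
        ' Load the HTML document (same hypothetical path as in the question).
        Dim doc As New HtmlDocument()
        doc.LoadHtml(IO.File.ReadAllText("C:\source.html"))

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim items As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='item']")
        If items Is Nothing Then Return

        For Each item As HtmlNode In items
            ' The leading "." makes each query relative to the current item.
            Dim link As HtmlNode = item.SelectSingleNode(".//div[@class='release']/h4/a")
            Dim img As HtmlNode = item.SelectSingleNode(".//div[@class='thumb']//img")
            Dim yearNode As HtmlNode = item.SelectSingleNode(".//div[@class='release-year']/span")
            Dim genreNodes As HtmlNodeCollection = item.SelectNodes(".//div[@class='genre']/a")

            ' GetAttributeValue falls back to the default when the attribute is missing.
            Dim title As String = If(link Is Nothing, String.Empty, link.GetAttributeValue("title", String.Empty))
            Dim url As String = If(link Is Nothing, String.Empty, link.GetAttributeValue("href", String.Empty))
            Dim cover As String = If(img Is Nothing, String.Empty, img.GetAttributeValue("src", String.Empty))

            ' <span> is an element, so read its text instead of an attribute.
            Dim year As Integer = 0
            If yearNode IsNot Nothing Then Integer.TryParse(yearNode.InnerText.Trim(), year)

            ' Collect the text of every genre link.
            Dim genres As String() = If(genreNodes Is Nothing,
                                        New String() {},
                                        genreNodes.Select(Function(g) g.InnerText.Trim()).ToArray())

            Console.WriteLine("Title : " & title)
            Console.WriteLine("Cover : " & cover)
            Console.WriteLine("Year  : " & year.ToString())
            Console.WriteLine("Genres: " & String.Join(", ", genres))
            Console.WriteLine("URL   : " & url)
        Next
    End Sub

End Module
```

Checking each node for Nothing before reading from it keeps one malformed item from aborting the whole loop; whether to skip such items or report them is a design choice left open here.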
[Comments]:
Tags: html .net vb.net html-parsing html-agility-pack