【Question title】: HtmlAgilityPack + VB + Trying to get a picture
【Posted】: 2019-09-28 15:00:51
【Question】:

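I have the code below to fetch a title and a picture. The title goes into a TextBox and the picture into a PictureBox.

In my Windows Form I have these imports: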

Imports System
Imports System.Xml
Imports HtmlAgilityPack
Imports System.Net
Imports System.IO
Imports System.Collections.Generic

In the form's Load handler, for testing, I have:

Public Class scrapper
    Private Sub scrapper_Load(sender As Object, e As EventArgs) Handles MyBase.Load

        'Enable TLS 1.2 support'
        ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12
        'Web page to scrape'
        Dim link As String = "https://www.nextinpact.com/"
        'download page from the link into an HtmlDocument'
        Dim doc As HtmlDocument = New HtmlWeb().Load(link)
        'select the title'
        Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[2]/section/aside/section/div[2]/div/article[1]/div/div/h3/a")
        'select the image'
        Dim img As HtmlNode = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[2]/div/div[1]/div[5]/div/div[2]/p[1]/a/img")

        If Not div Is Nothing Then
            TextBox1.Text = div.InnerText.Trim()
        End If

        If Not img Is Nothing Then
            'PictureBox1.Load(img.OuterHtml.Trim())
        End If
        'Test Picturebox2
        PictureBox2.Load("https://cdn2.nextinpact.com/compress/100-76//images/bd/square-linked-media/23647.jpg")

    End Sub

End Class

But I cannot get the picture into PictureBox1.

PictureBox2 is only there as a test.

How do I correctly load the picture into PictureBox1?

【Comments】:

  • Which picture do you want to get from that page?
  • The one in the example from PictureBox2.Load

Tags: vb.net


【Answer 1】:

If you are trying to extract the same image that is shown in PictureBox2, the XPath in the second SelectSingleNode call is incorrect. I would use these instead:

'select the title'
Dim div As HtmlNode = doc.DocumentNode.SelectSingleNode("//aside[@id='sideBarIndex']//article//div/div/h3/a")
'select the image'
Dim img As HtmlNode = doc.DocumentNode.SelectSingleNode("//aside[@id='sideBarIndex']//article//img")
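
Once the img node is selected, what PictureBox1.Load needs is a URL, not the node's OuterHtml. The sketch below assumes the src value may be relative or protocol-relative and resolves it against the page address with System.Uri; ResolveImageUrl is a hypothetical helper name, not part of HtmlAgilityPack, and whether the site actually uses relative paths is not something the thread confirms.

```vb
Imports System

Module ImageUrlHelper
    ' Hypothetical helper (not part of HtmlAgilityPack): turns a raw src
    ' value into an absolute URL, or returns Nothing when that is not possible.
    Public Function ResolveImageUrl(pageUrl As String, src As String) As String
        If String.IsNullOrWhiteSpace(src) Then Return Nothing

        ' The page address must itself be a valid absolute URL.
        Dim baseUri As Uri = Nothing
        If Not Uri.TryCreate(pageUrl, UriKind.Absolute, baseUri) Then Return Nothing

        ' An already absolute src is kept; a relative one is combined with the base.
        Dim absolute As Uri = Nothing
        If Not Uri.TryCreate(baseUri, src.Trim(), absolute) Then Return Nothing

        Return absolute.AbsoluteUri
    End Function

    ' Possible use inside the Load handler from the question:
    '   Dim url As String = ResolveImageUrl(link, img.GetAttributeValue("src", ""))
    '   If url IsNot Nothing Then PictureBox1.Load(url)
End Module
```

Returning Nothing instead of throwing keeps the caller's check to a single If, in the same style as the "If Not img Is Nothing" guard in the question.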

【Comments】:

  • Awesome. Thanks a lot for your help.

【Answer 2】:

Then simply get the URL from the element's src attribute.

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim doc As HtmlDocument = New HtmlWeb().Load("https://www.nextinpact.com/")
    PictureBox1.LoadAsync(doc.DocumentNode.SelectSingleNode("//aside[@id='sideBarIndex']//img").Attributes("src").Value)
End Sub

Like this.
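
One caveat with the one-liner: SelectSingleNode returns Nothing when the XPath matches no node, so the chained call throws a NullReferenceException if the page layout changes. A guarded variant might look like the sketch below; FindImageSrc is a hypothetical helper name, and the type is written as HtmlAgilityPack.HtmlDocument on the assumption that System.Windows.Forms may also be imported in the project.

```vb
Imports System
Imports HtmlAgilityPack

Module ImageNodeHelper
    ' Hypothetical helper: returns the src of the first node matching xpath,
    ' or Nothing when the node or its src attribute is missing.
    Public Function FindImageSrc(doc As HtmlAgilityPack.HtmlDocument, xpath As String) As String
        Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If node Is Nothing Then Return Nothing

        ' GetAttributeValue returns the supplied default when src is absent.
        Dim src As String = node.GetAttributeValue("src", String.Empty)
        If String.IsNullOrWhiteSpace(src) Then Return Nothing

        Return src.Trim()
    End Function

    ' Possible use inside Button1_Click:
    '   Dim src As String = FindImageSrc(doc, "//aside[@id='sideBarIndex']//img")
    '   If src IsNot Nothing Then PictureBox1.LoadAsync(src)
End Module
```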
