【Question title】: How to parse tables without id on HTML using HtmlAgilityPack
【Posted】: 2017-03-16 15:59:11
【Question】:

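I'm having trouble getting the values from a table in an HTML page because the table has no id. I need to get all the values in the second column and save them to an array. I'm using HtmlAgilityPack, and the problem occurs when selecting the nodes: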

Dim doc As HtmlDocument
Dim web As New HtmlWeb()
Dim str As String

doc = Web.Load("http://www.dietas.net/tablas-y-calculadoras/tabla-de-composicion-nutricional-de-los-alimentos/carnes-y-derivados/aves/pechuga-de-pollo.html#")

Dim nodes_filas As HtmlNode() = doc.DocumentNode.SelectNodes("//table[@id='']//tr").ToArray
Dim nodes_columnas As HtmlNode() = doc.DocumentNode.SelectNodes("//td").ToArray

For Each row As HtmlNode In nodes_filas
    For Each column As HtmlNode In nodes_columnas
        str = column.InnerHtml & vbCrLf
    Next
Next

This is the table:

<table cellspacing="1" cellpadding="3" width="100%" border="0">
  <tr>
    <td colspan="2" style="font-size:13px;color:#55711C;padding-bottom:5px;">Aporte por raci&oacute;n</td>
  </tr>
  <tr style="background-color:#EBEBEB">
    <td width="125">Energ&iacute;a [Kcal]</td>
    <td class="td_right">145,00</td>
  </tr>
  <tr>
    <td>Prote&iacute;na [g]</td>
    <td class="td_right">22,20</td>
  </tr>
  <tr style="background-color:#EBEBEB">
    <td>Hidratos carbono [g]</td>
    <td class="td_right">0,00</td>
  </tr>
  <tr>
    <td>Fibra [g]</td>
    <td class="td_right">0,00</td>
  </tr>
  <tr style="background-color:#EBEBEB">
    <td>Grasa total [g]</td>
    <td class="td_right">6,20</td>
  </tr>
  <tr>
    <td>AGS [g]</td>
    <td class="td_right">1,91</td>
  </tr>
  <tr style="background-color:#EBEBEB">
    <td>AGM [g]</td>
    <td class="td_right">1,92</td>
  </tr>
  <tr>
    <td>AGP [g]</td>
    <td class="td_right">1,52</td>
  </tr>
  <tr style="background-color:#EBEBEB">
    <td>AGP /AGS</td>
    <td class="td_right">0,79</td>
  </tr>
  <tr>
    <td>(AGP + AGM) / AGS</td>
    <td class="td_right"> 1,80</td>
  </tr>
  <tr style="background-color:#EBEBEB">
    <td>Colesterol [mg]</td>
    <td class="td_right">62,00</td>
  </tr>
  <tr>
    <td>Alcohol [g]</td>
    <td class="td_right">0,00</td>
  </tr>
  <tr style="background-color:#EBEBEB">
    <td>Agua [g]</td>
    <td class="td_right">71,60</td>
  </tr>
</table>

【Discussion】:

    Tags: html vb.net html-agility-pack


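    【Solution 1】:

    Sorry, I don't have VB installed, but the C# version should be enough to give you the idea. The value cells have the class td_right, so you can query for it with either a lambda or XPath. I prefer the lambda/LINQ version because I'm familiar with LINQ and don't have to remember XPath syntax.

    Lambda: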

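        using System.Linq;
        using HtmlAgilityPack;

        // Extension methods must be declared in a static class.
        public static class HtmlNodeExtensions
        {
            // True when the node's class attribute contains every requested class name.
            public static bool HasClass(this HtmlNode node, params string[] classValueArray)
            {
                var classValue = node.GetAttributeValue("class", "");
                var classValues = classValue.Split(' ');
                return classValueArray.All(c => classValues.Contains(c));
            }
        }

        // Usage (inside a method):
        var url = "http://www.dietas.net/tablas-y-calculadoras/tabla-de-composicion-nutricional-de-los-alimentos/carnes-y-derivados/aves/pechuga-de-pollo.html#";
        var htmlWeb = new HtmlWeb();
        var htmlDoc = htmlWeb.Load(url);
        // Text of every second-column cell, materialized as an array.
        var nodes = htmlDoc.DocumentNode.Descendants("td").Where(td => td.HasClass("td_right")).Select(td => td.InnerText).ToArray();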
    

    XPath:

    var nodes2 = htmlDoc.DocumentNode.SelectNodes("//td[@class='td_right']");
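
    Since the question was asked in VB.NET, here is a rough VB.NET sketch of the same XPath approach. It assumes the value cells keep the class td_right, as in the table shown in the question. The module name TableParser and the helper GetSecondColumn are made up for illustration, and the sample HTML is a shortened copy of the question's table so the snippet runs without network access. Note that the original query "//table[@id='']" only matches a table that has an id attribute equal to the empty string, so on this page SelectNodes finds nothing and returns Nothing, which is why calling .ToArray on the result fails.

    ```vbnet
    Imports System
    Imports System.Linq
    Imports HtmlAgilityPack

    Module TableParser
        ' Hypothetical helper: returns the trimmed text of every <td class="td_right"> cell.
        Function GetSecondColumn(html As String) As String()
            Dim doc As New HtmlDocument()
            doc.LoadHtml(html)

            ' SelectNodes returns Nothing when no node matches, so check before using it.
            Dim cells As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//td[@class='td_right']")
            If cells Is Nothing Then Return New String() {}

            ' DeEntitize decodes HTML entities; Trim removes stray spaces such as in " 1,80".
            Return cells.Select(Function(td) HtmlEntity.DeEntitize(td.InnerText).Trim()).ToArray()
        End Function

        Sub Main()
            ' Shortened sample of the table from the question (assumption: same markup).
            Dim sample As String =
                "<table><tr><td>Energ&iacute;a [Kcal]</td><td class=""td_right"">145,00</td></tr>" &
                "<tr><td>(AGP + AGM) / AGS</td><td class=""td_right""> 1,80</td></tr></table>"

            For Each value As String In GetSecondColumn(sample)
                Console.WriteLine(value)
            Next
        End Sub
    End Module
    ```

    To run it against the live page instead of the sample string, load the document with New HtmlWeb().Load(url) as in the question and apply the same SelectNodes call to doc.DocumentNode.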
    

    【Comments】:

    • Perfect solution!! Thank you very much!