【Question title】: C# HtmlAgilityPack: how to grab the InnerText of all occurrences of a specific tag?
【Posted】: 2017-11-25 02:20:31
【Question】:

As the title briefly says, I am trying to grab the InnerText of every occurrence of a tag and add each one to a list. Here is my code, along with my HTML:

HTML body:

<body cz-shortcut-listen="true">
{"draw":1,"recordsTotal":9437,"recordsFiltered":9437,"data":[["
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">AK-47 | Aquamarine Revenge (Factory New)&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;"href="\&quot;\/id\/115739257\&quot;">33.87&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">34.53&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;https:\/\/track.steamanalyst.com\/730\/115739257\/all\&quot;">25.9&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">164&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">-0.16&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">2.10945&lt;\/a&gt;"],["</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">AK-47 | Aquamarine Revenge (Minimal Wear)&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">23.44&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">21.85&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;https:\/\/track.steamanalyst.com\/730\/115734122\/all\&quot;">17.61&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">533&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">-2.65&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">0.94387&lt;\/a&gt;"],["</a>
</body>

My code:

List<string> Data = new List<string>();
int j = 0; // position of the current cell within its row (each row has 7 cells)
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a[@target]"))
{
    if(j <= 6)
    {
        Data.Add(node.InnerText);
        if (j == 6)
        {
            JsonDB.Add(Data[0], Data[1]);
            Data.Clear();
            j = 0;
        }
        else
        {
            j++;
        }
    }
}

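The problem with this code: node.InnerText returns one concatenated string containing the InnerText of every tag in the body! In effect, all of it shows up as the first node returned by doc.DocumentNode.SelectNodes("//a[@target]"):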

AK-47 | Aquamarine Revenge (Factory New)","33.8","34.34","25.89","170",
"-1.27","2.03181"],[...

【Comments】:

    Tags: c# html html-agility-pack innertext selectnodes


    【Answer 1】:

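    All matching tags in the document (an absolute path, evaluated from the document root):

    doc.DocumentNode.SelectNodes("//a[@target]")
    

    Matching tags below the current node (a relative path):

    doc.DocumentNode.SelectNodes(".//a[@target]")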
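The difference between the two expressions above only shows up when the context node is not the document root; called on doc.DocumentNode, both select the same nodes. A minimal sketch of that behaviour follows. The sample markup, the class name XPathScopeDemo and the helper CountMatches are made up for illustration and are not part of the original answer.

```csharp
using System;
using HtmlAgilityPack;

static class XPathScopeDemo
{
    // Counts the nodes an XPath expression matches when evaluated from `context`.
    // (Hypothetical helper, introduced only for this sketch.)
    public static int CountMatches(HtmlNode context, string xpath)
    {
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection found = context.SelectNodes(xpath);
        return found == null ? 0 : found.Count;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div id='first'><a target='_blank'>one</a></div>" +
            "<div id='second'><a target='_blank'>two</a></div>");

        // Use the first <div> as the context node instead of the document root.
        HtmlNode first = doc.DocumentNode.SelectSingleNode("//div[@id='first']");

        // "//a" is an absolute path: it searches the whole document,
        // even though it is called on the first <div>.
        Console.WriteLine(CountMatches(first, "//a[@target]"));

        // ".//a" is relative: it searches only the descendants of the first <div>.
        Console.WriteLine(CountMatches(first, ".//a[@target]"));
    }
}
```

Note that neither form changes what the question reports, since the concatenation comes from how the response body is parsed rather than from the XPath scope.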
    

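    【Comments】:

    • @Pang Well, that is not actually the problem; the problem is that it shows one concatenated string of all the inner texts as a single node. Basically like this: AK-47 | Aquamarine Revenge (Factory New)","33.8","34.34","25.89"....
    【Answer 2】: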

    Solution: the response has to be treated as a JSON object first, before any of it is parsed as HTML.

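    // Requires: using Newtonsoft.Json.Linq; using HtmlAgilityPack;
    // `response` is the raw response body as a string.
    JObject jresponse = JObject.Parse(response);
    HtmlDocument doc = new HtmlDocument();
    foreach (JArray row in jresponse["data"])
    {
        // One list per row of the "data" array.
        List<string> Data = new List<string>();
        foreach (JToken entry in row)
        {
            // Each entry is a small HTML fragment holding a single <a> element.
            doc.LoadHtml(entry.ToString());
            HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[@target]");
            if (node != null)
                Data.Add(node.InnerText);
        }
    }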
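As a self-contained sketch of the same idea, the snippet below wraps the loop in a method that returns one list of cell texts per row. The class name RowParser, the method ParseRows and the shortened sample response are assumptions made for illustration. The sample mimics the shape of the body shown in the question, with the &quot; entities written as plain JSON-escaped quotes.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using Newtonsoft.Json.Linq;

static class RowParser
{
    // Parses the JSON response and returns the link text of every cell, grouped by row.
    // (Hypothetical helper, introduced only for this sketch.)
    public static List<List<string>> ParseRows(string response)
    {
        var rows = new List<List<string>>();
        JObject json = JObject.Parse(response);
        var doc = new HtmlDocument();

        foreach (JToken row in (JArray)json["data"])
        {
            var cells = new List<string>();
            foreach (JToken cell in row)
            {
                // After JSON decoding, each cell is an ordinary HTML fragment.
                doc.LoadHtml(cell.ToString());
                HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@target]");

                // Fall back to the raw cell text if the fragment has no matching <a>.
                cells.Add(link != null
                    ? HtmlEntity.DeEntitize(link.InnerText)
                    : cell.ToString());
            }
            rows.Add(cells);
        }
        return rows;
    }

    public static void Main()
    {
        // Shortened, made-up sample in the same shape as the response in the question.
        string sample = @"{""draw"":1,""data"":[[""<a target=\""_blank\"" href=\""\/id\/115739257\"">AK-47 | Aquamarine Revenge (Factory New)<\/a>"",""<a target=\""_blank\"" href=\""\/id\/115739257\"">33.87<\/a>""]]}";

        foreach (List<string> row in ParseRows(sample))
        {
            Console.WriteLine(string.Join(" / ", row));
        }
    }
}
```

Returning the rows as lists keeps the grouping that the question's counter variable was trying to rebuild by hand, so something like JsonDB.Add(row[0], row[1]) could be applied per row afterwards.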
    

    【Comments】:
