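Find a specific link in an HTML document in C# using HTML Agility Pack

[Question title]: Find specific link in html doc c# using HTML Agility Pack
[Posted]: 2016-11-17 21:07:00
[Question]:

I am trying to parse an HTML document to retrieve a specific link from the page. I know this may not be the best approach, but I am trying to find the HTML node I need by its inner text. However, that text occurs twice in the HTML: once in the footer and once in the navigation bar. I need the link from the navigation bar. The footer one appears first in the HTML. Here is my code: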

    public string findCollegeURL(string catalog, string college)
    {
        //Find college
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(catalog);
        var root = doc.DocumentNode;
        var htmlNodes = root.DescendantsAndSelf();

        // Search through fetched html nodes for relevant information
        int counter = 0;
        foreach (HtmlNode node in htmlNodes) {
            string linkName = node.InnerText;
            if (linkName == colleges[college] && counter == 0)
            {
                counter++;
                continue;
            }  
            else if(linkName == colleges[college] && counter == 1)
            {
                string targetURL = node.Attributes["href"].Value; //"found it!"; //
                return targetURL;
            }/* */
        }

        return "DID NOT WORK";
    }

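The program does enter the else-if branch, but I get a NullReferenceException when it tries to read the link. Why is that, and how can I retrieve the link I need?

This is the part of the HTML document I am trying to access: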

    <tr class>
       <td id="acalog-navigation">
           <div class="n2_links" id="gateway-nav-current">...</div>
           <div class="n2_links">...</div>
           <div class="n2_links">...</div>
           <div class="n2_links">...</div>
           <div class="n2_links">
              <a href="/content.php?catoid=10&navoid=1210" class="navbar" tabindex="119">College of Science</a>
           </div>

This is the link I want: /content.php?catoid=10&navoid=1210

[Question comments]:

Tags: c# html html-parsing html-agility-pack

[Solution 1]:

I find XPath easier to work with than writing a lot of code:

    var link = doc.DocumentNode.SelectSingleNode("//a[text()='College of Science']")
                  .Attributes["href"].Value;

If you have two links with the same text, select the second one:

    var link = doc.DocumentNode.SelectSingleNode("(//a[text()='College of Science'])[2]")
                  .Attributes["href"].Value;

The LINQ version of the same thing:

    var links = doc.DocumentNode.Descendants("a")
                   .Where(a => a.InnerText == "College of Science")
                   .Select(a => a.Attributes["href"].Value)
                   .ToList();
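
As for the NullReferenceException in the question, a likely cause is that `DescendantsAndSelf()` returns every node, not just `<a>` elements. The text node inside a link has the same `InnerText` as the link itself, so the second match can be a node with no `href`, and `Attributes["href"]` returns null for it. Below is a minimal sketch that avoids both problems by searching only inside the navigation container and reading the attribute with a default value. The class and method names (`NavLinkFinder`, `FindLinkInContainer`) are made up for illustration, and the container id `acalog-navigation` is taken from the HTML excerpt in the question; I have not run this against the real page.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HTML Agility Pack.
public static class NavLinkFinder
{
    // Returns the href of the first <a> inside the element with the given id
    // whose trimmed text equals linkText, or null when nothing matches.
    public static string FindLinkInContainer(string html, string containerId, string linkText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Locate the container first, so that links with the same text
        // elsewhere on the page (for example in the footer) are ignored.
        var container = doc.DocumentNode
            .Descendants()
            .FirstOrDefault(n => n.GetAttributeValue("id", "") == containerId);
        if (container == null)
            return null;

        // Only <a> elements are considered, never text nodes or wrappers.
        var anchor = container
            .Descendants("a")
            .FirstOrDefault(a => a.InnerText.Trim() == linkText);
        if (anchor == null)
            return null;

        // GetAttributeValue returns the default instead of throwing
        // when the attribute is missing.
        var href = anchor.GetAttributeValue("href", "");
        return href.Length == 0 ? null : href;
    }
}
```

Calling `NavLinkFinder.FindLinkInContainer(catalog, "acalog-navigation", "College of Science")` should then return the navigation bar link regardless of where the footer appears, and null (rather than an exception) when the link is not found.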

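[Comments]:

I get an error: 'HtmlNode' does not contain a definition for 'SelectSingleNode'.
@AndreaS. Which version are you using? What is your environment?
I am using Visual Studio 2015.
@AndreaS. I meant your target platform: Windows? wp10? XPath is not available on Windows Phone... but I have also added the LINQ way for that case.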