Select all <p>'s from a Node's children using HTMLAgilityPack: Answers

【Question title】: Select all <p>'s from a Node's children using HTMLAgilityPack
【Posted】: 2010-01-21 17:24:04
【Question】:

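I have the following code that fetches an HTML page, makes its URLs absolute, and then sets the links to rel="nofollow" so that they open in a new window/tab. My problem is with adding the attributes to the <a> elements.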

        string url = "http://www.mysite.com/";
        string strResult = "";            

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();

        if ((request.HaveResponse) && (response.StatusCode == HttpStatusCode.OK)) {
            using (StreamReader sr = new StreamReader(response.GetResponseStream())) {
                strResult = sr.ReadToEnd();
                sr.Close();
            }
        }

        HtmlDocument ContentHTML = new HtmlDocument();
        ContentHTML.LoadHtml(strResult);
        HtmlNode ContentNode = ContentHTML.GetElementbyId("content");

        foreach (HtmlNode node in ContentNode.SelectNodes("/a")) {
            node.Attributes.Append("rel", "nofollow");
            node.Attributes.Append("target", "_blank");
        }

        return ContentNode.WriteTo();

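Can anyone see what I'm doing wrong? I've been trying for a while with no luck. The code fails with "Object reference not set to an instance of an object" at ContentNode.SelectNodes("/a"). I wondered whether I should try setting the stream position to 0?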

Cheers, Denis

【Question comments】:

Tags: c# screen-scraping html-agility-pack

【Solution 1】:

Is ContentNode null? You might need to select a single node with the query "//*[@id='content']".

For info, "/a" means all anchors at the root of the document. Does "descendant::a" work? There is also HtmlElement.GetElementsByTagName, which might be easier, i.e. yourElement.GetElementsByTagName("a").
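
To make the XPath point concrete, here is a minimal sketch of the loop from the question using a relative query. It assumes the HtmlAgilityPack NuGet package; the class name LinkRewriter and the method AddLinkAttributes are made up for illustration and are not part of the library. It relies on SelectNodes returning null when nothing matches, which is the classic HtmlAgilityPack behaviour as far as I know, hence the explicit null checks.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class LinkRewriter
{
    // Hypothetical helper (not a library API): adds rel/target to every <a>
    // found anywhere below the element with id="content".
    public static string AddLinkAttributes(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // GetElementbyId returns null when no element has that id.
        HtmlNode content = doc.GetElementbyId("content");
        if (content == null)
        {
            return html; // nothing to rewrite, hand the input back unchanged
        }

        // ".//a" is relative to content and matches anchors at any depth
        // ("descendant::a" is the long form). "/a" would only match an <a>
        // sitting directly at the document root, which real pages do not have.
        HtmlNodeCollection anchors = content.SelectNodes(".//a");
        if (anchors != null) // assumed: null rather than an empty collection on no match
        {
            foreach (HtmlNode a in anchors)
            {
                // SetAttributeValue adds the attribute or overwrites an existing one.
                a.SetAttributeValue("rel", "nofollow");
                a.SetAttributeValue("target", "_blank");
            }
        }

        return content.WriteTo();
    }
}
```

Calling LinkRewriter.AddLinkAttributes(strResult) in place of the foreach loop from the question should return the content element's markup with both attributes set on each link. If you would rather avoid XPath altogether, content.Descendants("a") in HtmlAgilityPack enumerates the same anchors and yields an empty sequence rather than null.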

【Comments】:

It all came down to the XPath/XSL, and that sorted it out for me! I knew / meant the root but hadn't noticed it here. Thanks.