[Posted]: 2010-12-21 15:20:10
[Question]:
I have done my best to add comments throughout the code, but I am stuck on a few parts.
// create a new instance of the HtmlDocument class called doc
HtmlDocument doc = new HtmlDocument();
// the Load method is called here to load the variable result, which is HTML
// stored as a string by a previous code snippet
doc.Load(new StringReader(result));
// a new variable called root with datatype HtmlNode is created here.
// I'm not sure what doc.DocumentNode refers to?
HtmlNode root = doc.DocumentNode;

// a list is being constructed here. I haven't had much experience
// with constructing lists yet
List<string> anchorTags = new List<string>();

// a foreach loop is used to loop through the HTML document to
// extract HTML with 'a' attributes, I think..
foreach (HtmlNode link in root.SelectNodes("//a"))
{
    // don't really know what's going on here
    string att = link.OuterHtml;
    // don't really know what's going on here either
    anchorTags.Add(att);
}
I took this code sample from here. Thanks to Farooq Kaiser.
[Comments]:
-
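I have never used that library and I am only taking a shot in the dark here, but I assume doc.DocumentNode is the document's current node, which right after loading would be the root node.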
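-
To make the guess about doc.DocumentNode concrete, here is a minimal sketch of the same steps as a small console program. It assumes the HtmlAgilityPack NuGet package is referenced. The class name AnchorExtractor, the method ExtractAnchorTags, and the sample markup are made up for illustration and are not part of the library. It uses LoadHtml to parse a string directly, and it guards against SelectNodes returning null when the XPath query matches nothing, which the loop in the question does not do.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper class, named here only for the example.
public static class AnchorExtractor
{
    public static List<string> ExtractAnchorTags(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html); // parse the markup held in a string

        // DocumentNode is the root of the parsed tree; every element hangs below it.
        HtmlNode root = doc.DocumentNode;

        // "//a" is an XPath query meaning: every <a> element anywhere in the document.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = root.SelectNodes("//a");

        List<string> anchorTags = new List<string>();
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // OuterHtml is the element's full markup, including the <a> tag itself.
                anchorTags.Add(link.OuterHtml);
            }
        }
        return anchorTags;
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the `result` string from the question.
        string result = "<html><body><a href=\"/one\">One</a><p>text</p><a href=\"/two\">Two</a></body></html>";

        foreach (string tag in ExtractAnchorTags(result))
        {
            Console.WriteLine(tag);
        }
    }
}
```

So the loop visits each link element, OuterHtml turns that element back into its HTML text, and Add appends that text to the list.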
Tags: c# html-agility-pack web-scraping