XPath and HTML Agility Pack: answers

【Question title】: xpath and htmlagility pack
【Posted】: 2011-07-29 14:32:36
【Question】:

I figured it out! I'll leave this post up in case other newbies like me run into the same problem.

Answer: **("./td[2]/span[@class='smallfont']")**

I'm new to XPath and the HTML Agility Pack. I'm so close and yet so far.

Goal: pull out 4:30am

I'm using the following with the HTML Agility Pack:

    // Each matched node is the second <tr> of the table, despite the variable name.
    foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@id='weekdays']/tr[2]"))
    {
        string time = table.SelectSingleNode("./td[2]").InnerText;
    }

When I try it with the span, I get an XPath exception. What do I have to add to ("./td[2]") so that I end up with just 4:30am?

HTML
<td class="alt1 espace" nowrap="nowrap" style="text-align: center;">
<span class="smallfont">4:30am</span>
</td>
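For readers who want to see the answer in context, here is a minimal sketch that applies the XPath from the top of the post to markup like the fragment above. The surrounding table, the header row, the "Monday" cell, and the names `TimeExtractor`, `SampleHtml`, and `ExtractTime` are all assumptions made for illustration; only the second `<td>` and its `<span>` come from the question. It assumes the HtmlAgilityPack NuGet package is installed.

```csharp
using System;
using HtmlAgilityPack; // third-party NuGet package, assumed to be installed

public static class TimeExtractor
{
    // Hypothetical sample page: only the second <td> of the second row mirrors the question.
    public const string SampleHtml = @"
<table id=""weekdays"">
  <tr><td>Day</td><td>Time</td></tr>
  <tr>
    <td>Monday</td>
    <td class=""alt1 espace"" nowrap=""nowrap"" style=""text-align: center;"">
      <span class=""smallfont"">4:30am</span>
    </td>
  </tr>
</table>";

    // Hypothetical helper: returns the trimmed text of the span, or null if nothing matches.
    public static string ExtractTime(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so guard before iterating.
        var rows = doc.DocumentNode.SelectNodes("//table[@id='weekdays']/tr[2]");
        if (rows == null) return null;

        foreach (HtmlNode row in rows)
        {
            // Relative path: second cell of this row, then the span with class 'smallfont'.
            HtmlNode span = row.SelectSingleNode("./td[2]/span[@class='smallfont']");
            if (span != null) return span.InnerText.Trim();
        }
        return null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractTime(SampleHtml) ?? "(no match)");
    }
}
```

Selecting the `<span>` rather than the `<td>` avoids picking up the whitespace around the span inside the cell. The null checks matter because both `SelectNodes` and `SelectSingleNode` return null when the path matches nothing.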

【Comments】:

You can answer your own question. Putting the answer inside the question itself turns it into "not a real question".

Tags: c# xpath screen-scraping html-parsing html-agility-pack

【Solution 1】:

I don't know whether LINQ is an option for you, but you could also do it this way:

        // Requires: using System.Linq; using HtmlAgilityPack;
        var time = string.Empty;
        var html =
            "<td class=\"alt1 espace\" nowrap=\"nowrap\" style=\"text-align: center;\"><span class=\"smallfont\">4:30am</span></td>";

        var document = new HtmlDocument() { OptionWriteEmptyNodes = true, OptionOutputAsXml = true };

        document.LoadHtml(html);

        // Find the first <span> whose class attribute is exactly "smallfont".
        var timeSpan =
            document.DocumentNode.Descendants("span").FirstOrDefault(
                n => n.Attributes["class"] != null && n.Attributes["class"].Value == "smallfont");

        // FirstOrDefault returns null when no span matches.
        if (timeSpan != null)
            time = timeSpan.InnerHtml;

【Comments】:

That's really cool. Are you using a StreamReader to pull the HTML from the URL? As someone new to programming, I love learning new things.