【Question title】: Can't select element with multiple attributes using XPath
【Posted】: 2017-05-02 14:03:56
【Question】:

I'm trying to parse news.google. The element looks like this:

<a target="_blank"class="article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link" href="http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/" url="http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/" id="MAA4AEgAUABgAWoCY2E"  ssid="h" >

I want the value of the url attribute, but I can't get it: all I get is a null reference.

The XPath I use to find this multi-attribute element:

HtmlNode aNodes = doc.DocumentNode.SelectSingleNode("//a[@target='_blank' and @class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' and @href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' and @url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' and @id='MAA4AEgAUABgAWoCY2E' and @ssid='h']");

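I just get a null reference when trying to find this element. The values of attributes such as url and href change all the time. Is there a way to get url based on which attributes the element has, rather than on their values? In other words: if an element has these six attributes, select the value of url. Thanks a lot.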

【Question discussion】:

    Tags: c# html xpath html-agility-pack


    【Solution 1】:

    Yes, elements can be selected by the presence of attributes rather than by specific attribute values:
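The presence test can also be written directly in XPath: inside a predicate, a bare @name is true whenever the element has that attribute, whatever its value, so //a[@target and @class and @href and @url and @id and @ssid] matches only links that carry all six. The sketch below is an illustration, not part of the original answer. To stay self-contained it uses System.Xml's XmlDocument on a well-formed stand-in for the test HTML instead of Html Agility Pack, and the names AttributePresenceDemo, WantedLinks, SampleXml and SelectUrls are hypothetical. The same expression string should also work with Html Agility Pack's document.DocumentNode.SelectNodes(...), which evaluates XPath through System.Xml.XPath; note that HAP's SelectNodes returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class AttributePresenceDemo
{
    // A bare @name inside a predicate is true whenever the attribute exists,
    // regardless of its value; "and" requires all six to be present.
    public const string WantedLinks =
        "//a[@target and @class and @href and @url and @id and @ssid]";

    // Well-formed stand-in for the answer's test HTML (System.Xml needs valid XML).
    public const string SampleXml = @"<root>
  <!-- match -->
  <a target='_blank' class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' id='MAA4AEgAUABgAWoCY2E' ssid='h'></a>
  <!-- no match: missing url -->
  <a target='_blank' href='NO MATCH' ssid='' id='' class=''></a>
  <!-- match -->
  <a target='_blank' href='#' ssid='' id='' class='' url='MATCH'></a>
  <!-- no match: missing class, id and ssid -->
  <a target='_blank' href='#' url='NO MATCH'></a>
</root>";

    // Returns the url attribute of every <a> that carries all six attributes,
    // in document order.
    public static List<string> SelectUrls(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var urls = new List<string>();
        foreach (XmlElement a in doc.SelectNodes(WantedLinks))
        {
            urls.Add(a.GetAttribute("url"));
        }
        return urls;
    }

    public static void Main()
    {
        foreach (var url in SelectUrls(SampleXml))
        {
            Console.WriteLine(url);
        }
    }
}
```

Run as-is, it prints the theglobeandmail.com URL followed by MATCH, the same two values the LINQ version reports.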

    Test HTML:

    var html = @"
    <!-- match -->
    <a target='_blank' class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' id='MAA4AEgAUABgAWoCY2E' ssid='h'></a>
    <!-- NO match, missing url -->
    <a target='_blank' href='NO MATCH' ssid='' id='' class=''></a>
    <!-- match -->
    <a target='_blank' href='#' ssid='' id='' class='' url='MATCH'></a>
    <!-- NO match, missing class, id and ssid -->
    <a target='_blank' href='#' url='NO MATCH'></a>
    ";
    

    And a bit of LINQ:

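    // Requires: using System; using System.Linq; using HtmlAgilityPack;
    HtmlDocument document = new HtmlDocument();
    document.LoadHtml(html);
    // SelectNodes returns null (not an empty collection) when no node matches,
    // so fall back to an empty sequence to keep the query from throwing.
    var anchors = document.DocumentNode.SelectNodes("//a")
        ?? Enumerable.Empty<HtmlNode>();
    // Keep only links that carry all six attributes, whatever their values.
    var wantedLinks = from a in anchors
        where a.Attributes["url"] != null
        && a.Attributes["ssid"] != null
        && a.Attributes["href"] != null
        && a.Attributes["id"] != null
        && a.Attributes["class"] != null
        && a.Attributes["target"] != null
        select a;

    foreach (var a in wantedLinks)
    {
        Console.WriteLine(a.Attributes["url"].Value);
    }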
    

    Output; note that links lacking any of the six attributes are skipped:

    http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/
    MATCH
    
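If the set of required attributes may change, the six chained null checks can be driven by an array of names with Enumerable.All. This is a minimal sketch and not the original answer's code: to stay self-contained it uses LINQ to XML (XDocument, and XElement.Attribute, which returns null for a missing attribute) on a well-formed stand-in for the test HTML, and RequiredAttributesDemo, Required, SampleXml and SelectUrls are hypothetical names. With Html Agility Pack the inner check would be a.Attributes[name] != null, as in the query above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RequiredAttributesDemo
{
    // Hypothetical list of attribute names every wanted <a> must carry.
    public static readonly string[] Required = { "target", "class", "href", "url", "id", "ssid" };

    // Well-formed stand-in for the answer's test HTML (LINQ to XML needs valid XML).
    public const string SampleXml = @"<root>
  <a target='_blank' class='article usg-AFQjCNFr5aujpYnTzdHNYfHZw_gNN6iq-w sig2-1esugE2Sy8Bhe2CzulGmsA did--5114870031117960448 esc-thumbnail-link' href='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' url='http://www.theglobeandmail.com/news/world/trump-blasts-media-in-rally-celebrating-100-days-as-president/article34858356/' id='MAA4AEgAUABgAWoCY2E' ssid='h'></a>
  <a target='_blank' href='NO MATCH' ssid='' id='' class=''></a>
  <a target='_blank' href='#' ssid='' id='' class='' url='MATCH'></a>
  <a target='_blank' href='#' url='NO MATCH'></a>
</root>";

    // Returns the url value of each <a> that has every attribute in requiredNames.
    public static List<string> SelectUrls(XDocument doc, IEnumerable<string> requiredNames)
    {
        return doc.Descendants("a")
            // XElement.Attribute returns null when the attribute is absent,
            // so All(...) replaces the chain of "!= null" checks.
            .Where(a => requiredNames.All(name => a.Attribute(name) != null))
            .Select(a => (string)a.Attribute("url"))
            .ToList();
    }

    public static void Main()
    {
        var doc = XDocument.Parse(SampleXml);
        foreach (var url in SelectUrls(doc, Required))
        {
            Console.WriteLine(url);
        }
    }
}
```

Adding or removing a name in the array changes the filter without touching the query itself.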
