【Question title】: Html doesn't get updated with Html Agility Pack
【Posted】: 2015-10-17 09:31:24
【Question】:

I'm trying to remove the img and map elements from a piece of HTML.

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);

var oldHtml = doc.DocumentNode.InnerHtml;

if (doc.DocumentNode.SelectNodes("//img[@usemap]") != null)
{
    HtmlNode img = doc.DocumentNode.SelectSingleNode("//img[@usemap]");
    img.ParentNode.RemoveChild(img);
}

if (doc.DocumentNode.SelectNodes("//map") != null)
{
    HtmlNode map = doc.DocumentNode.SelectSingleNode("//map");
    map.ParentNode.RemoveChild(map);
}

var newHtml = doc.DocumentNode.InnerHtml;

newHtml still contains the img and map elements. Do I need to do something else before the HTML gets updated?

Here is the HTML I'm trying to remove the elements from:

<p><img src="/media/8301/HD00_498x299.jpg"  width="498"  height="299" alt="HD00.JPG" usemap="#imgmap201392714219"/><br />
<br />
 <a title="Download ZIP DWG"
href="/media/8103/detailtekeningen-dwg-unidek-aero.zip"
target="_blank">Klik hier om alle DWG&nbsp;bestanden in
een&nbsp;zipfile te downloaden.</a><br />
 <a title="Download DXF"
href="/media/8104/detailtekeningen-dxf-unidek-aero.zip"
target="_blank">Klik hier om alle DXF bestanden in een zipfile te
downloaden.</a><br />
 <a title="Download PDF"
href="/media/8116/detailtekeningen-pdf-unidek-aero.zip"
target="_blank">Klik hier om alle PDF bestanden in een zipfile te
downloaden.</a><br />
<br />
 <strong><a title="Bouwdetails berekende psi-waarden"
href="/{localLink:8014}" target="_blank">Link naar de technische
bouwdetails met verbeterde eigen ψ-waarden<br />
</a></strong> &nbsp;<map name="imgmap2012104102243"
id="imgmap2012104102243">
<area title="" href="/nl/producten/hellend-dak/unidek-aero/1"
shape="rect" coords="194,419,219,439" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/2"
shape="rect" coords="221,420,246,439" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/3"
shape="rect" coords="200,302,226,320" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/4"
shape="rect" coords="209,167,234,185" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/6"
shape="rect" coords="68,46,98,67" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/7"
shape="rect" coords="102,203,129,224" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/8"
shape="rect" coords="273,339,302,360" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/9"
shape="rect" coords="387,350,417,372" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/10"
shape="rect" coords="324,341,354,363" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/11"
shape="rect" coords="223,369,252,390" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/12"
shape="rect" coords="62,270,89,294" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/13"
shape="rect" coords="93,270,119,294" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="31,94,60,114" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="79,161,106,182" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="19,150,50,171" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="82,113,110,134" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/16"
shape="rect" coords="176,231,205,253" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/17"
shape="rect" coords="147,179,176,200" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/18"
shape="rect" coords="139,235,166,257" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/19"
shape="rect" coords="204,56,231,78" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/20"
shape="rect" coords="125,135,153,157" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/21"
shape="rect" coords="265,263,290,284" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/23"
shape="rect" coords="9,202,36,225" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/24"
shape="rect" coords="39,202,65,225" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/25"
shape="rect" coords="158,80,184,101" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/26"
shape="rect" coords="188,80,213,102" target="_blank" alt="" />
</map><map id="imgmap201392714219">
<area title="" href="/nl/producten/hellend-dak/unidek-aero/1"
shape="rect" coords="265,463,279,480" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/2"
shape="rect" coords="282,466,297,480" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/3"
shape="rect" coords="213,339,237,358" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/4"
shape="rect" coords="206,204,227,220" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/6"
shape="rect" coords="113,105,135,121" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/7"
shape="rect" coords="134,246,154,262" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/8"
shape="rect" coords="299,369,319,386" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/9"
shape="rect" coords="432,409,453,425" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/10"
shape="rect" coords="363,394,385,413" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/11"
shape="rect" coords="254,406,276,422" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/12"
shape="rect" coords="105,298,122,314" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/13"
shape="rect" coords="122,298,139,314" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="53,121,77,139" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="49,165,72,182" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/16"
shape="rect" coords="195,272,214,288" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/17"
shape="rect" coords="152,212,175,230" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/18"
shape="rect" coords="160,276,180,293" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/19"
shape="rect" coords="234,88,255,105" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/20"
shape="rect" coords="132,155,158,174" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/21"
shape="rect" coords="299,294,321,311" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/23"
shape="rect" coords="40,234,55,250" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/24"
shape="rect" coords="56,233,73,251" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/25"
shape="rect" coords="185,108,202,127" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/26"
shape="rect" coords="203,109,219,127" target="_blank" alt="" />
</map></p>

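When I debug, the img and map elements are found, but calling RemoveChild doesn't change the HTML at all. Also, when I try to change an attribute or anything else, nothing happens.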
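When RemoveChild seems to do nothing, it can help to check the tree and the serialized string separately. The sketch below assumes HtmlAgilityPack is referenced; `RemovalCheck` and `RemoveAndInspect` are hypothetical names used only for illustration. If `MatchesLeft` is 0 while `StringStillHasTag` is true, the nodes were detached but the string being read is stale; if `MatchesLeft` is above 0, the removal itself did not happen.

```csharp
using HtmlAgilityPack;

// Hypothetical diagnostic helper (not part of HtmlAgilityPack): removes every node
// matching an XPath, then reports what the tree and the serialized HTML each say.
public static class RemovalCheck
{
    public static (int MatchesLeft, bool StringStillHasTag) RemoveAndInspect(
        string html, string xpath, string tagText)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection found = doc.DocumentNode.SelectNodes(xpath);
        if (found != null)
        {
            foreach (HtmlNode node in found)
            {
                node.ParentNode.RemoveChild(node);
            }
        }

        // 1) Ask the tree: does the XPath still match anything?
        HtmlNodeCollection left = doc.DocumentNode.SelectNodes(xpath);
        int matchesLeft = left == null ? 0 : left.Count;

        // 2) Ask the serializer: does the output string still contain the tag text?
        bool stringStillHasTag = doc.DocumentNode.OuterHtml.Contains(tagText);

        return (matchesLeft, stringStillHasTag);
    }
}
```

For the question's case that would be a call such as `RemovalCheck.RemoveAndInspect(html, "//map", "<map")`.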

【Discussion】:

    Tags: c# html html-agility-pack


    【Solution 1】:

    This works for me:

    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    
    var root = doc.DocumentNode;
    if (root != null)
    {
        var replace = false;
    
        var images = root.SelectNodes("//img[@usemap]");
        if (images != null)
        {
            foreach (var image in images)
            {
                image.ParentNode.RemoveChild(image);
            }
    
            replace = true;
        }
    
        if (replace)
        {
            html = root.OuterHtml;
        }
    }
    
    var newhtml = html;
    

    The images are removed from the HTML.

    【Discussion】:

      【Solution 2】:

      I just found out that the bug in HTML Agility Pack is that you can only read .InnerHtml once; after that, the value is not updated. You read it twice:

      HtmlDocument doc = new HtmlDocument();
      doc.LoadHtml(html);
      
      var oldHtml = doc.DocumentNode.InnerHtml;
      
      if (doc.DocumentNode.SelectNodes("//img[@usemap]") != null)
      {
          HtmlNode img = doc.DocumentNode.SelectSingleNode("//img[@usemap]");
          img.ParentNode.RemoveChild(img);
      }
      
      if (doc.DocumentNode.SelectNodes("//map") != null)
      {
          HtmlNode map = doc.DocumentNode.SelectSingleNode("//map");
          map.ParentNode.RemoveChild(map);
      }
      
      var newHtml = doc.DocumentNode.InnerHtml;
      

      If you remove this line:

      var oldHtml = doc.DocumentNode.InnerHtml;
      

      it should work. This looks like an odd bug in HtmlAgilityPack.

      Sniffdk's solution works because he only reads .OuterHtml once. The HtmlAgilityPack maintainers need to fix this.
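If the stale result really does come from a cached InnerHtml string, one way to avoid reading that property at all is to write the current tree out with `HtmlDocument.Save(TextWriter)`. This is a sketch under that assumption, not a confirmed fix for the behavior described above; `HtmlSerialization.RemoveAndSave` is a hypothetical helper name.

```csharp
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper: removes the matching nodes, then serializes the document
// through HtmlDocument.Save instead of reading InnerHtml/OuterHtml directly.
public static class HtmlSerialization
{
    public static string RemoveAndSave(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes != null)
        {
            foreach (HtmlNode node in nodes)
            {
                node.ParentNode.RemoveChild(node);
            }
        }

        // Save(TextWriter) writes the document node and its descendants to the writer.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```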

      【Discussion】:

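      • I had a similar problem with HtmlAgilityPack 1.4.6 (assignments to InnerHtml were not reflected when reading OuterHtml), and it went away after upgrading to HtmlAgilityPack 1.4.9.5. Maybe this version fixes your problem too?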
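Since the outcome apparently depends on the library version, it may be worth confirming which HtmlAgilityPack assembly is actually loaded at runtime, because a host application can load a different copy than the one you referenced. A minimal sketch; `HapVersion` is a hypothetical name, and the assembly version usually, but not necessarily, matches the NuGet package version.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper: reports the version of the HtmlAgilityPack assembly
// that the runtime actually loaded for the HtmlDocument type.
public static class HapVersion
{
    public static Version Loaded()
    {
        return typeof(HtmlDocument).Assembly.GetName().Version;
    }
}
```

Logging `HapVersion.Loaded()` once at startup shows whether you are still on a 1.4.6-era build.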
      【Solution 3】:

      So far, I have to do this in Umbraco before Html Agility Pack works:

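      // Get the document with ID 5127 among those of document type 5125
      var documents = Document.GetDocumentsOfDocumentType(5125);
      var document = documents.First(x => x.Id == 5127);
      
      var html = document.getProperty("content").Value.ToString();
      // Remove Windows line breaks (CRLF) from the stored HTML
      html = html.Replace("\r\n", "");
      // Umbraco helper that strips the outer paragraph tag
      html = umbraco.library.RemoveFirstParagraphTag(html);
      
      HtmlDocument doc = new HtmlDocument();
      doc.LoadHtml(html);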
      

      【Discussion】:

        【Solution 4】:

        It seems HtmlAgilityPack does not update the HtmlDocument.DocumentNode.InnerHtml property after a node is removed. The simplest workaround is to use the OuterHtml property instead of InnerHtml:

        var newHtml = doc.DocumentNode.OuterHtml;
        

        So far I had always used the OuterHtml property to check whether my changes gave the expected result, and only now noticed this behavior of InnerHtml.

        Update:

        The HTML sample you posted has 2 <map> elements, but your code only removes one. Try this to remove all <img> and <map> nodes:

        // Query once per XPath; SelectNodes returns null when nothing matches
        HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@usemap]");
        if (imgs != null)
        {
            foreach (HtmlNode img in imgs)
            {
                img.ParentNode.RemoveChild(img);
            }
        }
        
        HtmlNodeCollection maps = doc.DocumentNode.SelectNodes("//map");
        if (maps != null)
        {
            foreach (HtmlNode map in maps)
            {
                map.ParentNode.RemoveChild(map);
            }
        }
        
        var newHtml = doc.DocumentNode.OuterHtml;
        

        [.NET Fiddle demo]
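The same pattern can be wrapped in a helper that accepts several XPath expressions. This is a sketch, not the code from the demo; `HtmlNodeRemover.RemoveAll` is a hypothetical name. It snapshots each result with `ToList()` before removing anything, which keeps the loop safe even if the query is later swapped for a lazily evaluated one such as `Descendants("map")`.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper generalizing the loops above to any number of XPath queries.
public static class HtmlNodeRemover
{
    public static string RemoveAll(string html, params string[] xpaths)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (string xpath in xpaths)
        {
            // SelectNodes returns null when nothing matches; fall back to an empty
            // sequence and copy the matches into a list before modifying the tree.
            List<HtmlNode> matches =
                (doc.DocumentNode.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>()).ToList();

            foreach (HtmlNode node in matches)
            {
                node.ParentNode.RemoveChild(node);
            }
        }

        // Serialize the modified tree via OuterHtml, as in the answer above.
        return doc.DocumentNode.OuterHtml;
    }
}
```

For the question's HTML this would be `HtmlNodeRemover.RemoveAll(html, "//img[@usemap]", "//map")`.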

        【Discussion】:

        • I tried OuterHtml, but the result is the same. The HTML still isn't updated.
        • Updated my answer. I tested the code against your HTML sample (wrapped in <html></html>) and it works fine for me here: newHtml ends up containing only <html></html>.
        • It looks like something is wrong with the HTML. This example works: gist.github.com/abjerner/35c8d8b2ce16c307cfee, but the HTML I get from the Umbraco RTE does not. Maybe it's because the HTML contains a localLink.
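        • Not sure why you think the local link has anything to do with this problem. Anyway, without being able to reproduce the issue, it is hard to diagnose further.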
        • If you use the updated HTML I posted in the question, you should be able to reproduce the issue, because that HTML doesn't work.