【Question title】: HTMLAgility pack C# unclosed colgroup tag
【Posted】: 2017-11-10 02:29:55
【Question】:

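I post a string of HTML to the server and then validate it there with the HTMLAgility pack. The HTML contains an unclosed colgroup tag.

After cleaning, a closing colgroup tag does appear, but it is placed between the closing "tbody" and "table" tags instead of before the opening "tbody" tag.

Before: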

<table width="3265" class="mce-item-table" style="width: 2452pt; border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0">

 <colgroup><col width="80" style="width: 60pt;">
 <col width="245" style="width: 184pt;" span="13"> <!-- missing closing </colgroup> tag here -->
 <tbody><tr height="20" style="height: 15pt;">
  <td width="80" height="20" style="width: 60pt; height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">31109173</span></td>
  <td width="245" style="width: 184pt; font-family: Arial; font-size: 9pt;">31109173</td>
  <td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 09,2017 9:54 AM</td>
  <td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 08,2017 5:21 PM</td>
 </tr>
 <tr height="20" style="height: 15pt;">
  <td height="20" style="height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">30933775</span></td>
  <td style="font-family: Arial; font-size: 9pt;">30933775</td>
  <td align="right" style="font-family: Arial; font-size: 9pt;">May 09,2017 9:50 AM</td>
  <td align="right" style="font-family: Arial; font-size: 9pt;">Apr 28,2017 6:22 PM</td>
 </tr>
</tbody></table>

After:

<table width="3265" class="mce-item-table" style="width: 2452pt; border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0">

 <colgroup><col width="80" style="width: 60pt;">
 <col width="245" style="width: 184pt;" span="13">
 <tbody><tr height="20" style="height: 15pt;">
  <td width="80" height="20" style="width: 60pt; height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">31109173</span></td>
  <td width="245" style="width: 184pt; font-family: Arial; font-size: 9pt;">31109173</td>
  <td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 09,2017 9:54 AM</td>
  <td width="245" align="right" style="width: 184pt; font-family: Arial; font-size: 9pt;">May 08,2017 5:21 PM</td>
 </tr>
 <tr height="20" style="height: 15pt;">
  <td height="20" style="height: 15pt; color: blue; text-decoration: underline; text-underline-style: single;"><span style="color: blue;">30933775</span></td>
  <td style="font-family: Arial; font-size: 9pt;">30933775</td>
  <td align="right" style="font-family: Arial; font-size: 9pt;">May 09,2017 9:50 AM</td>
  <td align="right" style="font-family: Arial; font-size: 9pt;">Apr 28,2017 6:22 PM</td>
 </tr>
</tbody></colgroup></table>

<!-- ^^ </colgroup> has appeared above-->

I tried setting the "OptionFixNestedTags" flag to true, but I still get the same result.
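For reference, the setup described above can be sketched roughly as follows. This is a minimal reproduction under assumptions, not the asker's actual code: it assumes the HtmlAgilityPack NuGet package is referenced, and the shortened `html` string and the `Program` class are made up for illustration. It prints the re-serialized markup along with any parse errors the library recorded, which makes it easy to see where the closing `</colgroup>` ends up.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Shortened, hypothetical version of the table from the question:
        // <colgroup> is opened but never closed before <tbody>.
        string html =
            "<table><colgroup><col width=\"80\"><col width=\"245\" span=\"13\">" +
            "<tbody><tr><td>31109173</td></tr></tbody></table>";

        var doc = new HtmlDocument();
        doc.OptionFixNestedTags = true; // the flag mentioned in the question
        doc.LoadHtml(html);

        // Re-serialized markup; inspect where </colgroup> was placed.
        Console.WriteLine(doc.DocumentNode.OuterHtml);

        // Parse errors recorded while loading (may be empty).
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine($"{error.Code} at line {error.Line}: {error.Reason}");
        }
    }
}
```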

【Discussion of the question】:

    Tags: c# html-agility-pack html-sanitizing


    【Solution 1】:

    I tried setting various options in the HTMLAgility pack to true. None of them fixed the problem.

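    var doc = new HtmlAgilityPack.HtmlDocument();
    // Neither option moved the closing </colgroup> to the right place.
    doc.OptionFixNestedTags = true;
    doc.OptionAutoCloseOnEnd = true;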
    

    There is a good NuGet package for cleaning up HTML, HtmlSanitizer. It solved the problem I was having.
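A minimal sketch of what using HtmlSanitizer for this might look like, assuming the HtmlSanitizer NuGet package is referenced. The shortened `html` string and the `Program` class are illustrative only. The namespace is an assumption tied to the package version: recent releases use `Ganss.Xss`, while older ones use `Ganss.XSS`.

```csharp
using System;
using Ganss.Xss; // assumption: recent HtmlSanitizer release; older versions use Ganss.XSS

class Program
{
    static void Main()
    {
        // Shortened, hypothetical version of the table from the question,
        // with the unclosed <colgroup>.
        string html =
            "<table><colgroup><col width=\"80\"><col width=\"245\" span=\"13\">" +
            "<tbody><tr><td>31109173</td></tr></tbody></table>";

        // Default configuration; allow-lists for tags, attributes and CSS
        // properties can be adjusted on the sanitizer instance if needed.
        var sanitizer = new HtmlSanitizer();
        string clean = sanitizer.Sanitize(html);

        Console.WriteLine(clean);
    }
}
```

HtmlSanitizer parses the input with AngleSharp, which follows the HTML5 parsing rules, so an open `<colgroup>` should be closed when `<tbody>` is encountered. Because the sanitizer also removes anything not on its allow-lists, it is worth comparing the output against the input to make sure no attributes or styles you rely on were dropped.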

    Hope this helps.

    【Discussion】:
