How to strip out one common attribute from every form element on the page? Answers

【Question title】: How to strip out one common attribute from every form element on the page?
【Posted】: 2011-08-22 17:37:57
【Question】:

I have a string variable that holds the response of an HTML page. It contains hundreds of tags, including the following three HTML tags:

<tag1 prefix1314030136543="2">
<tag2 prefix131403013654="1" anotherAttribute="432">
<tag3 prefix13140301376543="4">

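Regardless of the tag name, I need to be able to remove any attribute whose name starts with "prefix", along with its value. In the end, I want: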

<tag1>
<tag2 anotherAttribute="432">
<tag3>

I'm using C#. I assume regex is the solution, but I'm terrible at regex and am hoping someone here can help me out.

【Question comments】:

Tags: c# .net html regex

【Solution 1】:

Take a look at the Html Agility Pack.

Or, using a regular expression:

(?<=<[^<>]*)\sprefix\w+="[^"]*"(?=[^<>]*>)

var result = Regex.Replace(s,
    @"(?<=<[^<>]*)\sprefix\w+=""[^""]*""(?=[^<>]*>)", string.Empty);

【Comments】:

Your regex doesn't match; the prefix is always preceded by a whitespace character (\s).

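【Solution 2】:

Regex is not the solution: HTML is not a regular language, so it should not be parsed with regular expressions. I've heard good things about the HTML Agility Pack for parsing and working with HTML. Take a look at it.

【Comments】:

【Solution 3】: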

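// Requires the HtmlAgilityPack NuGet package, plus: using System; using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(/* your html here */);
foreach (var item in doc.DocumentNode.Descendants()) {
    // ToArray() snapshots the matches so attributes can be removed while iterating.
    foreach (var attr in item.Attributes.Where(x => x.Name.StartsWith("prefix", StringComparison.Ordinal)).ToArray()) {
        item.Attributes.Remove(attr);
    }
}
// The cleaned markup is then available as doc.DocumentNode.OuterHtml.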

【Comments】:

【Solution 4】:

var prefix = "prefix"; // the literal text that the unwanted attribute names start with
html = Regex.Replace(html, @"(?<=<\w+(?:\s[^>]*)?)\s" + Regex.Escape(prefix) + @"\w+\s?=\s?""[^""]*""(?=[^>]*>)", "");

There is a lookbehind and a lookahead to make sure the match sits inside a tag, and between them a matcher for prefix#####="?????".
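For anyone who wants to try the lookbehind/lookahead idea in isolation, here is a minimal sketch that wraps it in a helper. The class name `PrefixAttributeStripper` and the method `Strip` are made up for illustration, and the pattern is a variation on the one above, not a drop-in copy. It assumes attribute values are double-quoted and that no attribute value contains `<`, `>`, or text that itself looks like a prefixed attribute.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PrefixAttributeStripper
{
    // Removes every attribute whose name starts with `prefix` from the tags in `html`.
    // Assumes double-quoted values and no '<' or '>' inside attribute values.
    public static string Strip(string html, string prefix)
    {
        string pattern =
            @"(?<=<\w+(?:\s[^<>]*)?)"        // lookbehind: we are inside an opening tag
          + @"\s+" + Regex.Escape(prefix)    // whitespace, then the attribute-name prefix
          + @"[\w-]*\s*=\s*""[^""]*"""       // rest of the name, '=', double-quoted value
          + @"(?=[^<>]*>)";                  // lookahead: the tag is closed later on
        return Regex.Replace(html, pattern, string.Empty);
    }
}
```

Calling `PrefixAttributeStripper.Strip(html, "prefix")` on the three tags from the question should give `<tag1>`, `<tag2 anotherAttribute="432">` and `<tag3>`. The variable-length lookbehind is a .NET feature; many other regex engines reject it.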

【Comments】:

【Solution 5】:

Here is the heavy-handed approach.

    string str = "<tag1 prefix131403013654=\"2\">";
    // The leading space makes sure we only match a whole attribute name.
    const string marker = " prefix131403013654=\"";
    int point;
    while ((point = str.IndexOf(marker, StringComparison.Ordinal)) != -1) // at least one still exists...
    {
        // The value is wrapped in double quotes and the marker ends with the opening one,
        // so the next quote after the marker is the closing quote.
        int closingQuote = str.IndexOf('"', point + marker.Length);
        if (closingQuote == -1) break; // unterminated value: stop instead of looping forever

        // Remove the attribute together with its leading space (+1 includes the closing quote).
        str = str.Remove(point, closingQuote - point + 1);
    }

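Edited for better code. Edited again after testing, adding the +1 so that the closing quote is included. It works. Basically, you can wrap this in a loop that iterates over a list of all the attribute names you want removed.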
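The "wrap it in a loop" idea could look roughly like the sketch below. The names `AttributeRemover` and `RemoveAll` are invented for this example, and it assumes every attribute is written exactly as ` name="value"` (one leading space, no spaces around `=`, double quotes).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class AttributeRemover
{
    // Removes each exactly named attribute, with its double-quoted value, from `html`.
    public static string RemoveAll(string html, IEnumerable<string> attributeNames)
    {
        foreach (string name in attributeNames)
        {
            string marker = " " + name + "=\"";
            int point;
            while ((point = html.IndexOf(marker, StringComparison.Ordinal)) != -1)
            {
                int closingQuote = html.IndexOf('"', point + marker.Length);
                if (closingQuote == -1) break; // unterminated value: leave the rest untouched

                // Drop the leading space, the name, and the quoted value in one go.
                html = html.Remove(point, closingQuote - point + 1);
            }
        }
        return html;
    }
}
```

Because this is plain string scanning, it cannot tell tag markup from page text, so identical text outside a tag would be removed as well.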

If you don't know the full name of the prefixed attribute, you can change it like this:

    string str = "<tag1 prefix131403013654=\"2\">";
    // The leading space makes sure "prefix" is the start of an attribute name.
    const string marker = " prefix";
    int point;
    while ((point = str.IndexOf(marker, StringComparison.Ordinal)) != -1) // at least one still exists...
    {
        // The value is wrapped in double quotes: find the opening quote, then the closing one.
        int firstQuote = str.IndexOf('"', point);
        int secondQuote = firstQuote == -1 ? -1 : str.IndexOf('"', firstQuote + 1);
        if (secondQuote == -1) break; // malformed input: stop instead of looping forever

        // Remove the attribute together with its leading space (+1 includes the closing quote).
        str = str.Remove(point, secondQuote - point + 1);
        // Like I said, a very heavy-handed way of doing it.
    }

This catches every attribute whose name starts with prefix.

【Comments】: