C#: Filter a word with an undefined number of spaces between characters (answers)

【Question title】: C# Filter a word with an undefined number of spaces between characters
【Posted】: 2020-04-23 06:55:13
【Question description】:

For example, I can create a word with a space after each character:

string example = "**example**";
List<string> outputs = new List<string>();
string example_output = "";
foreach (char c in example)
{
   // append each character followed by a single space
   example_output += c + " ";
}

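I could then loop over it, removing the spaces one by one and adding each variant to the outputs list. The problem is that I also need this to work when there are double spaces and so on.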

For example:

string text = "This is a piece of text for this **example**.";

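I basically want to detect and remove 'example'.

However, I want to do this even if it says e xample, e x ample, or example.

In my scenario this is a spam filter, so I can't simply strip the spaces from the whole sentence as shown below, because I need to .Replace the word with exactly the spacing the user typed.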

.Replace(" ", "");

How would I achieve this?

TLDR: I want to filter out a word that may contain any combination of spaces, without changing any other part of the line.

So example, e xample, e  x ample, e    x   a  m ple

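should all be treated as the filtered word.

As a plan B, I wouldn't mind a method that generates the word with every possible combination of spaces.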

【Question discussion】:

Tags: c# filter

【Solution 1】:

You can achieve this with the following regular expression: (e[\s]*x[\s]*a[\s]*m[\s]*p[\s]*l[\s]*e)
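
As a minimal sketch of how that pattern could be applied to the spam-filter case, the class below wraps it in `Regex.Replace` so that only the matched word is removed and the rest of the line is left alone. The class and method names (`SpacedWordFilter`, `Remove`) are made up for illustration and do not come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpacedWordFilter
{
    // The pattern from the answer, written as a verbatim string (@"...") so
    // that the backslashes reach the regex engine unchanged.
    private static readonly Regex Pattern =
        new Regex(@"(e[\s]*x[\s]*a[\s]*m[\s]*p[\s]*l[\s]*e)");

    // Hypothetical helper: removes every spaced-out occurrence of "example".
    // Text outside the match, including its whitespace, is not modified.
    public static string Remove(string text)
    {
        return Pattern.Replace(text, "");
    }
}
```

Calling `SpacedWordFilter.Remove("an e  x ample here")` leaves the spaces on either side of the removed word in place, so the caller can decide whether to collapse them afterwards.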

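【Discussion】:

【Solution 2】:

You can use a regular expression for this: e\s*x\s*a\s*m\s*p\s*l\s*e. Here \s matches any whitespace character, and * means zero or more occurrences of it.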

A small snippet:

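using System.Text.RegularExpressions;

const string myInput = "e x ample";
// Verbatim string (@"...") so that \s reaches the regex engine; a regular
// string literal would reject \s as an unrecognized escape sequence.
var regex = new Regex(@"e\s*x\s*a\s*m\s*p\s*l\s*e");

var match = regex.Match(myInput);
if (match.Success)
{
   // We have a match! Bad word
}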

Here is a link to the regular expression: https://regex101.com/r/VFjzTg/1
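
Since the question asks for the word exactly as the user typed it, one possible extension is to collect `Match.Value` for every hit instead of only checking `Success`. This is a sketch under that assumption; `SpacedWordFinder` and `FindAll` are hypothetical names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SpacedWordFinder
{
    private static readonly Regex Pattern =
        new Regex(@"e\s*x\s*a\s*m\s*p\s*l\s*e");

    // Hypothetical helper: returns each occurrence exactly as it appears in
    // the input, with the user's own spacing preserved.
    public static List<string> FindAll(string input)
    {
        var found = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

Each returned string could then be passed to `string.Replace` on the original line, which matches the approach described in the question.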

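【Discussion】:

【Solution 3】:

As I understand it, the problem is to ignore whitespace inside the matched word without touching whitespace anywhere else in the string.

You can build a regular expression from the word to match that allows any amount of whitespace between consecutive characters.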

    // prepare regex. Need to do this only once for many applications.
    string findword = "example";
    // TODO: would need to escape special chars like * ( ) \ . + ? here.
    string[] tmp = new string[findword.Length];
    for(int i=0;i<tmp.Length;i++)tmp[i]=findword.Substring(i,1);
    System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(string.Join("\\s*",tmp));

    // on each text to filter, do this:
    string inp = "A text with the exa  mple word in it.";
    string outp;
    outp = r.Replace(inp,"");
    System.Console.WriteLine(outp);

Escaping of regex special characters is omitted for brevity.
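
One way the omitted escaping could be filled in is with `Regex.Escape` applied to each character before joining. The sketch below assumes that approach; `SpacedPatternBuilder` and `Build` are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SpacedPatternBuilder
{
    // Hypothetical helper: escapes every character of the word, then joins
    // the pieces with \s* so any amount of whitespace may sit between them.
    public static Regex Build(string word)
    {
        var parts = word.Select(c => Regex.Escape(c.ToString()));
        return new Regex(string.Join(@"\s*", parts));
    }
}
```

With this, a filter word such as `c++` yields the pattern `c\s*\+\s*\+` instead of an invalid or unintended expression.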

【Discussion】:

Great, this is exactly what I was looking for! Do you know how I could use this approach while also ignoring accents (so that even ex amplé gets replaced)? I currently do this: private static string RemoveAccents(string s) { Encoding destEncoding = Encoding.GetEncoding("iso-8859-8"); return destEncoding.GetString(Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s))); }

【Solution 4】:

You can try a regular expression:

using System.Linq; // needed for Select
using System.Text.RegularExpressions;

....

// Having a word to find
string toFind = "Example";

// we build the regular expression
Regex regex = new Regex(
   @"\b" + string.Join(@"\s*", toFind.Select(c => Regex.Escape(c.ToString()))) + @"\b", 
   RegexOptions.IgnoreCase);

// Then we apply regex built for the required text:
string text = "This is a piece of text for this **example**. And more (e  X amp    le)";

string result = regex.Replace(text, "");

Console.Write(result);

Result:

This is a piece of text for this ****. And more ()
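
The `\b` anchors are what keep this pattern from firing inside longer words. A small sketch of that behaviour, using a hypothetical `WholeWordFilter` helper that builds the same kind of expression:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordFilter
{
    // Hypothetical helper: builds \b + escaped characters joined by \s* + \b
    // and removes every case-insensitive match from the text.
    public static string Remove(string word, string text)
    {
        string body = string.Join(@"\s*", word.Select(c => Regex.Escape(c.ToString())));
        var regex = new Regex(@"\b" + body + @"\b", RegexOptions.IgnoreCase);
        return regex.Replace(text, "");
    }
}
```

For instance, in "A counterexample and an E x ample" only the spaced-out word at the end is removed, because there is no word boundary before the "e" inside "counterexample". Note that `\b` only behaves this way when the filter word starts and ends with a word character.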

Edit: if you want to ignore diacritics, you should modify the regular expression:

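  string toFind = "Example";

  // \p{Mn} (nonspacing marks) matches the combining accents that
  // FormD normalization splits off from their base letters.
  Regex regex = new Regex(@"\b" + string.Join(@"\s*", 
    toFind.Select(c => Regex.Escape(c.ToString()) + @"\p{Mn}*")), 
    RegexOptions.IgnoreCase);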

and Normalize the text before matching:

  string text = "This is a piece of text for this **examplé**. And more (e  X amp    le)";

  string result = regex.Replace(text.Normalize(NormalizationForm.FormD), "");
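
Putting the two steps together might look like the sketch below. It assumes that combining accents should be matched with `\p{Mn}` (the Unicode nonspacing-mark category), adds a closing `\b`, and recomposes the result with `FormC` so that accented characters elsewhere in the line come back in their usual form. The names `AccentInsensitiveFilter`, `Build`, and `Remove` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class AccentInsensitiveFilter
{
    // Hypothetical helper: after each letter allow any number of combining
    // marks, then any amount of whitespace before the next letter.
    public static Regex Build(string word)
    {
        var parts = word.Select(c => Regex.Escape(c.ToString()) + @"\p{Mn}*");
        return new Regex(@"\b" + string.Join(@"\s*", parts) + @"\b",
                         RegexOptions.IgnoreCase);
    }

    // Decompose so that "é" becomes "e" + U+0301, remove the matches,
    // then recompose the remaining text (an assumption of this sketch).
    public static string Remove(Regex regex, string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        return regex.Replace(decomposed, "").Normalize(NormalizationForm.FormC);
    }
}
```

The recomposition step is a design choice rather than a requirement: without it, the returned string stays in decomposed form, which compares unequal to the original text even where nothing was filtered.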

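【Discussion】:

How would I use this in combination with ignoring accents, so that it can replace example?
@thekguy: if you want to ignore diacritics, you should modify the regular expression and Normalize the text before matching/replacing. See my edit.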