【Question title】:Get the difference between 2 strings
【Posted】:2014-12-02 16:52:44
【Question description】:

I'm trying to compute the difference between two strings.

For example:

string val1 = "Have a good day";
string val2 = "Have a very good day, Joe";

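The result would be a list of strings containing 2 items: "very" and ", Joe".

So far my research on this task hasn't turned up much.

Edit: the result may need to be 2 separate lists of strings, one holding the additions and one holding the deletions.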

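【Question discussion】:

  • What do you mean by "your research"? Have you written some code? Then share it. Haven't written anything? In that case I don't think it's fair to expect us to write the code for you. Make an effort!
  • I have written code and done research. My code doesn't even work as expected.
  • Like I said, share your code so we can point you in the right direction.
  • This is a non-trivial task. For using a DIFF library, see here

Tags: c# .net string comparison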


【Solution 1】:

This is the simplest version I can think of:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        // Extract the words of each string (punctuation is ignored).
        MatchCollection words1 = Regex.Matches(val1, @"\b(\w+)\b");
        MatchCollection words2 = Regex.Matches(val2, @"\b(\w+)\b");

        var hs1 = new HashSet<string>(words1.Cast<Match>().Select(m => m.Value));
        var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value));

        // Optionally you can use a custom comparer for the words.
        // var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value), new MyComparer());

        // After this operation hs2 contains only 'very' and 'Joe'.
        hs2.ExceptWith(hs1);
    }
}

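Custom comparer:

public class MyComparer : IEqualityComparer<string>
{
    public bool Equals(string one, string two)
    {
        // The static overload also copes with null arguments.
        return string.Equals(one, two, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string item)
    {
        // Must agree with Equals: strings that differ only in case
        // need the same hash code, which item.GetHashCode() does not give.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(item);
    }
}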
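
The question's edit asks for two lists, one with additions and one with deletions. A minimal sketch of that on top of the same word-set idea follows; it is not part of the answer above. The names WordDiff, Words and Compare are made up for illustration, and it uses the built-in StringComparer.OrdinalIgnoreCase in place of a hand-written comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answer.
static class WordDiff
{
    // Extracts the distinct words of a text, compared case-insensitively.
    static HashSet<string> Words(string text) =>
        new HashSet<string>(
            Regex.Matches(text, @"\w+").Cast<Match>().Select(m => m.Value),
            StringComparer.OrdinalIgnoreCase);

    // Added: words found only in newText. Removed: words found only in oldText.
    public static (List<string> Added, List<string> Removed) Compare(string oldText, string newText)
    {
        var oldWords = Words(oldText);
        var newWords = Words(newText);
        var added = newWords.Where(w => !oldWords.Contains(w)).ToList();
        var removed = oldWords.Where(w => !newWords.Contains(w)).ToList();
        return (added, removed);
    }
}
```

Calling WordDiff.Compare(val1, val2) with the strings from the question should put "very" and "Joe" in Added and leave Removed empty. Because this works on sets of words, punctuation, word order and repeated words are all ignored.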

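【Discussion】:

    【Solution 2】:

    I actually followed these steps:

    (i) Obtain all words from both strings, ignoring special characters

    (ii) Find the difference between the two lists

    Code: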

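        using System;
        using System.Collections.Generic;
        using System.Linq;
        using System.Text.RegularExpressions;

        class Program
        {
            static void Main()
            {
                string s1 = "Have a good day"; // not shown in the original snippet
                string s2 = "Have a very good day, Joe";
                IEnumerable<string> diff;
                MatchCollection matches = Regex.Matches(s1, @"\b[\w']*\b");
                IEnumerable<string> first = from m in matches.Cast<Match>()
                                            where !string.IsNullOrEmpty(m.Value)
                                            select TrimSuffix(m.Value);
                MatchCollection matches1 = Regex.Matches(s2, @"\b[\w']*\b");
                IEnumerable<string> second = from m in matches1.Cast<Match>()
                                             where !string.IsNullOrEmpty(m.Value)
                                             select TrimSuffix(m.Value);

                // Subtract the shorter word list from the longer one.
                if (second.Count() > first.Count())
                {
                    diff = second.Except(first).ToList();
                }
                else
                {
                    diff = first.Except(second).ToList();
                }
                Console.WriteLine(string.Join(", ", diff));
            }

            // Cuts a word at its first apostrophe, e.g. "Joe's" becomes "Joe".
            static string TrimSuffix(string word)
            {
                int apostropheLocation = word.IndexOf('\'');
                if (apostropheLocation != -1)
                {
                    word = word.Substring(0, apostropheLocation);
                }
                return word;
            }
        }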
    

    Output: very, Joe

    【Discussion】:

    • Your code does not match the OP's expected result.
    • @ErikPhilips I have revised the answer
    【Solution 3】:

    This code:

    enum Where { None, First, Second, Both } // somewhere in your source file
    
    //...
    var val1 = "Have a good calm day calm calm calm";
    var val2 = "Have a very good day, Joe Joe Joe Joe";
    
    var words1 = from m in Regex.Matches(val1, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                    where m.Success
                    select m.Value.ToLower();
    var words2 = from m in Regex.Matches(val2, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                    where m.Success
                    select m.Value.ToLower();
    
    var dic = new Dictionary<string, Where>();
    foreach (var s in words1)
    {
        dic[s] = Where.First;
    }
    foreach (var s in words2)
    {
        Where b;
        if (!dic.TryGetValue(s, out b)) b = Where.None;
    
        switch (b)
        {
            case Where.None:
                dic[s] = Where.Second;
                break;
            case Where.First:
                dic[s] = Where.Both;
                break;
        }
    }
    
    foreach (var kv in dic.Where(x => x.Value != Where.Both))
    {
        Console.WriteLine(kv.Key);
    }
    

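    gives us 'calm', 'very', ', joe' and 'joe' as the difference between the two strings (lowercased, because every match goes through ToLower()): 'calm' from the first string, and 'very', ', joe' and 'joe' from the second. Duplicates are removed as well.

    And to get two separate lists that tell us which word came from which text: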

    var list1 = dic.Where(x => x.Value == Where.First).ToList();
    var list2 = dic.Where(x => x.Value == Where.Second).ToList();
    
    foreach (var kv in list1)
    {
        Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
    }
    
    foreach (var kv in list2)
    {
        Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
    }
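
The dictionary approach treats each text as a set of tokens, so word order and repeated words are lost. If those matter, one possible alternative is a diff based on the longest common subsequence (LCS) of the two token lists. The following is only a sketch under that assumption, not this answer's method; SequenceDiff, Tokenize and Compare are hypothetical names, and joining a run of changed tokens with a single space is a simplification that does not restore the original spacing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answer.
static class SequenceDiff
{
    // Splits text into word tokens and punctuation tokens; whitespace is dropped.
    static string[] Tokenize(string text) =>
        Regex.Matches(text, @"\w+|[^\w\s]+").Cast<Match>().Select(m => m.Value).ToArray();

    public static (List<string> Added, List<string> Removed) Compare(string oldText, string newText)
    {
        string[] a = Tokenize(oldText), b = Tokenize(newText);

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
            for (int j = b.Length - 1; j >= 0; j--)
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);

        var added = new List<string>();
        var removed = new List<string>();
        var addRun = new List<string>();    // consecutive tokens only in newText
        var removeRun = new List<string>(); // consecutive tokens only in oldText

        // Closes the current runs and stores each as one space-joined string.
        void Flush()
        {
            if (addRun.Count > 0) { added.Add(string.Join(" ", addRun)); addRun.Clear(); }
            if (removeRun.Count > 0) { removed.Add(string.Join(" ", removeRun)); removeRun.Clear(); }
        }

        // Walk both token lists, following the LCS table.
        int x = 0, y = 0;
        while (x < a.Length || y < b.Length)
        {
            if (x < a.Length && y < b.Length && a[x] == b[y]) { Flush(); x++; y++; }
            else if (y < b.Length && (x == a.Length || lcs[x, y + 1] >= lcs[x + 1, y])) addRun.Add(b[y++]);
            else removeRun.Add(a[x++]);
        }
        Flush();
        return (added, removed);
    }
}
```

For the strings in the question, SequenceDiff.Compare("Have a good day", "Have a very good day, Joe") should yield the additions "very" and ", Joe" and no deletions. The comparison here is case-sensitive, and the table costs time and memory proportional to the product of the two token counts, which is fine for short strings but not for large documents.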
    

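    【Discussion】:

      【Solution 4】:

      Put the characters of each string into a set, then compute the relative complement of those sets.

      A relative complement operation is available in any good collections library.

      You may need to take care to preserve the order of the characters.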
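
A rough sketch of that idea, assuming "relative complement" here means the characters of one string that never occur in the other. CharDiff and OnlyIn are hypothetical names; the explicit loop is used so that the order of first appearance is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not from the original answer.
static class CharDiff
{
    // Characters of `second` that never occur in `first`,
    // each reported once, in order of first appearance.
    public static string OnlyIn(string second, string first)
    {
        var seen = new HashSet<char>(first);
        var result = new StringBuilder();
        foreach (char c in second)
        {
            // Add returns false when c is already in the set, so characters
            // from `first` and repeats within `second` are both skipped.
            if (seen.Add(c)) result.Append(c);
        }
        return result.ToString();
    }
}
```

With the question's strings, CharDiff.OnlyIn("Have a very good day, Joe", "Have a good day") returns "r,J", because every other character of the second string already occurs somewhere in the first. That shows why a character-level set difference is usually too coarse for this task compared with working on words.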

      【Discussion】:

        【Solution 5】:

        You have to remove the "," to get the expected result

          string s1 = "Have a good day";
          string s2 = "Have a very good day, Joe";
          int index = s2.IndexOf(','); // index of the character to remove, -1 if absent
          // Remove the comma; Remove(-1, 1) would throw, so guard against a missing comma.
          string cleaned = index >= 0 ? s2.Remove(index, 1) : s2;
          IEnumerable<string> diff;
          IEnumerable<string> first = s1.Split(' ').Distinct();
          IEnumerable<string> second = cleaned.Split(' ').Distinct();
          // Subtract the shorter word list from the longer one.
          if (second.Count() > first.Count())
          {
              diff = second.Except(first).ToList();
          }
          else
          {
              diff = first.Except(second).ToList();
          }
        

        【Discussion】:
