【Question Title】: How to calculate a matching score between two strings in Java?
【Posted】: 2013-07-26 19:51:42
【Question Description】:

I want to classify two strings as similar or not similar. For example:

s1 = "Token is invalid. DeviceId = deviceId: "345" "
s2 = "Token is invalid. DeviceId = deviceId: "123" "
s3 = "Could not send Message."

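I am looking for a Java library that gives a matching score between two strings, so that I can decide from that score whether they are similar. My program only needs to handle a small data set (~2000 strings). Do you know whether something like this already exists?

【Question Comments】:

    Tags: java fuzzy-comparison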


    【Solution 1】:

    【Comments】:

    【Solution 2】:

    As suggested, here is the Levenshtein distance algorithm:

    public class LevenshteinDistance
    {
        private static int minimum(int a, int b, int c)
        {
            return Math.min(Math.min(a, b), c);
        }

        /**
         * Returns the minimum number of single-character insertions,
         * deletions and substitutions needed to turn str1 into str2.
         */
        public static int computeLevenshteinDistance(CharSequence str1, CharSequence str2)
        {
            // distance[i][j] = edit distance between the first i chars of str1
            // and the first j chars of str2
            int[][] distance = new int[str1.length() + 1][str2.length() + 1];

            // Turning a prefix into the empty string takes one deletion per character
            for (int i = 0; i <= str1.length(); i++)
                distance[i][0] = i;
            // Building a prefix from the empty string takes one insertion per character
            for (int j = 1; j <= str2.length(); j++)
                distance[0][j] = j;

            for (int i = 1; i <= str1.length(); i++)
                for (int j = 1; j <= str2.length(); j++)
                    distance[i][j] = minimum(distance[i - 1][j] + 1,       // deletion
                                             distance[i][j - 1] + 1,       // insertion
                                             distance[i - 1][j - 1]        // match or substitution
                                                 + ((str1.charAt(i - 1) == str2.charAt(j - 1)) ? 0 : 1));

            return distance[str1.length()][str2.length()];
        }

        public static void main(String[] args)
        {
            String s1 = "Token is invalid. DeviceId = deviceId: \"345\" ";
            String s2 = "Token is invalid. DeviceId = deviceId: \"123\" ";
            String s3 = "Could not send Message.";

            System.out.println(computeLevenshteinDistance(s1, s2)); // s1 vs. s2
            System.out.println(computeLevenshteinDistance(s1, s3)); // s1 vs. s3
            System.out.println(computeLevenshteinDistance(s2, s3)); // s2 vs. s3
        }
    }
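
A raw edit distance is an absolute count, so by itself it does not say whether two strings are "similar": three edits matter far more in a 5-character string than in a 45-character one. One common way to get a score, sketched here under assumptions, is to divide the distance by the length of the longer string and subtract the result from 1. The class name `StringSimilarity`, its method names, and the 0.8 threshold are hypothetical and not part of the original answer; the threshold in particular would need tuning against the real messages.

```java
/** Hypothetical helper: turns a Levenshtein distance into a score in [0, 1]. */
final class StringSimilarity {

    // Assumed cut-off for "similar"; not from the original answer, tune as needed.
    static final double SIMILARITY_THRESHOLD = 0.8;

    /** Levenshtein distance using two rolling rows instead of a full matrix. */
    static int levenshtein(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from the empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting a's prefix entirely
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** 1.0 means identical, 0.0 means nothing in common position-wise. */
    static double similarity(CharSequence a, CharSequence b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    static boolean isSimilar(CharSequence a, CharSequence b) {
        return similarity(a, b) >= SIMILARITY_THRESHOLD;
    }

    public static void main(String[] args) {
        String s1 = "Token is invalid. DeviceId = deviceId: \"345\" ";
        String s2 = "Token is invalid. DeviceId = deviceId: \"123\" ";
        String s3 = "Could not send Message.";

        System.out.println(isSimilar(s1, s2)); // true
        System.out.println(isSimilar(s1, s3)); // false
    }
}
```

With the sample strings from the question, s1 and s2 differ by three substitutions over 45 characters, which gives a score of roughly 0.93. s1 and s3 differ in length by 22 characters, so their score cannot exceed about 0.51, well below the assumed threshold.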
    

    【Comments】:

      【Solution 3】:

      For any NLP question in Java, you should check out the Apache Lucene project. For your needs, however, a simple Levenshtein distance algorithm is sufficient.
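
To illustrate why a plain distance function can be enough for a data set of about 2000 strings, here is a minimal sketch of greedy grouping: each message is compared against the first member of every existing group and joins the first group whose score clears a threshold. The class `MessageGrouper`, the normalised score, and the 0.8 threshold are assumptions made for illustration, not something the answers above prescribe. In the worst case this compares every pair, which for 2000 strings is about two million distance computations.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: groups messages whose normalised similarity clears a threshold. */
final class MessageGrouper {

    // Assumed cut-off; tune against real data.
    static final double THRESHOLD = 0.8;

    /** Levenshtein distance with two rolling rows. */
    static int levenshtein(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Score in [0, 1]: 1 minus the distance divided by the longer length. */
    static double similarity(CharSequence a, CharSequence b) {
        int maxLen = Math.max(a.length(), b.length());
        return maxLen == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    /**
     * Greedy grouping: a message joins the first group whose representative
     * (its first member) is similar enough, otherwise it starts a new group.
     * The result depends on input order.
     */
    static List<List<String>> group(List<String> messages) {
        List<List<String>> groups = new ArrayList<>();
        for (String message : messages) {
            List<String> target = null;
            for (List<String> candidate : groups) {
                if (similarity(candidate.get(0), message) >= THRESHOLD) {
                    target = candidate;
                    break;
                }
            }
            if (target == null) {
                target = new ArrayList<>();
                groups.add(target);
            }
            target.add(message);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> messages = new ArrayList<>();
        messages.add("Token is invalid. DeviceId = deviceId: \"345\" ");
        messages.add("Could not send Message.");
        messages.add("Token is invalid. DeviceId = deviceId: \"123\" ");

        // Prints one line per group, with the group's members.
        for (List<String> g : group(messages)) {
            System.out.println(g);
        }
    }
}
```

Comparing only against a group's first member keeps the sketch short; a group can therefore drift if later members differ from the representative in different ways, so a stricter variant would compare against every member.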

      【Comments】:
