【Question title】: Java function needed for finding the longest duplicated substring in a string?
【Posted】: 2011-05-26 02:34:28
【Question description】:

I need a Java function that finds the longest duplicated substring in a string.

For instance, if the input is "banana", the output should be "ana", and we also have to count the number of times it appears; in this case it is 2.

The solution is as follows:

import java.util.Arrays;
import java.util.Vector;

public class Test {
public static void main(String[] args) {
    System.out.println(findLongestSubstring("i like ike"));
    System.out.println(findLongestSubstring("madam i'm adam"));
    System.out.println(findLongestSubstring("When life hands you lemonade, make lemons"));
    System.out.println(findLongestSubstring("banana"));
}

public static String findLongestSubstring(String value) {
    String[] strings = new String[value.length()];
    String longestSub = "";

    //strip off a character, add new string to array
    for(int i = 0; i < value.length(); i++){
        strings[i] = value.substring(i);
    }

    //debug/visualization
    //before sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Arrays.sort(strings);
    System.out.println();

    //debug/visualization
    //after sort
    for(int i = 0; i < strings.length; i++){
        System.out.println(strings[i]);
    }

    Vector<String> possibles = new Vector<String>();
    String temp = "";
    int curLength = 0, longestSoFar = 0;

    /*
     * now that the array is sorted compare the letters
     * of the current index to those above, continue until 
     * you no longer have a match, check length and add
     * it to the vector of possibilities
     */ 
    for(int i = 1; i < strings.length; i++){
        for(int j = 0; j < strings[i-1].length(); j++){
            if (strings[i-1].charAt(j) != strings[i].charAt(j)){
                break;
            }
            else{
                temp += strings[i-1].charAt(j);
                curLength++;
            }
        }
        //this could alleviate the need for a vector
        //since only the first and subsequent longest 
        //would be added; vector kept for simplicity
        if (curLength >= longestSoFar){
            longestSoFar = curLength;
            possibles.add(temp);
        }
        temp = "";
        curLength = 0;
    }

    System.out.println("Longest string length from possibles: " + longestSoFar);

    //iterate through the vector to find the longest one
    int max = 0;
    for(int i = 0; i < possibles.size();i++){
        //debug/visualization
        System.out.println(possibles.elementAt(i));
        if (possibles.elementAt(i).length() > max){ 
            max = possibles.elementAt(i).length();
            longestSub = possibles.elementAt(i);
        }
    }
    System.out.println();
    //concerned with whitespace up until this point
    // "lemon" not " lemon" for example
    return longestSub.trim(); 
}

}
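
The question also asks how many times the substring appears, which the code above does not report. A minimal sketch of that step, assuming overlapping occurrences should be counted (so "ana" appears twice in "banana"); `OccurrenceCounter` and `countOccurrences` are hypothetical names, not part of the original post:

```java
class OccurrenceCounter {
    // Hypothetical helper: counts occurrences of sub in text, overlaps included.
    static int countOccurrences(String text, String sub) {
        if (sub.isEmpty()) {
            return 0; // assumption: an empty pattern has no meaningful count
        }
        int count = 0;
        // Restart the search one character after each hit so that
        // overlapping matches are not skipped.
        for (int from = text.indexOf(sub); from >= 0; from = text.indexOf(sub, from + 1)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("banana", "ana")); // prints 2
    }
}
```

It could be called on the value returned by `findLongestSubstring`, keeping in mind that that method trims whitespace from its result.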

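【Question comments】:

  • Interesting question, but have you tried anything?
  • @khachik, I don't know how to go about it
  • @Aix, do you have a Java function for this? It says to use a suffix tree
  • @Deepak If this is homework, you should tag it as such.

Tags: java algorithm


【Solution 1】: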

This is a common CS problem with a dynamic programming solution.

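Edit (for lijie):

You are technically correct: this is not exactly the same problem. However, that does not make the link above irrelevant, and the same approach (dynamic programming in particular) can be used if the two strings supplied are identical. Only one modification is needed: do not consider the cells along the diagonal. Alternatively, as others have pointed out (e.g. LaGrandMere), use a suffix tree (also found in the link above).

Edit (for Deepak):

A Java implementation of the Longest Common Substring (using dynamic programming) can be found here. Note that you will need to modify it to ignore the "diagonal" (see the Wikipedia diagram); otherwise the longest common substring will be the string itself!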
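
As a rough illustration of the modification described above, here is a minimal sketch that runs the longest-common-substring recurrence on a string against itself and skips the diagonal. It is not the linked implementation; the class and method names are hypothetical, and it assumes the two occurrences are allowed to overlap (as "ana" does in "banana"). It only fills the cells with j > i, since the table is symmetric.

```java
class LongestRepeatDp {
    // Longest repeated substring via the longest-common-substring DP,
    // comparing s with itself and never visiting the diagonal (i == j).
    static String longestRepeated(String s) {
        int n = s.length();
        // prev[j] holds the previous row: the length of the longest common
        // suffix of the prefixes s[0..i-1) and s[0..j).
        int[] prev = new int[n + 1];
        int bestLen = 0;
        int bestEnd = 0; // exclusive end index of the first occurrence
        for (int i = 1; i <= n; i++) {
            int[] curr = new int[n + 1];
            // j starts at i + 1: this skips the diagonal and the mirrored half.
            for (int j = i + 1; j <= n; j++) {
                if (s.charAt(i - 1) == s.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > bestLen) {
                        bestLen = curr[j];
                        bestEnd = i;
                    }
                }
            }
            prev = curr;
        }
        return s.substring(bestEnd - bestLen, bestEnd);
    }

    public static void main(String[] args) {
        System.out.println(longestRepeated("banana")); // prints ana
    }
}
```

This takes O(n^2) time and, because only two rows are kept, O(n) extra memory.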

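【Comments】:

  • The problem is not Longest Common Substring. At least, the mapping is not trivial. Note that this problem has only one input string, whereas the LCS problem finds the longest common substring between two input strings.
  • @lijie Thanks for keeping me honest. I have updated the answer.
  • @pst, I need the longest repeated substring in a string; will your implementation return "ana" as the answer?
  • @lijie, will the answer return "ana"?
  • @Deepak: Yes, it will. However, it (the DP solution) is not the most efficient algorithm (that would be a suffix tree: O(n)).
【Solution 2】: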

In Java: Suffix Tree

Thanks to whoever found out how to solve this; I didn't know it.
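
To give a feel for the suffix-based idea, here is a minimal sketch using an uncompressed suffix trie, which is a simpler relative of a suffix tree and not the linked implementation. It needs O(n^2) time and memory, unlike the O(n) of a real suffix tree. All names are hypothetical. Every suffix is inserted character by character; the deepest node reached by at least two suffixes spells the longest repeated substring.

```java
import java.util.HashMap;
import java.util.Map;

class SuffixTrieSketch {
    // One trie node; children are keyed by the next character.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int visits; // number of suffixes whose path passes through this node
    }

    static String longestRepeated(String s) {
        Node root = new Node();
        int bestLen = 0;
        int bestStart = 0;
        for (int start = 0; start < s.length(); start++) {
            Node node = root;
            // Insert the suffix s[start..] one character at a time.
            for (int k = start; k < s.length(); k++) {
                node = node.children.computeIfAbsent(s.charAt(k), c -> new Node());
                node.visits++;
                int depth = k - start + 1;
                // A node visited twice means this prefix occurs at two positions.
                if (node.visits >= 2 && depth > bestLen) {
                    bestLen = depth;
                    bestStart = start;
                }
            }
        }
        return s.substring(bestStart, bestStart + bestLen);
    }

    public static void main(String[] args) {
        System.out.println(longestRepeated("banana")); // prints ana
    }
}
```

The `visits` counter on the deepest qualifying node also gives the number of occurrences, if it is tracked alongside `bestLen`.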

【Comments】:
