Finding substrings of a string that contain all the words in an array: answers

【Question title】: Finding Sub-Strings of String Containing all the words in array
【Posted】: 2012-06-28 18:09:41
【Question】:

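I have a string and an array of words, and I have to write code that finds all substrings of the string that contain every word in the array, in any order. The string contains no special characters or digits, and its words are separated by single spaces.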

For example:

Given string:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc

Words in the array:

aaaa
bbbb
cccc

Sample output:

aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb    

aaaa aaaa aaaa aaaa cccc bbbb    

aaaa cccc bbbb bbbb bbbb bbbb    

cccc bbbb bbbb bbbb bbbb aaaa  

aaaa cccc bbbb

I have implemented this with for loops, but it is very inefficient.

How can I do this more efficiently?

My code:

    for(int i=0;i<str_arr.length;i++)
    {
        if( (str_arr.length - i) >= words.length)
        {
            String res = check(i);
            if(!res.equals(""))
            {
                System.out.println(res);
                System.out.println("");
            }
            reset_all();
        }
        else
        {
            break;
        }
    }

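// Returns the shortest run of words starting at index i that contains every
// word in the array, or "" if no such run exists.
public static String check(int i)
{
    String res = "";
    num_words = 0;

    for(int j=i;j<str_arr.length;j++)
    {
        if(has_word(str_arr[j]))
        {
            // t and h are maps declared elsewhere (not shown); they mark the word as seen.
            t.put(str_arr[j].toLowerCase(), 1);
            h.put(str_arr[j].toLowerCase(), 1);

            res = res + str_arr[j]; //+ " ";

            if(all_complete())
            {
                return res;
            }

            res = res + " ";
        }
        else
        {
            res = res + str_arr[j] + " ";
        }

    }
    // Reached the end of the string without seeing every word.
    return "";
}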

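【Comments on the question】:

An example would help.
Why don't you show what you have so far?
What are the constraints? The number of characters in the string, the number of words?
I don't understand how you arrived at your results.
Why does aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb match, but not aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb or aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb?

Tags: java string substring unordered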

【Solution 1】:

My first approach would be something like the following pseudocode:

  for each word in string {
    if word in array {
      for each stored potential substring {
        if word wasn't already found for it {
          remove word from its notAlreadyFoundList
          if its notAlreadyFoundList is empty {
            use starting pos and ending pos to save our substring
          }
        }
      }
      store position and array-word as a new potential substring
    }
  }

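This should perform reasonably well, because you only traverse the string once.

[Edit]

Here is an implementation of my pseudocode; try it and see whether it performs better or worse. It works on the assumption that a matching substring is complete as soon as its last missing word is found. If you really want all matches, change the lines marked //ALLMATCHES: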

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class SubStringFinder {
    String textString = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
    Set<String> words = new HashSet<String>(Arrays.asList("aaaa", "bbbb", "cccc"));

    public static void main(String[] args) {
        new SubStringFinder();
    }

    public SubStringFinder() {
        // Candidate substrings that have been started so far.
        List<PotentialMatch> matches = new ArrayList<PotentialMatch>();
        for (String textPart : textString.split(" ")) {
            // Extend every open candidate with the current token, so tokens
            // outside the word set are kept in the reported substring too.
            for (Iterator<PotentialMatch> matchIterator = matches.iterator(); matchIterator.hasNext();) {
                PotentialMatch match = matchIterator.next();
                String result = match.tryMatch(textPart);
                if (result != null) {
                    System.out.println("Match found: \"" + result + "\"");
                    matchIterator.remove(); //ALLMATCHES - remove this line
                }
            }
            if (words.contains(textPart)) {
                // A word from the set also starts a new candidate of its own.
                Set<String> unfound = new HashSet<String>(words);
                unfound.remove(textPart);
                matches.add(new PotentialMatch(unfound, textPart));
            }// ALLMATCHES add these lines 
             // else {
             // matches.add(new PotentialMatch(new HashSet<String>(words), textPart));
             // }
        }
    }

    class PotentialMatch {
        Set<String> unfoundWords;   // words this candidate has not seen yet
        StringBuilder stringPart;   // text of the candidate so far
        public PotentialMatch(Set<String> unfoundWords, String part) {
            this.unfoundWords = unfoundWords;
            this.stringPart = new StringBuilder(part);
        }
        // Appends the token; returns the candidate's text once no word is missing, else null.
        public String tryMatch(String part) {
            this.stringPart.append(' ').append(part);
            unfoundWords.remove(part);
            if (unfoundWords.isEmpty()) {
                return this.stringPart.toString();
            }
            return null;
        }
    }
}

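【Comments】:

My code above does the same thing, and it is optimized by using a TreeMap for lookups, which gives O(log(n)) time complexity...
It looks like you traverse the string once for every word in the string, which gives you O(n^2) complexity.
Where is this TreeMap in the code? And how does a Map optimize the nested for loops?
This part of the code uses the TreeMap to look up the words: t.put(str_arr[j].toLowerCase(), 1);
Then you still have a nested for loop, which gives you O(n^2*log n).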
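
The complexity objection in these comments can be made concrete with a two-pointer sliding window, which is a different technique from both the asker's code and the answer above. The sketch below is only an illustration under stated assumptions: the names ShortestWindows and shortestWindows are hypothetical, words are assumed to be separated by single spaces, and "result" is taken to mean the shortest covering run for each start position. Each token enters and leaves the window at most once, so the scan itself is linear in the number of tokens (building the output strings costs extra).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class, not part of the original thread.
class ShortestWindows {
    /**
     * For every start index that has a covering window, returns the shortest
     * run of tokens beginning there that contains every required word.
     */
    static List<String> shortestWindows(String text, Set<String> words) {
        List<String> result = new ArrayList<>();
        if (words.isEmpty()) {
            return result; // nothing to cover
        }
        String[] tokens = text.split(" ");
        Map<String, Integer> counts = new HashMap<>(); // required words inside [start, end)
        int covered = 0; // distinct required words currently inside the window
        int end = 0;     // exclusive right edge; it never moves backwards
        for (int start = 0; start < tokens.length; start++) {
            // Grow the window until it covers every required word.
            while (covered < words.size() && end < tokens.length) {
                String t = tokens[end++];
                if (words.contains(t) && counts.merge(t, 1, Integer::sum) == 1) {
                    covered++;
                }
            }
            if (covered < words.size()) {
                break; // no later start can be covered either
            }
            result.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
            // Drop tokens[start] before the left edge moves forward.
            String left = tokens[start];
            if (words.contains(left) && counts.merge(left, -1, Integer::sum) == 0) {
                covered--;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("aaaa", "bbbb", "cccc"));
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        for (String window : shortestWindows(text, words)) {
            System.out.println(window);
        }
    }
}
```

A HashMap is used here instead of a TreeMap because the lookups need no ordering; whether that matters in practice depends on the number of distinct words.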

【Solution 2】:

Here is another approach:

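import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Wrapper class so the snippet compiles; the name is arbitrary.
class RegexSubStringFinder {

    public static void main(String[] args) {
        // init
        List<String> result = new ArrayList<String>();
        String string = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        String[] words = { "aaaa", "bbbb", "cccc" };
        // find all combs as regexps, one per ordering of the words
        // (e.g. "(\Qaaaa\E ?)+(\Qbbbb\E ?)+(\Qcccc\E )*\Qcccc\E")
        List<String> regexps = findCombs(Arrays.asList(words));
        // compile each regexp and collect its matches
        for (String regexp : regexps) {
            Pattern p = Pattern.compile(regexp);
            Matcher m = p.matcher(string);
            while (m.find()) {
                result.add(m.group());
            }
        }
        System.out.println(result);
    }

    // Builds one regexp per permutation of the words; each word may repeat.
    private static List<String> findCombs(List<String> words) {
        if (words.size() == 1) {
            // Last word of the ordering: optional repeats, then one copy without a trailing space.
            String quoted = Pattern.quote(words.get(0));
            return Collections.singletonList("(" + quoted + " )*" + quoted);
        }
        List<String> list = new ArrayList<String>();
        for (String word : words) {
            // Put this word first and recurse on the remaining words.
            List<String> tail = new LinkedList<String>(words);
            tail.remove(word);
            for (String s : findCombs(tail)) {
                list.add("(" + Pattern.quote(word) + " ?)+" + s);
            }
        }
        return list;
    }
}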

This will output:

[aaaa bbbb cccc, aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb, cccc bbbb bbbb bbbb bbbb aaaa]

I know the result is incomplete: you only get the orderings that actually occur, each fully expanded, but you do get all of those.
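
For comparison, a complete enumeration can be sketched without regular expressions. This is not the answerer's method: it is a plain nested loop, the names AllWindows and allWindows are hypothetical, and it assumes that "substring" means a run of consecutive space-separated words. Because the number of qualifying runs can itself grow quadratically with the number of words in the string, listing all of them cannot be done in linear time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original thread.
class AllWindows {
    /** Returns every run of consecutive tokens that contains all required words. */
    static List<String> allWindows(String text, Set<String> words) {
        String[] tokens = text.split(" ");
        List<String> result = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            Set<String> missing = new HashSet<>(words); // words not yet seen from this start
            StringBuilder window = new StringBuilder();
            for (int end = start; end < tokens.length; end++) {
                if (end > start) {
                    window.append(' ');
                }
                window.append(tokens[end]);
                missing.remove(tokens[end]);
                // Once nothing is missing, this run and every longer run
                // from the same start qualifies.
                if (missing.isEmpty()) {
                    result.add(window.toString());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("aaaa", "bbbb", "cccc"));
        String text = "aaaa aaaa aaaa aaaa cccc bbbb bbbb bbbb bbbb aaaa bbbb cccc";
        for (String window : allWindows(text, words)) {
            System.out.println(window);
        }
    }
}
```

The three strings printed by the regular-expression version are among the runs this sketch returns, together with every other qualifying run.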

【Comments】: