Matching words in array of strings and returning new array with the matches and count: answers

【Question title】: Matching words in array of strings and returning new array with the matches and count
【Posted】: 2021-10-04 09:47:08
【Question description】:

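I am trying to take a list of words and compare each word against the others. If a word is similar to, or a substring of, another word, the method should return a count of how many words are similar. If none are found, it should return -1.

When I run the code with the input {similar, liar, not, knot, java}, the output is {not}, when it should be {[similar, liar], [knot, not]} with count = 2. But if I use the words below instead, it works, and I don't know why. Input: mass, as, hero, superhero, not. Output: as, hero, with count = 2.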

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class StringMatching {

    public static List<String> stringMatching(String[] words) {
        // Set of words that are contained in some other word (duplicates collapse).
        HashSet<String> temp = new HashSet<>();
        int n = words.length;
        int count = 0;

        // Compare every pair (i, j) with i < j exactly once.
        for (int i = 0; i < n - 1; i++) {
            String currentWord = words[i];
            for (int j = i + 1; j < n; j++) {
                String nextWord = words[j];
                // contains() only matches contiguous substrings.
                if (currentWord.contains(nextWord)) {
                    temp.add(nextWord);
                    count++;
                }
                if (nextWord.contains(currentWord)) {
                    temp.add(currentWord);
                    count++;
                }
            }
        }
        System.out.println(count);
        return new ArrayList<String>(temp);
    }

    public static void main(String[] args) {
        String[] words = {"mass", "as", "hero", "superhero", "not"};
        List<String> listM = stringMatching(words);
        for (String x : listM) {
            System.out.println(x + " ");
        }
    }
}

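【Comments】:

How do you define "similar"?
In your example, the word "liar" does not appear intact inside the string "similar", so a substring check will not match it; "similar".contains("liar") is false.
In your "should output" example, the word "liar" is not a substring of "similar", so, as Bohemian asked, how do you define two strings as similar?
I am looking at similar characters... sorry, it is probably more like an anagram.

Tags: java arrays list set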

【Solution 1】:

Currently, your code outputs

2
as
hero

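Given what your code does, that is correct. Note that you are:

comparing every pair of words in the given string array
adding nextWord when currentWord contains it, and adding currentWord when nextWord contains it
printing count
returning the list of matched words
printing each string in the returned list.

As far as I understand from your question, the output you want should look like this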

[mass, as] 
[superhero, hero]
count = 2

for the strings in your code. For the strings in your question, the output should be [knot, not] count = 1, because "similar" does not contain "liar".
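The difference between the two inputs comes down to what String.contains checks: whether one string occurs as a contiguous run of characters inside the other. The following is a minimal sketch for verifying this; the class name ContainsCheck and the helper eitherContains are illustrative names, not part of the original code.

```java
class ContainsCheck {
    // Returns true if either word is a contiguous substring of the other.
    // (Hypothetical helper, introduced only for this illustration.)
    static boolean eitherContains(String a, String b) {
        return a.contains(b) || b.contains(a);
    }

    public static void main(String[] args) {
        System.out.println(eitherContains("knot", "not"));     // true
        System.out.println(eitherContains("mass", "as"));      // true
        System.out.println(eitherContains("similar", "liar")); // false: "similar" contains "ilar", not "liar"
    }
}
```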

One way to produce that output is to print each matching pair as it is found and return the count.

    public static int stringMatching(String [] words){
        int n = words.length;
        int count = 0;
        
        for(int i = 0; i < n-1; i++) {
            String currentWord = words[i];
            for(int j = i+1; j<n; j++) {
                String nextWord = words[j];
                if(currentWord.contains(nextWord)) {
                    System.out.format("[%s, %s]\n", currentWord, nextWord);
                    count++;
                }
                if(nextWord.contains(currentWord)) {
                    System.out.format("[%s, %s]\n", nextWord, currentWord);
                    count++;
                }
            }
        }
        return count == 0 ? -1 : count; //if no strings matched, return -1 instead of 0. 
    }

    public static void main(String[] args){
        String[] words = {"mass", "as", "hero", "superhero", "not"};
        int count = stringMatching(words);
        System.out.format("Count = %d", count);
    }

With this code, the output is

[mass, as]
[superhero, hero]
Count = 2

For the words in your question, the result is

[knot, not]
Count = 1
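Since the question title asks for a new collection holding the matches, a variant that returns the pairs instead of printing them might look like the sketch below. The names PairMatcher and matchPairs are made up for this example, and List.of assumes Java 9 or later. Unlike the code above, it uses else-if, so two identical words would be recorded as one pair rather than two; whether that is wanted depends on the requirements.

```java
import java.util.ArrayList;
import java.util.List;

class PairMatcher {
    // Collects a [container, contained] pair for each pair of positions
    // where one word is a contiguous substring of the other.
    static List<List<String>> matchPairs(String[] words) {
        List<List<String>> pairs = new ArrayList<>();
        for (int i = 0; i < words.length - 1; i++) {
            for (int j = i + 1; j < words.length; j++) {
                String a = words[i];
                String b = words[j];
                if (a.contains(b)) {
                    pairs.add(List.of(a, b));
                } else if (b.contains(a)) {
                    pairs.add(List.of(b, a));
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] words = {"similar", "liar", "not", "knot", "java"};
        List<List<String>> pairs = matchPairs(words);
        // Same convention as above: -1 when nothing matched.
        int count = pairs.isEmpty() ? -1 : pairs.size();
        System.out.println(pairs + " count = " + count); // [[knot, not]] count = 1
    }
}
```

The caller derives the count from the size of the returned list, so the method no longer needs to both print and count.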

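【Discussion】:

So besides [knot, not] it also needs to show [similar, liar]. If I want that, do I have to check every character? As in "liar" being treated as an anagram found within "similar".
If that is the case, then you will need an additional algorithm, such as Levenshtein distance, to find similar strings. That said, yes: you can go character by character, but how do you know whether the i of "liar" is the first or the second i in "similar"? "rail" would then also be "contained", just like "liar". You need to consider these cases.
OK, I was thinking of using a method to get a list of anagrams, compare it against the original list, and then save the results to a new array. But I will take a look at Levenshtein distance and check it out.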
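The two ideas raised in this discussion can be sketched as follows. This is only an illustration under assumptions: the thread never settles on a definition of "similar", and the names SimilarityChecks, lettersContainedIn, and levenshtein are invented here. The first method treats a word as matching when all of its letters, counted with multiplicity and ignoring order, are available in the other word; the second is the standard dynamic-programming Levenshtein edit distance.

```java
import java.util.HashMap;
import java.util.Map;

class SimilarityChecks {
    // True if every character of `small` occurs in `big` at least as often.
    // Order is ignored, so this is a "sub-anagram" test (an assumed definition).
    static boolean lettersContainedIn(String small, String big) {
        Map<Character, Integer> available = new HashMap<>();
        for (char c : big.toCharArray()) {
            available.merge(c, 1, Integer::sum);
        }
        for (char c : small.toCharArray()) {
            int left = available.getOrDefault(c, 0);
            if (left == 0) {
                return false; // `big` has no unused copy of this character
            }
            available.put(c, left - 1);
        }
        return true;
    }

    // Levenshtein distance: minimum number of single-character insertions,
    // deletions, and substitutions needed to turn `a` into `b`.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(lettersContainedIn("liar", "similar")); // true
        System.out.println(lettersContainedIn("rail", "similar")); // true, the case raised above
        System.out.println(lettersContainedIn("java", "similar")); // false
        System.out.println(levenshtein("knot", "not"));            // 1
    }
}
```

As noted in the discussion, the letter-count test also accepts "rail" against "similar", so it is worth deciding up front whether that kind of match is acceptable, or whether a distance threshold suits the problem better.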