【Question title】:Check if set of characters exists in a string or not - Improvement
【Posted】:2019-06-24 11:40:24
【Question】:

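Two English words are similar if they consist of exactly the same set of letters. For example, food and good are not similar, but dog and good are. (If A is similar to B, then every letter in A is contained in B, and every letter in B is contained in A.)

Given a word W and a list of words L, find all words in L that are similar to W. Print the number of such words to standard output.

Example:

Input (stdin):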

love
velo low vole lovee volvell lowly lower lover levo loved love lovee lowe lowes lovey lowan lowa evolve loves volvelle lowed love

Output (stdout):

14

Explanation:

The words in L that are similar to love are velo vole lovee volvell lover levo loved love lovee lovey evolve loves volvelle love

which adds up to 14.

So my current solution is as follows:

 public static void main(String[] args) {
    String[] arr = new String[]{"velo", "low", "vole", "lovee", "volvell", "lowly", "lower", "lover", "levo", "loved", "love",
            "lovee", "lowe", "lowes", "lovey", "lowan", "lowa", "evolve", "loves", "volvelle", "lowed", "love"};
    String s = "love";
    int result = 0;

    Pattern p = Pattern.compile(buildPattern(s));

    for (String val : arr) {
        if (p.matcher(val).find()) result++;
    }

    System.out.println(result);
}

private static String buildPattern(String s) {
    String pattern = "^";
    for (int i = 0; i < s.length(); i++) {
        pattern += "(?=.*" + s.charAt(i) + ")";
    }
    return pattern;
}

I would like to know whether my simple code can be improved in any way.

Would Aho-Corasick be an applicable solution?

【Comments】:

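  • Take a look at codereview.stackexchange.com
  • You can easily verify by hand that only 10 words match, not 14.
  • @ErwinBolwidt Wow, I would really like to see that; it is not obvious to me, I counted 14 every time.
  • "velo" 1, "low", "vole" 2, "lovee" 3, "volvell" 4, "lowly", "lower", "lover", "levo" 5, "loved", "love" 6, "lovee" 7, "lowe", "lowes", "lovey", "lowan", "lowa", "evolve" 8, "loves", "volvelle" 9, "lowed", "love" 10
  • "If A is similar to B, then all letters in A are contained in B, and all letters in B are contained in A." Since "r", "d", "y" and "s" are not in "love", the words containing them are not similar to "love".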
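The gap between 14 and 10 can be checked mechanically. The following is a minimal sketch, not taken from the thread: the class name `LookaheadCount` and the helper `hasForeignLetter` are made up for illustration. It counts the words accepted by the lookahead pattern from the question, then counts how many of those contain no letter outside `love`.

```java
import java.util.regex.Pattern;

class LookaheadCount {
    static final String[] WORDS = {"velo", "low", "vole", "lovee", "volvell", "lowly", "lower",
            "lover", "levo", "loved", "love", "lovee", "lowe", "lowes", "lovey", "lowan", "lowa",
            "evolve", "loves", "volvelle", "lowed", "love"};

    // Hypothetical helper: true if word contains a character that does not occur in ref.
    static boolean hasForeignLetter(String ref, String word) {
        for (int i = 0; i < word.length(); i++) {
            if (ref.indexOf(word.charAt(i)) == -1) {
                return true;
            }
        }
        return false;
    }

    // Returns {words accepted by the lookahead pattern, words that are actually similar}.
    static int[] counts(String ref, String[] words) {
        // Same pattern as buildPattern in the question: one lookahead per letter of ref.
        StringBuilder sb = new StringBuilder("^");
        for (int i = 0; i < ref.length(); i++) {
            sb.append("(?=.*").append(ref.charAt(i)).append(")");
        }
        Pattern p = Pattern.compile(sb.toString());

        int lookahead = 0;
        int similar = 0;
        for (String w : words) {
            if (p.matcher(w).find()) {
                lookahead++;
                if (!hasForeignLetter(ref, w)) {
                    similar++;
                }
            }
        }
        return new int[]{lookahead, similar};
    }

    public static void main(String[] args) {
        int[] c = counts("love", WORDS);
        System.out.println(c[0] + " " + c[1]); // prints "14 10"
    }
}
```

The four extra words are lover, loved, lovey and loves: each contains all of l, o, v, e, which is all the lookaheads test, plus one letter that `love` does not have.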

Tags: java string algorithm


【Solution 1】:

Since there are only 26 letters and an int has 32 bits, a single int is enough to hold all the information about which letters occur in a word:

static int getFingerprint(String s)
{
    int result=0;
    for (int i = s.length()-1; i>=0; --i) {
        char c = s.charAt(i);
        if (c>='a' && c<='z')
            result |= 1<<(int)(c-'a');
        else if (c>='A' && c<='Z')
            result |= 1<<(int)(c-'A');
    }
    return result;
}

public static void main(String[] args) {
    String[] arr = new String[]{"velo", "low", "vole", "lovee", "volvell", "lowly", "lower", "lover", "levo", "loved", "love",
        "lovee", "lowe", "lowes", "lovey", "lowan", "lowa", "evolve", "loves", "volvelle", "lowed", "love"};
    String s = "love";

    int fingerprint = getFingerprint(s);

    int matches = 0;
    for (String item : arr) {
        if (getFingerprint(item)==fingerprint)
            ++matches;
    }
    System.out.println(matches);
}
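The task statement asks for W and L to be read from standard input, whereas the snippets in this thread hard-code them. Below is a rough sketch of how the bitmask idea might be wired to stdin. It assumes the input layout shown in the example (W on the first line, the words of L separated by whitespace on the second); the class name `SimilarWordsStdin` and the method `countSimilar` are illustrative names, not part of the answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class SimilarWordsStdin {
    // Same idea as getFingerprint above: bit i is set when the letter 'a' + i occurs.
    static int fingerprint(String s) {
        int bits = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = Character.toLowerCase(s.charAt(i));
            if (c >= 'a' && c <= 'z') {
                bits |= 1 << (c - 'a');
            }
        }
        return bits;
    }

    // Counts the whitespace-separated words in line whose letter set equals that of w.
    static int countSimilar(String w, String line) {
        int target = fingerprint(w);
        int matches = 0;
        for (String item : line.trim().split("\\s+")) {
            if (!item.isEmpty() && fingerprint(item) == target) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String w = in.readLine();    // assumed: first line holds W
        String line = in.readLine(); // assumed: second line holds the list L
        if (w == null || line == null) {
            return; // incomplete input, nothing to count
        }
        System.out.println(countSimilar(w.trim(), line));
    }
}
```

Each word is reduced to one int and compared with a single `==`, so the work per word is linear in its length regardless of how long W is.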

【Comments】:

    【Solution 2】:

    I suggest simplifying the regular expression: no lookahead is needed, a simple "^[love]*$" will do.

    private static String buildPattern(String s) {
        String pattern = "^[";
        for (int i = 0; i < s.length(); i++) {
            pattern += s.charAt(i);
        }
        pattern += "]*$";
        return pattern;
    }
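One caveat: a character class on its own only checks that the word uses no letters outside W. It would also accept a word that leaves some of W's letters out, for example "lo". For the sample list this makes no difference, because every word built only from l, o, v and e happens to use all four. A possible way to test both directions with a regex is to combine the lookaheads from the question with the character class. This is a sketch under the assumption that W is non-empty and consists only of lowercase ASCII letters, so nothing has to be escaped; `TwoWayRegex` is an illustrative name.

```java
import java.util.regex.Pattern;

class TwoWayRegex {
    // Assumes s is non-empty and contains only lowercase ASCII letters (no escaping needed).
    static Pattern buildPattern(String s) {
        StringBuilder sb = new StringBuilder("^");
        // One lookahead per letter of s: every letter of s must occur in the word.
        for (int i = 0; i < s.length(); i++) {
            sb.append("(?=.*").append(s.charAt(i)).append(")");
        }
        // Character class: the word may consist of letters of s only.
        sb.append("[").append(s).append("]+$");
        return Pattern.compile(sb.toString());
    }

    static boolean isSimilar(String s, String word) {
        return buildPattern(s).matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSimilar("love", "evolve")); // true
        System.out.println(isSimilar("love", "lo"));     // false: v and e are missing
        System.out.println(isSimilar("love", "lover"));  // false: r is not in "love"
    }
}
```

When many words are tested against the same W, it is cheaper to call `buildPattern` once and reuse the compiled `Pattern`.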
    

    【Comments】:

      【Solution 3】:

      I would try to avoid using a regular expression for this and would check the letters myself.

      public static void main(String[] args)
      {
        String[] arr = new String[]{"velo", "low", "vole", "lovee", "volvell", "lowly", "lower", "lover", "levo", "loved", "love",
                "lovee", "lowe", "lowes", "lovey", "lowan", "lowa", "evolve", "loves", "volvelle", "lowed", "love"};
        String s = "love";
        int result = 0;
      
        for (String word : arr)
        {
          if (isSimilar(s, word))
          {
            result++;
          }
        }
      
        System.out.println(result);
      }
      
      private static boolean isSimilar(String word, String test)
      {
        for (char c : test.toCharArray())
        {
          if (word.indexOf(c) == -1)
          {
            return false;
          }
        }
        return true;
      }
      

      Although my example above currently returns only 10?

      【Comments】:
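Note that `isSimilar` as written checks one direction only: every letter of `test` must occur in `word`. The definition in the question also requires the reverse. A minimal sketch of a symmetric variant follows; the class name `SymmetricCheck` and the helper `allLettersIn` are hypothetical names chosen for this illustration.

```java
class SymmetricCheck {
    // True if every character of a occurs somewhere in b.
    static boolean allLettersIn(String a, String b) {
        for (int i = 0; i < a.length(); i++) {
            if (b.indexOf(a.charAt(i)) == -1) {
                return false;
            }
        }
        return true;
    }

    // Similar means the containment holds in both directions.
    static boolean isSimilar(String word, String test) {
        return allLettersIn(test, word) && allLettersIn(word, test);
    }

    public static void main(String[] args) {
        System.out.println(isSimilar("love", "volvelle")); // true
        System.out.println(isSimilar("love", "lo"));       // false: one-directional check would accept it
        System.out.println(isSimilar("love", "loves"));    // false: s is not in "love"
    }
}
```

On the sample list both variants give 10, since no word there is made of a strict subset of the letters of `love`.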

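      • Well, I don't think it makes much of a difference, since indexOf is O(nm)
      • Personally I find this easier to read, because it is easier to see what the code is doing. Regular expressions can be hard to explain, let alone ones generated at runtime.
      【Solution 4】:

      I only count 10 that should succeed, both with my implementation and when I check by hand.

      It is as simple as comparing whether the sets of letters in the two words are equal: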

      public static void main(String... args)
      {
          String word = "love";
          List<String> strs = Arrays.asList(
              "velo", "low", "vole", "lovee", "volvell", "lowly", "lower", "lover", "levo", "loved", "love",
              "lovee", "lowe", "lowes", "lovey", "lowan", "lowa", "evolve", "loves", "volvelle", "lowed", "love"
          );
      
          System.out.println(
              strs.stream()
                 .filter(str -> chars(word).equals(chars(str)))
                 .count()
          );
      }
      
      private static Set<Character> chars(String word)
      {
          return word.chars()
              .mapToObj(ch -> (char) ch)
              .collect(Collectors.toSet());
      }
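In the stream above, `chars(word)` is evaluated again for every element of the list. A small variation, sketched here under the illustrative names `SetCompare` and `countSimilar`, builds the reference set once and reuses it in the filter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SetCompare {
    // Set of distinct characters in word, as in the answer above.
    static Set<Character> chars(String word) {
        return word.chars()
                .mapToObj(ch -> (char) ch)
                .collect(Collectors.toSet());
    }

    static long countSimilar(String word, List<String> strs) {
        Set<Character> target = chars(word); // computed once, outside the stream
        return strs.stream()
                .filter(str -> target.equals(chars(str)))
                .count();
    }

    public static void main(String[] args) {
        List<String> strs = Arrays.asList("velo", "low", "lover", "evolve");
        System.out.println(countSimilar("love", strs)); // prints 2 (velo and evolve)
    }
}
```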
      

      【Comments】:

      • The fact that you also get 10 makes me feel better about my answer.
      【Solution 5】:
      public static void main(String[] args) {
          String[] arr = new String[]{"velo", "low", "vole", "lovee", "volvell", "lowly", "lower", "lover", "levo", "loved", "love",
                  "lovee", "lowe", "lowes", "lovey", "lowan", "lowa", "evolve", "loves", "volvelle", "lowed", "love"};
          String s = "love";
      
          Set<Character> searchWordCharacters = getDistinctCharacters(s);
          long result = Stream.of(arr)
                  .map(Scratch::getDistinctCharacters)
                  .filter(wordCharacters -> wordCharacters.size() == searchWordCharacters.size())
                  .filter(wordCharacters -> wordCharacters.containsAll(searchWordCharacters))
                  .peek(System.out::println)
                  .count();
          System.out.println(result);
      }
      
      private static Set<Character> getDistinctCharacters(String word) {
          return word.chars()
                  .mapToObj(i -> (char) i)
                  .collect(Collectors.toSet());
      }
      

      Result: 10

      【Comments】:

      • The matching words are: {"velo", "vole", "lovee", "volvell", "levo", "love", "lovee", "evolve", "volvelle", "love"}
      【Solution 6】:

      A count of 10 should be the correct result!

      String[] arr = new String[] { "velo", "low", "vole", "lovee",
              "volvell", "lowly", "lower", "lover", "levo", "loved", "love",
              "lovee", "lowe", "lowes", "lovey", "lowan", "lowa", "evolve",
              "loves", "volvelle", "lowed", "love" };
      
      String s = "love";
      
      // True when the character occurs in the reference word s.
      Predicate<Character> p = x -> s.indexOf(x) > -1;
      
      List<String> asList = Arrays.asList(arr);
      
      // Count with filter(): a lambda cannot modify a captured local variable.
      long count = asList.stream().filter(x -> {
          List<Character> chars = new ArrayList<>();
          for (int i = 0; i < x.length(); i++) {
              chars.add(x.charAt(i));
          }
          // Keep the word only if all of its characters occur in s.
          return chars.stream().allMatch(p);
      }).count();
      
      System.out.println(count);
      

      【Comments】:

        【Solution 7】:
        import java.util.Arrays;
        
        class SomeClass {
            public static void main(String[] args) {
                String[] arr = new String[]{"velo", "low", "vole", "lovee", "volvell", "lowly", "lower", "lover", "levo", "loved", "love",
                        "lovee", "lowe", "lowes", "lovey", "lowan", "lowa", "evolve", "loves", "volvelle", "lowed", "love"};
                String s = "love";
                int count = 0;
        
                boolean[] characters_state = new boolean[26];
                Arrays.fill(characters_state, false);
                for(int i = 0; i < s.length(); i++) {
                    characters_state[s.charAt(i) - 'a'] = true;
                }
        
                for(int i = 0; i < arr.length; i++) {
                    if (check(arr[i], characters_state.clone())) {
                        count++;
                    }
                }
                System.out.println(count);
            }
        
            static boolean check(String s, boolean[] characters_state) {
                for(int i = 0; i < s.length(); i++) {
                    if(!characters_state[s.charAt(i) - 'a']) {
                        return false;
                    }
                }
                return true;
            }
        }
        

        Output

        10
        
        real    0m0,210s
        user    0m0,206s
        sys 0m0,025s
        

        【Comments】:
