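Pattern matching interview Q: answers

【Question title】: Pattern matching interview Q
【Posted】: 2015-02-13 20:39:00
【Question】:

I was recently in an interview and they asked me the following question:

Write a function that returns true if a string matches a pattern, false otherwise.

Pattern: 1 character per item (a-z). Input: space-delimited string.

This was my solution for the first part: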

static boolean isMatch(String pattern, String input) {
    char[] letters = pattern.toCharArray();
    String[] split = input.split("\\s+");

    if (letters.length != split.length) {
        // early return - not possible to match if lengths aren't equal
        return false;
    }

    Map<String, Character> map = new HashMap<>();
    // aaaa test test test1 test1
    boolean used[] = new boolean[26];
    for (int i = 0; i < letters.length; i++) {
        Character existing = map.get(split[i]);
        if (existing == null) {
            // put into map if not found yet
            if (used[(int)(letters[i] - 'a')]) {
                return false;
            }

            used[(int)(letters[i] - 'a')] = true;
            map.put(split[i], letters[i]);
        } else {
            // doesn't match - return false
            if (existing != letters[i]) {
                return false;
            }
        }
    }

    return true;
}

public static void main(String[] argv) {
    System.out.println(isMatch("aba", "blue green blue"));
    System.out.println(isMatch("aba", "blue green green"));
}

The next part of the question stumped me:

Given no delimiters in the input, write the same function.

For example:

isMatch("aba", "bluegreenblue") -> true
isMatch("abc","bluegreenyellow") -> true
isMatch("aba", "t1t2t1") -> true
isMatch("aba", "t1t1t1") -> false
isMatch("aba", "t1t11t1") -> true
isMatch("abab", "t1t2t1t2") -> true
isMatch("abcdefg", "ieqfkvu") -> true
isMatch("abcdefg", "bluegreenredyellowpurplesilvergold") -> true
isMatch("ababac", "bluegreenbluegreenbluewhite") -> true
isMatch("abdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba") -> true

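I wrote a brute-force solution (generate every possible split of the input string into letters.length pieces and check each in turn against isMatch), but the interviewer said it wasn't optimal.

I have no idea how to solve this part of the problem. Is it even possible, or am I missing something?

They were looking for something with a time complexity of O(M x N ^ C), where M is the length of the pattern, N is the length of the input, and C is some constant.

Clarifications

I'm not looking for a regex solution, even if it works.
I'm not looking for the naive solution that generates all possible splits and checks them, even with optimizations, since that will always be exponential time.

【Comments】:

Well, sure it's possible. Just find all the ways of splitting the string into pattern.Length substrings and see whether any of them fits the pattern. The interesting question is whether there is anything better than that.
I did that, and the interviewer said it wasn't optimal.
Yes, a = t1, b = t12
Another thought, a useful example to keep in mind when considering algorithms: isMatch("abcd", "aaaaaaaaaa") -> true
This is an interesting question, but it is underspecified. Until the examples at the bottom, it isn't clear what "if a string matches a pattern" means. A well-written question should explain the problem precisely (with examples) before introducing any code.

Tags: java algorithm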

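【Solution 1】:

The backtracking solution can be optimized. Instead of generating all splits first and then checking each one for validity, we can check "on the fly". Suppose we have already split a prefix of the initial string (of length p) and matched the first i characters of the pattern. Now look at pattern character i + 1.

If some string in the prefix already corresponds to letter i + 1, we only need to check whether the substring starting at position p + 1 is equal to it. If it is, we proceed to i + 1 and p + the length of this string. Otherwise, we can kill this branch.
If there is no such string, we should try every substring that starts at position p + 1 and ends somewhere after it.

We can also use the following idea to reduce the number of branches: we can estimate a lower bound on the length needed by the still-unprocessed suffix of the pattern (we know the lengths of the strings that already stand for some letters, and we know a trivial lower bound of 1 for the string of any other letter). If the remaining part of the initial string is too short to match the rest of the pattern, we can terminate the branch.

This solution still has exponential time complexity, but it works much faster than generating all splits, because invalid solutions are discarded earlier and the number of reachable states can drop significantly.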
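
The idea described in this answer could be sketched roughly as follows. This is only an illustration, not the answerer's own code: the class name BacktrackingMatcher, the helper match, and the 0-based indices are assumptions made for the example. It also assumes that distinct pattern letters must map to distinct non-empty strings, as the question's examples suggest.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BacktrackingMatcher {

    static boolean isMatch(String pattern, String input) {
        return match(pattern, 0, input, 0, new HashMap<>(), new HashSet<>());
    }

    // i = next pattern index to match, p = next unconsumed input position (both 0-based)
    private static boolean match(String pattern, int i, String input, int p,
                                 Map<Character, String> bound, Set<String> used) {
        if (i == pattern.length()) {
            return p == input.length(); // success only if the whole input was consumed
        }
        char c = pattern.charAt(i);
        String known = bound.get(c);
        if (known != null) {
            // Letter already bound: the input must continue with exactly that string.
            return input.startsWith(known, p)
                    && match(pattern, i + 1, input, p + known.length(), bound, used);
        }
        // Pruning: the pattern letters after this one need at least minRest characters
        // (the known length for bound letters, the trivial lower bound 1 otherwise).
        int minRest = 0;
        for (int k = i + 1; k < pattern.length(); k++) {
            String s = bound.get(pattern.charAt(k));
            minRest += (s != null) ? s.length() : 1;
        }
        int maxEnd = input.length() - minRest;
        for (int end = p + 1; end <= maxEnd; end++) {
            String candidate = input.substring(p, end);
            if (used.contains(candidate)) {
                continue; // assumption: two different letters may not share a string
            }
            bound.put(c, candidate);
            used.add(candidate);
            if (match(pattern, i + 1, input, end, bound, used)) {
                return true;
            }
            bound.remove(c); // undo the choice and try a longer candidate
            used.remove(candidate);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isMatch("aba", "bluegreenblue")); // true
        System.out.println(isMatch("aba", "t1t1t1"));        // false
    }
}
```

The worst case is still exponential, as the answer says; the sketch only shows where the two pruning rules apply.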

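【Comments】:

【Solution 2】:

I feel like this is cheating, and I'm not convinced that capture groups and reluctant quantifiers will do the right thing. Or maybe they're looking to see whether you can recognize that, because of the way quantifiers work, the matching is ambiguous.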

boolean matches(String s, String pattern) {
    StringBuilder patternBuilder = new StringBuilder();
    Map<Character, Integer> backreferences = new HashMap<>();
    int nextBackreference = 1;

    for (int i = 0; i < pattern.length(); i++) {
        char c = pattern.charAt(i);

        if (!backreferences.containsKey(c)) {
            backreferences.put(c, nextBackreference++);
            patternBuilder.append("(.*?)");
        } else {
            patternBuilder.append('\\').append(backreferences.get(c));
        }
    }

    return s.matches(patternBuilder.toString());
}
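
One way to probe the doubt expressed above is to run the same construction on an example from the question. The sketch below repeats the regex-building logic in a self-contained class (the name RegexCaveatDemo is made up for the illustration). It suggests that the generated regex accepts "t1t1t1" for "aba", because nothing forces different groups to capture different, non-empty strings.

```java
import java.util.HashMap;
import java.util.Map;

final class RegexCaveatDemo {

    // Same construction as above: first occurrence of a letter becomes a lazy
    // capture group, every later occurrence becomes a backreference to it.
    static boolean matches(String s, String pattern) {
        StringBuilder regex = new StringBuilder();
        Map<Character, Integer> groups = new HashMap<>();
        int next = 1;
        for (char c : pattern.toCharArray()) {
            Integer group = groups.get(c);
            if (group == null) {
                groups.put(c, next++);
                regex.append("(.*?)");
            } else {
                regex.append('\\').append(group);
            }
        }
        return s.matches(regex.toString());
    }

    public static void main(String[] args) {
        // The question expects false here, but the regex (.*?)(.*?)\1 can match
        // with group 1 empty and group 2 taking the whole input.
        System.out.println(matches("t1t1t1", "aba")); // true
    }
}
```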

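【Comments】:

I thought of the same thing myself. I believe it should work, and it is probably what they were expecting.

【Solution 3】:

You can improve on brute force by first assuming token lengths and checking that the token lengths add up to the length of the test string. That is faster than pattern matching each time. However, it is still very slow as the number of unique tokens increases.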
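
A minimal sketch of this length-guessing idea, under the assumption that the pattern contains only a-z and that distinct letters must map to distinct non-empty strings. The class and helper names (LengthGuessMatcher, assign, verify) are invented for the example and do not come from the answer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LengthGuessMatcher {

    static boolean isMatch(String pattern, String input) {
        // Distinct letters in order of first appearance, plus how often each occurs.
        List<Character> letters = new ArrayList<>();
        int[] count = new int[26];
        for (char c : pattern.toCharArray()) {
            if (count[c - 'a']++ == 0) {
                letters.add(c);
            }
        }
        return assign(pattern, input, letters, count, new int[26], 0, input.length());
    }

    // Pick a token length for letters.get(idx); `remaining` is the number of
    // input characters not yet budgeted to an earlier letter.
    private static boolean assign(String pattern, String input, List<Character> letters,
                                  int[] count, int[] len, int idx, int remaining) {
        if (idx == letters.size()) {
            // Only split and compare once the lengths add up to the input length.
            return remaining == 0 && verify(pattern, input, len);
        }
        int c = letters.get(idx) - 'a';
        int reserve = 0; // later letters need at least one character per occurrence
        for (int k = idx + 1; k < letters.size(); k++) {
            reserve += count[letters.get(k) - 'a'];
        }
        for (int l = 1; l * count[c] + reserve <= remaining; l++) {
            len[c] = l;
            if (assign(pattern, input, letters, count, len, idx + 1, remaining - l * count[c])) {
                return true;
            }
        }
        return false;
    }

    // With every token length fixed, the split is determined; check it in one pass.
    private static boolean verify(String pattern, String input, int[] len) {
        String[] token = new String[26];
        Set<String> seen = new HashSet<>();
        int pos = 0;
        for (char ch : pattern.toCharArray()) {
            int c = ch - 'a';
            String piece = input.substring(pos, pos + len[c]);
            pos += len[c];
            if (token[c] == null) {
                if (!seen.add(piece)) {
                    return false; // two different letters would share one string
                }
                token[c] = piece;
            } else if (!token[c].equals(piece)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, LengthGuessMatcher.isMatch("abab", "t1t2t1t2") only ever tests length pairs whose weighted sum is 8. The number of length combinations still grows quickly with the number of distinct letters, which is the slowdown the answer mentions.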

【Comments】:

【Solution 4】:

Update: here is my solution, based on my earlier explanation.

import com.google.common.collect.*;
import org.apache.commons.lang3.StringUtils;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.math3.util.Combinations;

import java.util.*;

/**
 * Created by carlos on 2/14/15.
 */
public class PatternMatcher {

    public static boolean isMatch(char[] pattern, String searchString){
        return isMatch(pattern, searchString, new TreeMap<Integer, Pair<Integer, Integer>>(), Sets.newHashSet());
    }
    private static boolean isMatch(char[] pattern, String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution, Set<String> mappedStrings) {
        List<Integer> occurrencesOfCharacterInPattern = getNextUnmappedPatternOccurrences(candidateSolution, pattern);
        if(occurrencesOfCharacterInPattern.size() == 0)
            return isValidSolution(candidateSolution, searchString, pattern, mappedStrings);
        List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = sectionsOfUnmappedStrings(searchString, candidateSolution);
        if(sectionsOfUnmappedStrings.size() == 0)
            return false;
        String firstUnmappedString = substring(searchString, sectionsOfUnmappedStrings.get(0));


        for (int substringSize = 1; substringSize <= firstUnmappedString.length(); substringSize++) {
            String candidateSubstring = firstUnmappedString.substring(0, substringSize);
            if(mappedStrings.contains(candidateSubstring))
                continue;
            List<Pair<Integer, Integer>> listOfAllOccurrencesOfSubstringInString = Lists.newArrayList();
            for (int currentIndex = 0; currentIndex < sectionsOfUnmappedStrings.size(); currentIndex++) {
                Pair<Integer,Integer> currentUnmappedSection = sectionsOfUnmappedStrings.get(currentIndex);
                List<Pair<Integer, Integer>> occurrencesOfSubstringInString =
                        findAllInstancesOfSubstringInString(searchString, candidateSubstring,
                                currentUnmappedSection);
                for(Pair<Integer,Integer> possibleAddition:occurrencesOfSubstringInString) {
                    listOfAllOccurrencesOfSubstringInString.add(possibleAddition);
                }
            }

            if(listOfAllOccurrencesOfSubstringInString.size() < occurrencesOfCharacterInPattern.size())
                return false;

            Iterator<int []> possibleSolutionIterator =
                    new Combinations(listOfAllOccurrencesOfSubstringInString.size(),
                            occurrencesOfCharacterInPattern.size()).iterator();
            iteratorLoop:
            while(possibleSolutionIterator.hasNext()) {
                Set<String> newMappedSets = Sets.newHashSet(mappedStrings);
                newMappedSets.add(candidateSubstring);
                TreeMap<Integer,Pair<Integer,Integer>> newCandidateSolution = Maps.newTreeMap();
                // why doesn't Maps.newTreeMap(candidateSolution) work?
                newCandidateSolution.putAll(candidateSolution);

                int [] possibleSolutionIndexSet = possibleSolutionIterator.next();

                for(int i = 0; i < possibleSolutionIndexSet.length; i++) {
                    Pair<Integer, Integer> candidatePair = listOfAllOccurrencesOfSubstringInString.get(possibleSolutionIndexSet[i]);
                    if (makesSenseToInsert(newCandidateSolution, occurrencesOfCharacterInPattern.get(i), candidatePair))
                        newCandidateSolution.put(occurrencesOfCharacterInPattern.get(i), candidatePair);
                    else
                        break iteratorLoop;
                }

                if (isMatch(pattern, searchString, newCandidateSolution,newMappedSets))
                    return true;
            }

        }
        return false;
    }

    private static boolean makesSenseToInsert(TreeMap<Integer, Pair<Integer, Integer>> newCandidateSolution, Integer startIndex, Pair<Integer, Integer> candidatePair) {
        if(newCandidateSolution.size() == 0)
            return true;

        if(newCandidateSolution.floorEntry(startIndex).getValue().getRight() > candidatePair.getLeft())
            return false;

        Map.Entry<Integer, Pair<Integer, Integer>> ceilingEntry = newCandidateSolution.ceilingEntry(startIndex);
        if(ceilingEntry !=null)
            if(ceilingEntry.getValue().getLeft() < candidatePair.getRight())
                return false;

        return true;
    }

    private static boolean isValidSolution( Map<Integer, Pair<Integer, Integer>> candidateSolution,String searchString, char [] pattern, Set<String> mappedStrings){
        List<Pair<Integer,Integer>> values = Lists.newArrayList(candidateSolution.values());
        return  areIntegersConsecutive(Lists.newArrayList(candidateSolution.keySet())) &&
                arePairsConsecutive(values) &&
                values.get(values.size() - 1).getRight() == searchString.length() &&
                patternsAreUnique(pattern,mappedStrings);
    }

    private static boolean patternsAreUnique(char[] pattern, Set<String> mappedStrings) {
        Set<Character> uniquePatterns = Sets.newHashSet();
        for(Character character:pattern)
            uniquePatterns.add(character);

        return uniquePatterns.size() == mappedStrings.size();
    }

    private static List<Integer> getNextUnmappedPatternOccurrences(Map<Integer, Pair<Integer, Integer>> candidateSolution, char[] searchArray){
        List<Integer> allMappedIndexes = Lists.newLinkedList(candidateSolution.keySet());
        if(allMappedIndexes.size() == 0){
            return occurrencesOfCharacterInArray(searchArray,searchArray[0]);
        }
        if(allMappedIndexes.size() == searchArray.length){
            return Lists.newArrayList();
        }
        for(int i = 0; i < allMappedIndexes.size()-1; i++){
            if(!areIntegersConsecutive(allMappedIndexes.get(i),allMappedIndexes.get(i+1))){
                return occurrencesOfCharacterInArray(searchArray,searchArray[i+1]);
            }
        }
        List<Integer> listOfNextUnmappedPattern = Lists.newArrayList();
        listOfNextUnmappedPattern.add(allMappedIndexes.size());
        return listOfNextUnmappedPattern;
    }

    private static String substring(String string, Pair<Integer,Integer> bounds){
        // bounds are [left, right) offsets into string
        return string.substring(bounds.getLeft(),bounds.getRight());
    }

    private static List<Pair<Integer, Integer>> sectionsOfUnmappedStrings(String searchString, Map<Integer, Pair<Integer, Integer>> candidateSolution) {
        if(candidateSolution.size() == 0) {
            return Lists.newArrayList(Pair.of(0, searchString.length()));
        }
        List<Pair<Integer, Integer>> sectionsOfUnmappedStrings = Lists.newArrayList();
        List<Pair<Integer,Integer>> allMappedPairs = Lists.newLinkedList(candidateSolution.values());

        // Dont have to worry about the first index being mapped because of the way the first candidate solution is made
        for(int i = 0; i < allMappedPairs.size() - 1; i++){
            if(!arePairsConsecutive(allMappedPairs.get(i), allMappedPairs.get(i + 1))){
                Pair<Integer,Integer> candidatePair = Pair.of(allMappedPairs.get(i).getRight(), allMappedPairs.get(i + 1).getLeft());
                sectionsOfUnmappedStrings.add(candidatePair);
            }
        }

        Pair<Integer,Integer> lastMappedPair = allMappedPairs.get(allMappedPairs.size() - 1);
        if(lastMappedPair.getRight() != searchString.length()){
            sectionsOfUnmappedStrings.add(Pair.of(lastMappedPair.getRight(),searchString.length()));
        }

        return sectionsOfUnmappedStrings;
    }

    public static boolean areIntegersConsecutive(List<Integer> integers){
        for(int i = 0; i < integers.size() - 1; i++)
            if(!areIntegersConsecutive(integers.get(i),integers.get(i+1)))
                return false;
        return true;
    }

    public static boolean areIntegersConsecutive(int left, int right){
        return left == (right - 1);
    }

    public static boolean arePairsConsecutive(List<Pair<Integer,Integer>> pairs){
        for(int i = 0; i < pairs.size() - 1; i++)
            if(!arePairsConsecutive(pairs.get(i), pairs.get(i + 1)))
                return false;
        return true;
    }


    public static boolean arePairsConsecutive(Pair<Integer, Integer> left, Pair<Integer, Integer> right){
        return left.getRight() == right.getLeft();
    }

    public static List<Integer> occurrencesOfCharacterInArray(char[] searchArray, char searchCharacter){
        assert(searchArray.length>0);

        List<Integer> occurrences = Lists.newLinkedList();
        for(int i = 0;i<searchArray.length;i++){
            if(searchArray[i] == searchCharacter)
                occurrences.add(i);
        }
        return occurrences;
    }

    public static List<Pair<Integer,Integer>> findAllInstancesOfSubstringInString(String searchString, String substring, Pair<Integer,Integer> bounds){
        String string = substring(searchString,bounds);
        assert(StringUtils.isNoneBlank(substring,string));

        int lastIndex = 0;
        List<Pair<Integer,Integer>> listOfOccurrences = Lists.newLinkedList();
        while(lastIndex != -1){
            lastIndex = string.indexOf(substring,lastIndex);
            if(lastIndex != -1){
                int newIndex = lastIndex + substring.length();
                listOfOccurrences.add(Pair.of(lastIndex + bounds.getLeft(), newIndex + bounds.getLeft()));
                lastIndex = newIndex;
            }
        }
        return listOfOccurrences;
    }
}

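It works for the provided cases, but it has not been thoroughly tested. Let me know if there are any bugs.

Original reply:

Assuming the string you are searching can have tokens of arbitrary length (which some of your examples do), then:

You want to start trying to break the string into parts that match the pattern, looking for contradictions along the way to cut down your search tree.

When you start processing, you select N characters from the beginning of the string. Now go and see whether you can find that substring in the rest of the string. If you can't, then it can't possibly be a solution. If you can, then your string looks something like this:

(N characters)<...>[(N characters)<...>] where either <...> contains 0+ characters and is not necessarily the same substring. What is inside [] can repeat a number of times equal to the number of times (N characters) appears in the string.

Now you have matched the first letter of your pattern. You are not sure whether the rest of the pattern matches, but you can basically reuse this algorithm (with modifications) to interrogate the <...> parts of the string.

You would do this for N = 1, 2, 3, 4... Make sense?

I'll work through an example (it doesn't cover all cases, but hopefully it illustrates the idea). Note that when I refer to substrings in the pattern I will use single quotes, and when I refer to substrings of the string I will use double quotes.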

isMatch("ababac", "bluegreenbluegreenbluewhite")

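OK, 'a' is my first pattern. For N = 1, I get the string "b". Where is "b" in the search string? bluegreenbluegreenbluewhite.

OK, so at this point this string might match with "b" being the pattern 'a'. Let's see if we can do the same with the pattern 'b'. Logically, 'b' must be the entire string "luegreen" (because it is squeezed between two consecutive 'a' patterns), so I then check between the second and third 'a'. Yes, it is "luegreen".

OK, so far I have matched everything in my pattern except 'c'. Easy case: it is the rest of the string. It matches.

This is basically writing a Perl regex parser. ababc = (.+)(.+)(\1)(\2)(.+). So you just have to convert it to a Perl regex.

【Comments】:

Why would you assume that all the patterns are the same length?
You can do this to save processing in a special case. It is much faster to check this first. In his examples, he has a number of such cases.
"If your pattern is 3 long, then the string length has to be divisible by 3" - false: aba and string = '1235123', length 7.
Maybe you are on to something, but as it stands your answer is not clear. Please take some time to make it clear and repost.
This almost works. The problem is that, with the regex idea, you do not account for the constraint that all the named patterns must be different. (Also, I'm not sure whether you can have more than 9 backreferences.)

【Solution 5】:

Here is a sample snippet of my code: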

public static final boolean isMatch(String patternStr, String input) {
    // Initial Check (If all the characters in the pattern string are unique, degenerate case -> immediately return true)
    char[] patt = patternStr.toCharArray();
    Arrays.sort(patt);
    boolean uniqueCase = true;
    for (int i = 1; i < patt.length; i++) {
        if (patt[i] == patt[i - 1]) {
            uniqueCase = false;
            break;
        }
    }
    if (uniqueCase) {
        return true;
    }
    String t1 = patternStr;
    String t2 = input;
    if (patternStr.length() == 0 && input.length() == 0) {
        return true;
    } else if (patternStr.length() != 0 && input.length() == 0) {
        return false;
    } else if (patternStr.length() == 0 && input.length() != 0) {
        return false;
    }
    int count = 0;
    StringBuffer sb = new StringBuffer();
    char[] chars = input.toCharArray();
    String match = "";
    // first read for the first character pattern
    for (int i = 0; i < chars.length; i++) {
        sb.append(chars[i]);
        count++;
        if (!input.substring(count, input.length()).contains(sb.toString())) {
            match = sb.delete(sb.length() - 1, sb.length()).toString();
            break;
        }
    }
    if (match.length() == 0) {
        match = t2;
    }
    // based on that character, update patternStr and input string
    t1 = t1.replace(String.valueOf(t1.charAt(0)), "");
    t2 = t2.replace(match, "");
    return isMatch(t1, t2);
}

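I basically decided to parse the pattern string first and determine whether any characters in the pattern string repeat. For example, in "aab", 'a' is used twice in the pattern string, so 'a' cannot map to something else. Otherwise, if no character repeats, as in "abc", it doesn't matter what our input string is, since every pattern character is unique and so it doesn't matter what each one matches (the degenerate case).

If the pattern string does contain repeated characters, I start checking what each one matches. Unfortunately, without knowing the delimiter, I can't know how long each string is. Instead, I decided to parse 1 character at a time and check whether the rest of the string contains the same string, and to keep adding characters to the buffer letter by letter until the buffer string can no longer be found in the input string. Once I have determined the string, which is now in the buffer, I simply delete all matching strings from the input string and the corresponding character from the pattern string, then recurse.

Apologies if my explanation isn't very clear; I hope my code makes it clearer.

【Comments】:

Works for most cases, but not for cases like "aba", "t1t12t1".
It also doesn't take the order of the split into account: "aba", "bluebluegreen" returns true when it should be false.
I think this solution would be perfect if we could fix the ordering and fix cases like "t1t12t1".
Does this case work? isMatch("aba", "bluegreenblube") -> true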