Pattern match dynamic programming for advice: answers

【Question title】: pattern match dynamic programming for advice
【Posted】: 2015-12-08 20:05:43
【Question description】:

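I am working on the following pattern matching problem, and have posted the detailed problem statement and my code below. The code works. In the implementation below, the outer loop iterates over the pattern and the inner loop over the source string, to build the two-dimensional DP table.

My question is: if I change the implementation so that the outer loop iterates over the source string and the inner loop over the pattern, would there be any performance gain or any functional defect? Any advice on which flavor is better, or whether they are about the same, is appreciated.

More specifically, I mean changing the loops from the following (using similar logic for the loop body),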

    for i in range(1, len(p) + 1):
        for j in range(1, len(s) + 1):

to,

    for i in range(1, len(s) + 1):
        for j in range(1, len(p) + 1):

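Problem statement

'.' matches any single character.
'*' matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false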
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

class Solution(object):

    def isMatch(self, s, p):
        # The DP table and the string s and p use the same indexes i and j, but
        # table[i][j] means the match status between p[:i] and s[:j], i.e.
        # table[0][0] means the match status of two empty strings, and
        # table[1][1] means the match status of p[0] and s[0]. Therefore, when
        # referring to the i-th and the j-th characters of p and s for updating
        # table[i][j], we use p[i - 1] and s[j - 1].

        # Initialize the table with False. The first row is satisfied.
        table = [[False] * (len(s) + 1) for _ in range(len(p) + 1)]

        # Update the corner case of matching two empty strings.
        table[0][0] = True

        # Update the corner case of when s is an empty string but p is not.
        # Since each '*' can eliminate the character before it, the table is
        # vertically updated by the one before previous. [test_symbol_0]
        for i in range(2, len(p) + 1):
            table[i][0] = table[i - 2][0] and p[i - 1] == '*'

        for i in range(1, len(p) + 1):
            for j in range(1, len(s) + 1):
                if p[i - 1] != "*":
                    # Update the table by referring the diagonal element.
                    table[i][j] = table[i - 1][j - 1] and \
                                  (p[i - 1] == s[j - 1] or p[i - 1] == '.')
                else:
                    # Eliminations (referring to the vertical element)
                    # Either refer to the one before previous or the previous.
                    # I.e. * eliminate the previous or count the previous.
                    # [test_symbol_1]
                    table[i][j] = table[i - 2][j] or table[i - 1][j]

                    # Propagations (referring to the horizontal element)
                    # If p's previous one is equal to the current s, with
                    # helps of *, the status can be propagated from the left.
                    # [test_symbol_2]
                    if p[i - 2] == s[j - 1] or p[i - 2] == '.':
                        table[i][j] |= table[i][j - 1]

        return table[-1][-1]

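Thanks in advance, Lin

【Question discussion】:

Why don't you use regular expressions?
@deloz, this is just a DP programming puzzle. Thanks for any advice on my original question. :)

Tags: python algorithm

【Solution 1】:

If you swap the loops, i will be the index into s and j the index into p. You need to swap i and j everywhere inside the loop.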

    for i in range(1, len(s) + 1):
        for j in range(1, len(p) + 1):
            if p[j - 1] != "*":
                # Update the table by referring the diagonal element.
                table[j][i] = table[j - 1][i - 1] and \
                              (p[j - 1] == s[i - 1] or p[j - 1] == '.')
            else:
                # Eliminations (referring to the vertical element)
                # Either refer to the one before previous or the previous.
                # I.e. * eliminate the previous or count the previous.
                # [test_symbol_1]
                table[j][i] = table[j - 2][i] or table[j - 1][i]

                # Propagations (referring to the horizontal element)
                # If p's previous one is equal to the current s, with
                # helps of *, the status can be propagated from the left.
                # [test_symbol_2]
                if p[j - 2] == s[i - 1] or p[j - 2] == '.':
                    table[j][i] |= table[j][i - 1]

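The original algorithm fills table row by row (first row 1, then 2, 3, ...). After swapping, the table is filled column by column (first column 1, then 2, 3, ...).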
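To make the two fill orders concrete, here is a minimal sketch that only records which cells the nested loops visit. The helper name fill_order and its parameters are made up for illustration and are not part of the code in the question.

```python
def fill_order(n_rows, n_cols, row_major=True):
    """Return the (row, col) cells in the order the nested loops visit them."""
    order = []
    if row_major:
        # Outer loop over pattern rows, inner loop over string columns.
        for r in range(1, n_rows + 1):
            for c in range(1, n_cols + 1):
                order.append((r, c))
    else:
        # Outer loop over string columns, inner loop over pattern rows.
        for c in range(1, n_cols + 1):
            for r in range(1, n_rows + 1):
                order.append((r, c))
    return order


rows_first = fill_order(2, 3, row_major=True)
cols_first = fill_order(2, 3, row_major=False)
print(rows_first)  # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
print(cols_first)  # [(1, 1), (2, 1), (1, 2), (2, 2), (1, 3), (2, 3)]
```

Both orders visit exactly the same set of cells; only the sequence differs.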

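The idea of the algorithm stays the same, because each element of table is defined in terms of elements in previous columns and/or rows, and those elements have already been computed whether you go row by row or column by column.

Specifically, table[j][i] is defined either by the diagonal element table[j-1][i-1] in the previous column, or by the elements table[j-2][i], table[j-1][i] and/or table[j][i-1] in previous rows and/or columns.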

So the performance after swapping is the same. In both versions, each table element takes constant time to compute. The total time to construct table is O(len(s) * len(p)).
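If you want to convince yourself empirically, a sketch along these lines compares the two loop orders. It assumes the same recurrence as the code in the question; the function name is_match and the pattern_outer flag are my own names, used only for this illustration.

```python
def is_match(s, p, pattern_outer=True):
    """Same recurrence as the question's code; only the loop nesting differs."""
    table = [[False] * (len(s) + 1) for _ in range(len(p) + 1)]
    table[0][0] = True
    for i in range(2, len(p) + 1):
        table[i][0] = table[i - 2][0] and p[i - 1] == '*'

    rows = range(1, len(p) + 1)  # positions in the pattern
    cols = range(1, len(s) + 1)  # positions in the string
    if pattern_outer:
        cells = ((i, j) for i in rows for j in cols)  # row by row
    else:
        cells = ((i, j) for j in cols for i in rows)  # column by column

    for i, j in cells:
        if p[i - 1] != '*':
            table[i][j] = table[i - 1][j - 1] and p[i - 1] in (s[j - 1], '.')
        else:
            table[i][j] = table[i - 2][j] or table[i - 1][j]
            if p[i - 2] in (s[j - 1], '.'):
                table[i][j] |= table[i][j - 1]
    return table[-1][-1]


# The examples from the problem statement, checked with both loop orders.
examples = [
    ("aa", "a", False), ("aa", "aa", True), ("aaa", "aa", False),
    ("aa", "a*", True), ("aa", ".*", True), ("ab", ".*", True),
    ("aab", "c*a*b", True),
]
for s, p, expected in examples:
    assert is_match(s, p, True) == is_match(s, p, False) == expected
```

Each cell is still computed exactly once with a constant amount of work, so the operation count is the same for both orders.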

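The functionality is also the same after the swap. Basically, if the original version is correct, then the modified version is correct too. Whether the original is correct is another matter...

Let's look at the original version. At first glance, there seem to be indexing problems in two places when i = 1: table[i - 2][j] and p[i - 2].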

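However, Python interprets index -1 as the last element. So table[-1][j] refers to the last row of table, whose entries for j >= 1 are all still False at that point. So table[1][j] = table[-1][j] or table[0][j] is equivalent to table[1][j] = table[0][j].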
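A small illustration of that wrap-around, on a freshly initialised table of an arbitrary size (3 x 4 here, chosen only for the example):

```python
# A 3 x 4 table of False, like the DP table right after initialisation.
table = [[False] * 4 for _ in range(3)]

# Index -1 wraps around to the last row instead of raising IndexError.
assert table[-1] is table[len(table) - 1]

# With i = 1, table[i - 2][j] reads the last row, which has not been
# filled in yet for j >= 1, so the `or` reduces to table[0][j].
i, j = 1, 2
value = table[i - 2][j] or table[i - 1][j]
print(value == table[0][j])  # True
```

Note that this only works because of the order in which the table is filled. An explicit check on i would make the intent clearer, but the result is the same.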

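As for p[-1], note that you only access it inside the if statement, when p[0] = '*' (which is meaningless for matching). The value of p[-1] does not matter, because it does not affect the value of table[i][j]. Look at it this way: if the condition of the if statement happens to be True, we know that table[1][0] is initially False, so table[1][1], table[1][2], ... must also be False. In other words, the pattern prefix p[:1] = '*' does not match any string.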
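Here is a rough check of that claim. It is a sketch that rebuilds the table with the same recurrence as the question's code and inspects row 1; the helper name build_table is hypothetical, introduced only for this check.

```python
def build_table(s, p):
    """Build the DP table with the same recurrence as the question's code."""
    table = [[False] * (len(s) + 1) for _ in range(len(p) + 1)]
    table[0][0] = True
    for i in range(2, len(p) + 1):
        table[i][0] = table[i - 2][0] and p[i - 1] == '*'
    for i in range(1, len(p) + 1):
        for j in range(1, len(s) + 1):
            if p[i - 1] != '*':
                table[i][j] = table[i - 1][j - 1] and p[i - 1] in (s[j - 1], '.')
            else:
                table[i][j] = table[i - 2][j] or table[i - 1][j]
                if p[i - 2] in (s[j - 1], '.'):
                    table[i][j] |= table[i][j - 1]
    return table


# Here p[-1] == 'a' equals every character of s, so the `if` fires for
# i = 1, yet row 1 never becomes True because table[1][0] is False.
row_one = build_table("aaa", "*a")[1]
print(row_one)  # [False, False, False, False]
```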

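【Discussion】:

Thanks dejvuth, so can you conclude that the functionality and performance are the same?
By the way, dejvuth, I am not sure whether your code has a problem with p[j - 2] == '.': when j == 1, j - 2 is -1, which may not be what you want? Thanks.
Thanks dejvuth. About the if statement, if you mean the statement p[j - 2] == s[i - 1], I do have a doubt: when j == 1, p[j - 2] is the same as p[-1], which is the last character of the pattern, and that is a valid index. Suppose the last character of the pattern happens to match s[i - 1]; the if condition will then be True, but that is not what we want. Any comments are appreciated. If I have not understood the new point in your edit correctly, please feel free to correct me. :)
By the way, dejvuth, I read your comments again and I think I see what you mean. Thanks for all the help and patience. I have marked your reply as the answer and awarded the 50 points. Have a nice day. :)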