Maximum cost obtained by removing substrings: answers

【Question title】: Maximum cost obtained by removing substring
【Posted】: 2021-05-25 17:14:21
【Question description】:

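I have a string S that contains only the two characters "x" and "y". I also have an array A of positive integers with the same length as S. I can delete any substring of positive length (> 0), provided every character in it is the same. The score for such a step is A[len], where len is the length of the deleted substring and the index is 1-based (since a substring of length 0 cannot be deleted). I can keep deleting such substrings until the string becomes empty, and the scores add up. I want to maximize the total score. The number of moves does not need to be minimized.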

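For example, let S = "xyy" and A = [2,3,1]. I can choose the substring S[1:2] = "yy"; the resulting string is "x" and the score is 3. Then I can choose S[0:0] = "x"; the resulting string is "" and the score is 5.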

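Alternatively, choose S[0:0]: the resulting string is "yy" and the score is 2. Choose S[0:0] again: the resulting string is "y" and the score is 4. Choose S[0:0] once more: the resulting string is "" and the score is 6, which is higher than before.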
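The two move sequences in the example can be replayed mechanically to check the arithmetic. This is only an illustrative sketch: the helper name `replay` and the inclusive `(start, end)` move format are assumptions made here, not part of the original post.

```python
def replay(s, a, moves):
    """Apply removals in order; each move is an inclusive (start, end) index pair."""
    total = 0
    for start, end in moves:
        piece = s[start:end + 1]
        # A removable substring must be non-empty and made of one repeated character
        if not piece or len(set(piece)) != 1:
            raise ValueError(f"cannot remove {piece!r}")
        total += a[len(piece) - 1]   # A is 1-indexed in the text, 0-indexed in Python
        s = s[:start] + s[end + 1:]
    return total, s

a = [2, 3, 1]
print(replay("xyy", a, [(1, 2), (0, 0)]))          # (5, '')
print(replay("xyy", a, [(0, 0), (0, 0), (0, 0)]))  # (6, '')
```

Removing the characters one at a time wins here because A[1] + A[1] = 4 exceeds A[2] = 3, which is why always taking the longest run is not safe.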

I could not come up with a greedy solution, so I tried brute force:

# Score array A from the example above; a[k-1] is the score for removing k characters
a = [2, 3, 1]

# Checks whether the chosen substring consists of a single repeated character
def check(s):
    return len(set(s)) == 1

def cost(s):
    n = len(s)
    if n == 0:
        return 0
    if n == 1:
        return a[0]
    mx = -1

    # Try removing every substring that satisfies the condition,
    # then recurse on the string that remains after the removal
    for i in range(n):
        for j in range(i, n):
            sub = s[i:j+1]
            if check(sub):
                mx = max(mx, a[len(sub)-1] + cost(s[:i] + s[j+1:]))
    return mx

This solution works for strings of length up to 8; beyond that it hangs (on my system configuration), so I added memoization:

# Relies on a global score array a, e.g. a = [2, 3, 1]

# Checks whether the chosen substring consists of a single repeated character
def check(s):
    return len(set(s)) == 1

# Cache of best scores keyed by string; it must be cleared whenever a changes
dp = dict()

def cost(s):
    # If this string is already in dp, return its cached score
    if s in dp:
        return dp[s]
    n = len(s)
    if n == 0:
        return 0
    if n == 1:
        return a[0]
    mx = -1

    # Try removing every substring that satisfies the condition,
    # then recurse on the string that remains after the removal
    for i in range(n):
        for j in range(i, n):
            sub = s[i:j+1]
            if check(sub):
                mx = max(mx, a[len(sub)-1] + cost(s[:i] + s[j+1:]))

    dp[s] = mx
    return mx

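It works for strings of length up to 20. That meets my current requirements, but can it be optimized further? It is still a brute-force solution, so it is not very satisfying for strings longer than 20.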

Can it be optimized to polynomial time, O(N^2) or O(N^3)?

【Question comments】:

Tags: python string algorithm recursion binary

【Solution 1】:

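One possible approach is to use itertools.groupby to precompute the groups of identical characters, then use recursion with a generator to enumerate the combinations. That way, you only iterate over each group once: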

from itertools import groupby
s = "xyy" 
a = [2,3,1]
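def max_score(d, a, c = []):
   # d: remaining groups of identical characters, a: score array, c: scores collected so far
   # (the := operator below needs Python 3.8+)
   if d:
      # Remove the first i+1 characters of the first group, then recurse on what is left
      for i in range(len(d[0])):
         yield from max_score(([] if not (k:=d[0][i+1:]) else [k])+d[1:], a, c+[a[i]])
   else:
      # Every group is gone: yield the total of this sequence of removals
      yield sum(c)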

print(max(max_score([list(b) for _, b in groupby(s)], a)))

Output:

6

Timings:

import random, time
def get_test_data(size):
   s = ''.join(random.choice(['x', 'y']) for _ in range(size))
   return [list(b) for _, b in groupby(s)], [random.randint(1, 10) for _ in s]

def av_time(s):
   t = time.time()
   _ = max(max_score(*get_test_data(s)))
   return time.time() - t

for i in [10, 20, 30, 40]:
    print(f'---------Size:{i}, Average time:{sum(av_time(i) for _ in range(10))/float(10)}---------')

Output:

---------Size:10, Average time:0.00027167797088623047---------
---------Size:20, Average time:0.010815811157226563---------
---------Size:30, Average time:0.3306509256362915---------
---------Size:40, Average time:13.801056122779846---------

This solution quickly computes the result for strings with len(s) <= 30, but it slows down sharply for larger sizes.
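One point worth noting about the generator above: it splits each group on its own and never merges two groups of the same character that become adjacent once the group between them has been removed. For s = "xyx" with a = [1, 10, 1] it yields 3, whereas removing "y" first and then "xx" scores 11. The following is a minimal sketch of an interval DP in the style of the classic "Remove Boxes" problem, which does handle such merges. It is not the answerer's method, and the names `max_removal_score`, `best` and `dp` are made up for illustration.

```python
from functools import lru_cache

def max_removal_score(s, a):
    n = len(s)
    # best[k]: best total score for deleting a run of k identical characters,
    # possibly in several pieces (e.g. two single deletions instead of one pair)
    best = [0] * (n + 1)
    for k in range(1, n + 1):
        best[k] = max(a[j - 1] + best[k - j] for j in range(1, k + 1))

    @lru_cache(maxsize=None)
    def dp(l, r, k):
        # Best score for s[l..r] when k extra characters equal to s[l]
        # are already attached to the left of position l.
        if l > r:
            return 0
        # Option 1: delete s[l] together with its k attached copies now.
        res = best[k + 1] + dp(l + 1, r, 0)
        # Option 2: clear s[l+1..m-1] first so that s[l] merges with an equal s[m].
        for m in range(l + 1, r + 1):
            if s[m] == s[l]:
                res = max(res, dp(l + 1, m - 1, 0) + dp(m, r, k + 1))
        return res

    return dp(0, n - 1, 0)

print(max_removal_score("xyy", [2, 3, 1]))   # 6
print(max_removal_score("xyx", [1, 10, 1]))  # 11
```

There are at most O(n^3) states (l, r, k) and each scans O(n) split points, so the running time is polynomial, bounded by O(n^4). The recursion depth stays within about n levels because every recursive call works on a shorter interval.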

【Comments】:

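I used the Caterpillar method to compute the groups (it does not matter, since both give the same time complexity). But the recursion is too complex. I would like to know whether it is possible to get an output when the string length is 100.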
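For reference, here is a small sketch of what the comment describes, assuming "Caterpillar method" refers to the usual two-pointer scan. The function names `runs_two_pointer` and `runs_groupby` are hypothetical. Both produce the same run-length groups in a single linear pass.

```python
from itertools import groupby

def runs_two_pointer(s):
    """Run-length groups found with a two-pointer (caterpillar) scan."""
    runs = []
    start, n = 0, len(s)
    while start < n:
        end = start
        # Advance the front pointer while the run of equal characters continues
        while end < n and s[end] == s[start]:
            end += 1
        runs.append((s[start], end - start))
        start = end          # the back pointer jumps to the start of the next run
    return runs

def runs_groupby(s):
    """The same groups computed with itertools.groupby."""
    return [(ch, len(list(g))) for ch, g in groupby(s)]

print(runs_two_pointer("xxyyyx"))  # [('x', 2), ('y', 3), ('x', 1)]
```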