Here is a version without regular expressions, using string slicing and str.find instead:
def check(s1, s2):
    i = 0
    for c in s2:  # looping over the characters in s2
        if i < len(s1):
            incr = s1[i:].find(c) + 1  # looking for c in the rest of s1
            if incr == 0:  # c not found
                break
            i += incr
        else:  # end of s1 reached, but still c's to cover
            break
    else:  # loop went through without break -> found
        return True
    return False  # loop exit with break -> not found

def check_contains(s1, s2):
    return check(s1, s2) or check(s1[::-1], s2)
Your examples:
strings = [
    ("qwer", "asdf"),
    ("abcdefghi", "dfge"),
    ("qwkedlrfid", "kelid"),
    ("abcdefghi", "hcba"),
    ("abacdfeag", "bca"),
]
for s1, s2 in strings:
    print(check_contains(s1, s2))
Result:
False
False
True
True
True
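A side note on the slicing: s1[i:] copies the rest of s1 on every step. str.find also accepts a start index, so the same left-to-right scan can be written without slices. The following is only a sketch of that idea; the names check_no_slice and check_contains_no_slice are made up here for illustration and are not part of the answer above.

```python
def check_no_slice(s1, s2):
    """Return True if the characters of s2 occur in s1 in the same order."""
    pos = 0  # index in s1 from which the next character is searched
    for c in s2:
        # str.find returns -1 when c is not found at or after pos,
        # so adding 1 yields 0 exactly in the "not found" case
        pos = s1.find(c, pos) + 1
        if pos == 0:
            return False
    return True


def check_contains_no_slice(s1, s2):
    # Same idea as check_contains: try s1 forwards, then reversed
    return check_no_slice(s1, s2) or check_no_slice(s1[::-1], s2)


examples = [
    ("qwer", "asdf"),
    ("abcdefghi", "dfge"),
    ("qwkedlrfid", "kelid"),
    ("abcdefghi", "hcba"),
    ("abacdfeag", "bca"),
]
results = [check_contains_no_slice(s1, s2) for s1, s2 in examples]
print(results)
```

For the five example pairs this gives the same answers as check_contains. Whether it is actually faster would have to be measured; I have not done that here.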
EDIT: check is an obvious candidate for a recursive implementation, which is more compact and performs in the same range:
def check(s1, s2):
    if not s2:
        return True
    if len(s1) < len(s2):
        return False
    i = s1.find(s2[0]) + 1
    if i == 0:
        return False
    return check(s1[i:], s2[1:])
(I also added the sanity check if len(s1) < len(s2): return False.)
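I experimented a bit with performance measurements: it seems to me that Bharel's version has an edge over this one for the kind of strings you provided. That seems to change when the strings to search in get larger. I tried the following (check_contains_1 is Bharel's solution, check_contains_2 is the one from this answer):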
from random import choices, randint
from string import ascii_lowercase as chars
from time import perf_counter
num = 10_000
max_len_1, max_len_2 = 50, 5
strings = [
    (
        "".join(choices(chars, k=randint(2, max_len_1))),
        "".join(choices(chars, k=randint(2, max_len_2)))
    )
    for _ in range(num)
]
start = perf_counter()
result_1 = [check_contains_1(s1, s2) for s1, s2 in strings]
end = perf_counter()
print(f"Version 1: {end - start:.2f} secs")
start = perf_counter()
result_2 = [check_contains_2(s1, s2) for s1, s2 in strings]
end = perf_counter()
print(f"Version 2: {end - start:.2f} secs")
print(result_1 == result_2)
Output:
Version 1: 1.85 secs
Version 2: 0.04 secs
True
But maybe I made a mistake somewhere ...
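One way to double-check numbers like these is to repeat the measurement with timeit and look at the best run, since a single perf_counter pass can be noisy. The sketch below rests on a few assumptions of mine: the seed value is arbitrary, the helper run is hypothetical, and check is a condensed restatement of the iterative version from this answer. It times only check_contains_2; Bharel's function is not reproduced here, so to compare the two you would pass it to run in the same way.

```python
from random import choices, randint, seed
from string import ascii_lowercase as chars
from timeit import repeat


def check(s1, s2):
    # Condensed form of the iterative check: is s2 a subsequence of s1?
    i = 0
    for c in s2:
        incr = s1[i:].find(c) + 1  # 0 means c was not found in the rest
        if incr == 0:
            return False
        i += incr
    return True


def check_contains_2(s1, s2):
    return check(s1, s2) or check(s1[::-1], s2)


seed(0)  # arbitrary fixed seed so repeated runs use the same data
num = 10_000
max_len_1, max_len_2 = 50, 5
strings = [
    (
        "".join(choices(chars, k=randint(2, max_len_1))),
        "".join(choices(chars, k=randint(2, max_len_2))),
    )
    for _ in range(num)
]


def run(func):
    # Hypothetical helper: apply func to every generated pair
    return [func(s1, s2) for s1, s2 in strings]


# Five independent runs of one pass each; the minimum is usually
# the measurement least disturbed by other activity on the machine
timings = repeat(lambda: run(check_contains_2), number=1, repeat=5)
print(f"Version 2: best of 5 = {min(timings):.3f} secs")
```

The absolute times depend on the machine, so only the ratio between the two versions on the same data is meaningful.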