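【Published】: 2011-06-21 17:23:21
【Question】:
I have a string of letters that I'd like to split into all possible combinations (the order of the letters must remain fixed), so that:
s = 'monkey'
becomes:
combinations = [['m', 'onkey'], ['mo', 'nkey'], ['m', 'o', 'nkey'] ... etc]
Any ideas?
【Comments】:
Tags: python string split permutation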
def splitter(s):
    # Yield every split of s into two or more pieces, keeping letter order.
    for i in range(1, len(s)):
        start = s[0:i]
        end = s[i:]
        yield [start, end]
        for split in splitter(end):
            result = [start]
            result.extend(split)
            yield result

s = 'monkey'
combinations = list(splitter(s))
Note that I use a generator by default, so that you don't run out of memory on long strings.
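To illustrate the memory point, here is a minimal sketch of consuming such a generator lazily. The name `iter_splits` and the 30-character test word are assumptions made for this example, not part of the answer above.

```python
from itertools import islice

def iter_splits(s):
    """Lazily yield every split of s into two or more non-empty pieces."""
    for i in range(1, len(s)):
        head, tail = s[:i], s[i:]
        yield [head, tail]
        for rest in iter_splits(tail):
            yield [head] + rest

# A 30-character word has 2**29 - 1 such splits, far too many to keep in a list.
word = "monkey" * 5

# islice pulls only the first three results; the remaining ones are never built.
first_three = list(islice(iter_splits(word), 3))
for split in first_three:
    print(split)
```

Calling `list()` on the whole generator would give up this advantage, so for long inputs it is probably better to loop over the results or slice them as shown.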
【Discussion】:
http://wordaligned.org/articles/partitioning-with-python has an interesting post about partitioning sequences. This is the implementation they use:
#!/usr/bin/env python
# From http://wordaligned.org/articles/partitioning-with-python
from itertools import chain, combinations

def sliceable(xs):
    '''Return a sliceable version of the iterable xs.'''
    try:
        xs[:0]
        return xs
    except TypeError:
        return tuple(xs)

def partition(iterable):
    s = sliceable(iterable)
    n = len(s)
    b, mid, e = [0], list(range(1, n)), [n]
    getslice = s.__getitem__
    # Every subset d of the interior cut points produces one partition.
    splits = (d for i in range(n) for d in combinations(mid, i))
    return [[s[sl] for sl in map(slice, chain(b, d), chain(d, e))]
            for d in splits]

if __name__ == '__main__':
    s = "monkey"
    for i in partition(s):
        print(i)
which prints:
['monkey']
['m', 'onkey']
['mo', 'nkey']
['mon', 'key']
['monk', 'ey']
['monke', 'y']
['m', 'o', 'nkey']
['m', 'on', 'key']
['m', 'onk', 'ey']
['m', 'onke', 'y']
['mo', 'n', 'key']
['mo', 'nk', 'ey']
['mo', 'nke', 'y']
['mon', 'k', 'ey']
['mon', 'ke', 'y']
['monk', 'e', 'y']
...
['mo', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'k', 'e', 'y']
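The output above is grouped by the number of cuts, which suggests a way to ask for splits with a fixed number of pieces. The following is a minimal sketch of that idea; the helper name `splits_into` is hypothetical, and it assumes `k` is at least 1.

```python
from itertools import combinations

def splits_into(s, k):
    """Return every way to cut s into exactly k non-empty consecutive pieces."""
    n = len(s)
    result = []
    # Choose k - 1 cut positions among the n - 1 gaps between characters.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        # Consecutive pairs of bounds delimit the pieces.
        result.append([s[a:b] for a, b in zip(bounds, bounds[1:])])
    return result

for pieces in splits_into("monkey", 2):
    print(pieces)
```

Summing over all values of `k` from 1 to `len(s)` should recover the full set of partitions.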
【Discussion】:
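The idea is to realize that the permutations of a string s (here meaning its order-preserving splits) are the set containing s itself, together with each proper prefix X of s combined with every permutation of the remainder s\X. For example, permute('key'):
{'key'} # 'key' itself
{'k', 'ey'} # prefix 'k' union 1st permutation of 'ey' = {'ey'}
{'k', 'e', 'y'} # prefix 'k' union 2nd permutation of 'ey' = {'e', 'y'}
{'ke', 'y'} # prefix 'ke' union 1st and only permutation of 'y' = {'y'}
Together these are all the permutations of 'key'. With this in mind, a simple algorithm can be implemented: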
>>> def permute(s):
...     result = [[s]]
...     for i in range(1, len(s)):
...         first = [s[:i]]
...         rest = s[i:]
...         for p in permute(rest):
...             result.append(first + p)
...     return result
...
>>> for p in permute('monkey'):
...     print(p)
...
['monkey']
['m', 'onkey']
['m', 'o', 'nkey']
['m', 'o', 'n', 'key']
['m', 'o', 'n', 'k', 'ey']
['m', 'o', 'n', 'k', 'e', 'y']
['m', 'o', 'n', 'ke', 'y']
['m', 'o', 'nk', 'ey']
['m', 'o', 'nk', 'e', 'y']
['m', 'o', 'nke', 'y']
['m', 'on', 'key']
['m', 'on', 'k', 'ey']
['m', 'on', 'k', 'e', 'y']
['m', 'on', 'ke', 'y']
['m', 'onk', 'ey']
['m', 'onk', 'e', 'y']
['m', 'onke', 'y']
['mo', 'nkey']
['mo', 'n', 'key']
['mo', 'n', 'k', 'ey']
['mo', 'n', 'k', 'e', 'y']
['mo', 'n', 'ke', 'y']
['mo', 'nk', 'ey']
['mo', 'nk', 'e', 'y']
['mo', 'nke', 'y']
['mon', 'key']
['mon', 'k', 'ey']
['mon', 'k', 'e', 'y']
['mon', 'ke', 'y']
['monk', 'ey']
['monk', 'e', 'y']
['monke', 'y']
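The recursion above recomputes the splits of the same suffix many times. One possible variation, sketched here under the assumption that holding all results in memory is acceptable, caches each suffix with `functools.lru_cache`. The function name `splits` is made up for this example, and it returns tuples instead of lists so that cached results cannot be modified by accident.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def splits(s):
    """Return all order-preserving splits of s as a tuple of tuples."""
    result = [(s,)]
    for i in range(1, len(s)):
        # splits(s[i:]) is computed once per distinct suffix, then reused.
        for rest in splits(s[i:]):
            result.append((s[:i],) + rest)
    return tuple(result)

for p in splits("key"):
    print(p)
```

The order of results is the same as in `permute`, so the two can be compared directly.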
【Discussion】:
Given
import more_itertools as mit
s = "monkey"
Demo
As is:
list(mit.partitions(s))
#[[['m', 'o', 'n', 'k', 'e', 'y']],
# [['m'], ['o', 'n', 'k', 'e', 'y']],
# [['m', 'o'], ['n', 'k', 'e', 'y']],
# [['m', 'o', 'n'], ['k', 'e', 'y']],
# [['m', 'o', 'n', 'k'], ['e', 'y']],
# [['m', 'o', 'n', 'k', 'e'], ['y']],
# ...]
After joining the pieces back into strings:
[list(map("".join, x)) for x in mit.partitions(s)]
Output
[['monkey'],
['m', 'onkey'],
['mo', 'nkey'],
['mon', 'key'],
['monk', 'ey'],
['monke', 'y'],
['m', 'o', 'nkey'],
['m', 'on', 'key'],
['m', 'onk', 'ey'],
['m', 'onke', 'y'],
['mo', 'n', 'key'],
['mo', 'nk', 'ey'],
['mo', 'nke', 'y'],
['mon', 'k', 'ey'],
['mon', 'ke', 'y'],
['monk', 'e', 'y'],
['m', 'o', 'n', 'key'],
['m', 'o', 'nk', 'ey'],
['m', 'o', 'nke', 'y'],
['m', 'on', 'k', 'ey'],
['m', 'on', 'ke', 'y'],
['m', 'onk', 'e', 'y'],
['mo', 'n', 'k', 'ey'],
['mo', 'n', 'ke', 'y'],
['mo', 'nk', 'e', 'y'],
['mon', 'k', 'e', 'y'],
['m', 'o', 'n', 'k', 'ey'],
['m', 'o', 'n', 'ke', 'y'],
['m', 'o', 'nk', 'e', 'y'],
['m', 'on', 'k', 'e', 'y'],
['mo', 'n', 'k', 'e', 'y'],
['m', 'o', 'n', 'k', 'e', 'y']]
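【Discussion】:
A string-oriented (as opposed to list-oriented) approach is to think of each pair of adjacent characters as being separated by either a space or an empty string. Each choice maps to a 1 or a 0, so the number of possible splits is a power of 2:
2 ** (len(s) - 1)
For example, 'key' can have '' or ' ' between 'k' and 'e', and '' or ' ' between 'e' and 'y', which gives 4 possibilities: 'key', 'k ey', 'ke y' and 'k e y'.
An unreadable Python one-liner that gives you a generator of strings: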
operator_positions = (''.join([str(a >> i & 1).replace('0', '').replace('1', ' ') + s[len(s)-1-i] for i in range(len(s)-1, -1, -1)]) for a in range(pow(2, len(s)-1)))
A readable version of this generator, with comments and an example:
s = 'monkey'
s_length = len(s) - 1  # number of gaps between characters that can hold ' ' or ''
operator_positions = (
    ''.join(
        [str(a >> i & 1).replace('0', '').replace('1', ' ') + s[s_length - i]
         for i in range(s_length, -1, -1)])  # the extra top bit is always 0, so '' always precedes the first character
    for a in range(pow(2, s_length))  # loop over every bit pattern
)
for i in operator_positions:
    print(i)
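str(a >> i & 1) extracts bit i of a as the string '0' or '1', which is then replaced by '' or ' ' respectively. The loop reads one bit more than a can occupy, so that leading bit is always 0 and becomes ''. Since each separator is prepended to its character, the first character is always emitted with nothing in front of it.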
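The same bitmask idea can also be sketched so that it produces lists of pieces instead of space-separated strings. This is an illustrative variant, not the answer's code: the name `splits_by_mask` is hypothetical, and here bit `gap` of the mask refers to the gap between `s[gap]` and `s[gap + 1]`.

```python
def splits_by_mask(s):
    """Yield each split of s, one per bitmask over the gaps between characters."""
    gaps = max(len(s) - 1, 0)
    for mask in range(2 ** gaps):
        parts, start = [], 0
        for gap in range(gaps):
            # A set bit means: cut between s[gap] and s[gap + 1].
            if mask >> gap & 1:
                parts.append(s[start:gap + 1])
                start = gap + 1
        # Whatever remains after the last cut is the final piece.
        parts.append(s[start:])
        yield parts

for parts in splits_by_mask("key"):
    print(parts)
```

Because each mask is handled independently, a specific split can be produced from its mask number without generating the others first.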
【Discussion】:
My solution also lets you set a threshold for the minimum size of the substrings.
Here is my code:
from itertools import chain

def split_string(s, min_str_length=2, root_string=None, results=None):
    """
    :param s: word to split, string
    :param min_str_length: the minimum number of characters in a substring
    :param root_string: pieces already split off by the recursion; leave empty
    :param results: accumulator for the recursion; leave empty
    :return: nested list of all possible combinations of word split according to the minimum substring length
    """
    # None sentinels avoid sharing one mutable default list between calls.
    root_string = [] if root_string is None else root_string
    results = [] if results is None else results
    for i in range(min_str_length, len(s)):
        if len(s[i:]) >= min_str_length:
            results.append(list(chain(root_string, [s[:i]], [s[i:]])))
            split_string(s[i:], min_str_length, list(chain(root_string, [s[:i]])), results)
    return results
Usage example:
Input: split_string ('monkey', min_str_length = 1, root_string=[], results=[] )
Output:
[['m', 'onkey'],
['m', 'o', 'nkey'],
['m', 'o', 'n', 'key'],
['m', 'o', 'n', 'k', 'ey'],
['m', 'o', 'n', 'k', 'e', 'y'],
['m', 'o', 'n', 'ke', 'y'],
['m', 'o', 'nk', 'ey'],
['m', 'o', 'nk', 'e', 'y'],
['m', 'o', 'nke', 'y'],
['m', 'on', 'key'],
['m', 'on', 'k', 'ey'],
['m', 'on', 'k', 'e', 'y'],
['m', 'on', 'ke', 'y'],
['m', 'onk', 'ey'],
['m', 'onk', 'e', 'y'],
['m', 'onke', 'y'],
['mo', 'nkey'],
['mo', 'n', 'key'],
['mo', 'n', 'k', 'ey'],
['mo', 'n', 'k', 'e', 'y'],
['mo', 'n', 'ke', 'y'],
['mo', 'nk', 'ey'],
['mo', 'nk', 'e', 'y'],
['mo', 'nke', 'y'],
['mon', 'key'],
['mon', 'k', 'ey'],
['mon', 'k', 'e', 'y'],
['mon', 'ke', 'y'],
['monk', 'ey'],
['monk', 'e', 'y'],
['monke', 'y']]
or
Input: split_string ('monkey', min_str_length = 2, root_string=[], results=[] )
Output: [['mo', 'nkey'], ['mo', 'nk', 'ey'], ['mon', 'key'], ['monk', 'ey']]
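For comparison, the minimum-length constraint can also be expressed as a generator. This is only a sketch under stated assumptions: the name `split_min` is hypothetical, `min_len` must be at least 1, and unlike `split_string` it also yields the unsplit word when that word is long enough.

```python
def split_min(s, min_len=1):
    """Yield every order-preserving split of s whose pieces all have at least min_len characters."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    if len(s) >= min_len:
        yield [s]  # the unsplit string counts as a split into one piece
    # The cut must leave at least min_len characters on both sides.
    for i in range(min_len, len(s) - min_len + 1):
        for rest in split_min(s[i:], min_len):
            yield [s[:i]] + rest

for pieces in split_min("monkey", 2):
    print(pieces)
```

Skipping the first result leaves only the splits into two or more pieces.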
【Discussion】: