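Find all disjoint subsets of a set, respecting the element order

【Question title】: Find all disjoint subsets of a set, respecting the element order
【Posted】: 2015-11-20 22:18:05
【Question description】:

Suppose I want to implement a solution in Python 2.7.

I have a list of strings, e.g. A = ['AA', 'BB', 'CC', 'DD'].

The desired output is a collection of disjoint subsets of A, say A_1, A_2 ... A_N, such that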

(A_1 U A_2 U ... U A_N) = A,

(A_i ∩ A_j) = Ø for every i ≠ j,

while respecting the order of the elements in A (each of A_1, A_2, ..., A_N may only contain elements that are adjacent in A).

For A, these would be:

A_1, A_2 ... A_N:

['AA', 'BB', 'CC', 'DD'], Ø
['AA'], ['BB', 'CC', 'DD']
['AA', 'BB'], ['CC', 'DD']
['AA', 'BB', 'CC'], ['DD']
['AA'], ['BB'], ['CC'], ['DD']
['AA', 'BB'], ['CC'], ['DD']
['AA'], ['BB', 'CC'], ['DD']
['AA'], ['BB'], ['CC', 'DD']

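(I hope I haven't missed anything, but I think you get the idea.)

The emphasis is on efficiency, meaning relatively fast and not too wasteful of memory. I know the number of combinations can blow up for larger lists, but my list will never have more than 5 elements.

【Question comments】:

Tags: python list python-2.7 subset

【Solution 1】:

I found an answer to a similar question here; the only difference is that I want all subsets, whereas they only wanted subsets with a maximum length of 2.

The solution amounts to finding every ordered sequence of positive integers that sums to n (the length of the input list), i.e. every composition of n, and then mapping each sequence back onto the word list to obtain its subsets.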
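
To make the remapping step concrete, here is a minimal sketch of how one sequence of lengths could be turned back into consecutive slices of the word list. The helper name `split_by_lengths` is hypothetical and only serves as an illustration.

```python
def split_by_lengths(words, lengths):
    """Cut `words` into consecutive chunks whose sizes are given by `lengths`."""
    if sum(lengths) != len(words):
        raise ValueError("lengths must sum to len(words)")
    chunks = []
    start = 0
    for length in lengths:
        # Each chunk starts where the previous one ended.
        chunks.append(words[start:start + length])
        start += length
    return chunks


words = ['AA', 'BB', 'CC', 'DD']
print(split_by_lengths(words, [1, 2, 1]))  # [['AA'], ['BB', 'CC'], ['DD']]
```

Because every chunk is a slice of adjacent elements, the order of A is respected automatically.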

Pseudocode for their problem:

push an empty list onto the stack;
while (the stack is not empty) {
  pop the top list off the stack;
  if (the sum of its entries is n)
    add it to the solution set;
  else if (the sum of its entries is less than n) {
    add a 1 to a copy of the list and push it onto the stack;
    add a 2 to a copy of the list and push it onto the stack;
  }
}

Pseudocode for this problem (extended):

push an empty list onto the stack;
while (the stack is not empty) {
  pop the top list off the stack;
  if (the sum of its entries is n)
    add it to the solution set;
  else if (the sum of its entries is less than n) {
    for j = 1:n {
      add j to a copy of the list and push it onto the stack;
      }
  }
}
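
Note that the loop above also pushes lists whose sum already exceeds n; they are popped later and silently dropped. A possible variation, sketched here under the assumption that tracking a running sum is acceptable, only pushes values that still fit. The function name `compositions` is hypothetical.

```python
def compositions(n):
    """Return every ordered list of positive integers that sums to n."""
    stack = [([], 0)]  # each entry is (partial list, running sum)
    results = []
    while stack:
        parts, total = stack.pop()
        if total == n:
            results.append(parts)
            continue
        # Only push values that cannot overshoot n.
        for j in range(1, n - total + 1):
            stack.append((parts + [j], total + j))
    return results


print(len(compositions(4)))  # 8
```

Using `parts + [j]` creates a new list on each push, so no explicit copy is needed.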

My Python implementation:

import copy

def generate_subsets(words):

    # get length of word list
    list_len = len(words)

    # initialize stack, subset_lens list
    stack = [[], ]
    subset_lens = []

    while stack:
        current_item = stack.pop(-1)
        if sum(current_item) == list_len:
            subset_lens.append(current_item)
        elif sum(current_item) < list_len:
            for j in range(1, list_len+1):
                new_item = copy.deepcopy(current_item)
                new_item.append(j)
                stack.append(new_item)

    # remap subset lengths to actual word subsets
    subsets = []

    for subset_len in subset_lens:
        subset = []
        starting_index = 0
        for length in subset_len:
            # each entry is a chunk length, not an index into words
            subset.append('_'.join(words[starting_index:starting_index + length]))
            starting_index += length
        subsets.append(subset)

    return subsets

Input:

generate_subsets(['AA', 'BB', 'CC', 'DD'])

Output:

['AA_BB_CC_DD']
['AA_BB_CC', 'DD']
['AA_BB', 'CC_DD']
['AA_BB', 'CC', 'DD']
['AA', 'BB_CC_DD']
['AA', 'BB_CC', 'DD']
['AA', 'BB', 'CC_DD']
['AA', 'BB', 'CC', 'DD']

If anyone finds a more efficient solution, I'd be happy to see it in the answers/comments!
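
One possible alternative, offered as a sketch rather than a benchmarked improvement: a list of n elements has n - 1 gaps between neighbours, and every choice of gaps to cut at gives exactly one valid split, so there are 2^(n-1) results (16 for a 5-element list). The generator below enumerates those cut positions with `itertools.combinations`; the name `contiguous_partitions` is hypothetical. It yields lists of sub-lists instead of '_'-joined strings.

```python
from itertools import combinations


def contiguous_partitions(words):
    """Yield every way to split `words` into consecutive non-empty chunks."""
    n = len(words)
    if n == 0:
        yield []
        return
    # Choose k of the n - 1 gaps between neighbours as cut positions.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            yield [words[a:b] for a, b in zip(bounds, bounds[1:])]


for partition in contiguous_partitions(['AA', 'BB', 'CC', 'DD']):
    print(partition)
```

Since it is a generator, only one partition is held in memory at a time, and no candidate is ever built and then discarded.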

【Comments】: