【Question title】: Find all disjoint subsets of a set, respecting the element order
【Posted】: 2015-11-20 22:18:05
【Question】:

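I want to implement a solution in Python 2.7.

I have a list of strings, e.g. A = ['AA', 'BB', 'CC', 'DD'].

The desired output is every way of splitting A into disjoint subsets A_1, A_2 ... A_N such that

(A_1 U A_2 U ... U A_N) = A,

A_i ∩ A_j = Ø for all i ≠ j,

while respecting the order of the elements in A (no A_i may contain elements that are not adjacent in A, so each subset is a contiguous run of A).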

For A, these would be:

A_1, A_2 ... A_N:

  • ['AA', 'BB', 'CC', 'DD'], Ø
  • ['AA'], ['BB', 'CC', 'DD']
  • ['AA', 'BB'], ['CC', 'DD']
  • ['AA', 'BB', 'CC'], ['DD']
  • ['AA'], ['BB'], ['CC'], ['DD']
  • ['AA', 'BB'], ['CC'], ['DD']
  • ['AA'], ['BB', 'CC'], ['DD']
  • ['AA'], ['BB'], ['CC', 'DD']

(Hopefully I haven't missed any, but I think you get the idea.)

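The emphasis is on efficiency, meaning reasonably fast and not too wasteful of memory. I know the number of combinations can explode for larger lists, but my lists will never have more than 5 elements.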

【Comments】:

    Tags: python list python-2.7 subset


    【Solution 1】:

    I found an answer to a similar question here; the only difference is that I want all subsets, whereas they only needed subsets of maximum length 2.

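    The solution amounts to finding all ordered lists of positive integers that sum to n (the length of the input list), and then mapping each such list back onto the word list to obtain the corresponding subsets.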
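
To make the remapping step concrete, here is a minimal sketch that turns one list of subset lengths into the corresponding contiguous sublists. The helper name `lengths_to_blocks` is hypothetical and is not part of the original answer; it assumes the lengths sum to the length of the word list.

```python
def lengths_to_blocks(words, lengths):
    """Split `words` into consecutive blocks whose sizes are given by `lengths`.

    Hypothetical helper for illustration; assumes sum(lengths) == len(words).
    """
    blocks = []
    start = 0
    for length in lengths:
        # Each block is a contiguous slice, so the element order is preserved.
        blocks.append(words[start:start + length])
        start += length
    return blocks


print(lengths_to_blocks(['AA', 'BB', 'CC', 'DD'], [2, 1, 1]))
# [['AA', 'BB'], ['CC'], ['DD']]
```

Here the lengths [2, 1, 1] sum to 4, so they describe one valid split of the four-element list.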

    Pseudocode for their problem:

    push an empty list onto the stack;
    while (the stack is not empty) {
      pop the top list off the stack;
      if (the sum of its entries is n)
        add it to the solution set;
      else if (the sum of its entries is less than n) {
        add a 1 to a copy of the list and push it onto the stack;
        add a 2 to a copy of the list and push it onto the stack;
      }
    }
    

    Pseudocode for this problem (extended):

    push an empty list onto the stack;
    while (the stack is not empty) {
      pop the top list off the stack;
      if (the sum of its entries is n)
        add it to the solution set;
      else if (the sum of its entries is less than n) {
        for j = 1:n {
          add j to a copy of the list and push it onto the stack;
        }
      }
    }
    

    My Python implementation:

    def generate_subsets(words):
        """Return every split of `words` into consecutive, non-empty groups."""
        list_len = len(words)
    
        # Depth-first search over lists of group lengths that sum to list_len.
        stack = [[]]
        subset_lens = []
    
        while stack:
            current_item = stack.pop()
            remaining = list_len - sum(current_item)
            if remaining == 0:
                subset_lens.append(current_item)
            else:
                # Only try lengths that still fit, so the sum never overshoots.
                for j in range(1, remaining + 1):
                    # Concatenation builds a new list, so no deepcopy is needed.
                    stack.append(current_item + [j])
    
        # Remap group lengths to the actual word groups.
        subsets = []
        for subset_len in subset_lens:
            subset = []
            starting_index = 0
            for length in subset_len:
                subset.append('_'.join(words[starting_index:starting_index + length]))
                starting_index += length
            subsets.append(subset)
    
        return subsets
    

    Input:

    for subset in generate_subsets(['AA', 'BB', 'CC', 'DD']):
        print(subset)
    

    Output:

    ['AA_BB_CC_DD']
    ['AA_BB_CC', 'DD']
    ['AA_BB', 'CC_DD']
    ['AA_BB', 'CC', 'DD']
    ['AA', 'BB_CC_DD']
    ['AA', 'BB_CC', 'DD']
    ['AA', 'BB', 'CC_DD']
    ['AA', 'BB', 'CC', 'DD']
    

    If anyone finds a more efficient solution, I'd be glad to see it in the answers/comments!
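
One possible alternative, offered as a sketch and not as a benchmarked improvement: every order-preserving split of an n-element list corresponds to choosing which of the n - 1 gaps between neighbouring elements to cut, which gives 2^(n-1) splits (8 for the four-element example, 16 for five elements). The function name `contiguous_partitions` is an assumption made for illustration. It relies on `itertools.combinations` from the standard library and yields results lazily, so no candidate length lists are generated and then discarded.

```python
from itertools import combinations


def contiguous_partitions(words):
    """Yield every way to split `words` into consecutive, non-empty blocks."""
    n = len(words)
    if n == 0:
        return  # nothing to split
    # Choose k cut positions among the n - 1 gaps between neighbours.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            # Consecutive pairs of bounds delimit one block each.
            yield [words[a:b] for a, b in zip(bounds, bounds[1:])]


for partition in contiguous_partitions(['AA', 'BB', 'CC', 'DD']):
    print(partition)
```

Joining each block with '_' gives the same eight results as the stack-based version, although in a different order.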

    【Comments】:
