【Question title】: Generate combinations of string elements in a list, limiting them to a particular character amount
【Posted】: 2019-12-25 11:20:31
【Question】:

I want to generate 5 combinations of the following films, limiting each combination to a particular number of characters.

films = ['Pulp Fiction','The Lion King','Reservoir Dogs','The Wolf of Wall Street','Jackie Brown','The Shawshank Redemption','Django Unchained','The Godfather','Gone Girl','The Dark Knight']

I intend to make the character limit variable (assume 50 characters for this example).

Expected result

for i in film_combinations_limited:
    i[0] = ['The Shawshank Redemption, The Wolf of Wall Street'] (49 characters inc comma)
    i[1] = ['Pulp Fiction, Gone Girl, The Wolf of Wall Street'] (48 characters inc comma)
    i[2] = ['Reservoir Dogs, Pulp Fiction, The Dark Knight'] (45 characters inc comma)
    i[3] = ['Jackie Brown, Django Unchained, Pulp Fiction'] (44 characters inc comma)
    i[4] = ['The Wolf of Wall Street, The Lion King'] (38 characters inc comma)
    i[5] = ['Pulp Fiction, The Shawshank Redemption'] (38 characters inc comma)

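I want to use as much of the character limit as possible, and the comma-and-space separators count towards the limit too.

Current code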

import itertools

x_raw=[el.split(' ') for el in films] 
x=[el for sublist in x_raw for el in sublist] #Not sure if I understood, what do you mean by "substring" - these 2 lines will produce substring ~ word

n=50 # character limit

res=[]
for i in range(len(x)):
   for obj in itertools.combinations(x, i+1):
      res_temp = " ".join(obj)
      #to ensure total number of characters <25 but it's high enough, that no other word from lorem/x will fit
      if((len(res_temp) < n) and (n-len(res_temp)<=min([len(el) for el in [el_x for el_x in x if el_x not in obj]] or [100]))): res.append(res_temp)

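This generates combinations that do not include the comma-and-space separators. I am trying to get output that fills as much of the character limit as possible.

The output format of this code is not important; it does not have to be a list.

Please ask if you need more information or clarification.

Thanks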

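【Comments】:

  • Why so complicated with the spaces and commas? I think you just need to iterate over all possible combinations, join the complete film names (with ', ') and check whether the result is short enough. Why split on spaces at all?
  • Do you want the longest possible solutions?
  • @Alfe Yes, that is what I am after.

Tags: python list loops while-loop combinations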


【Solution 1】:

I think your solution overcomplicates things. There is no need to split the film names on spaces.

import itertools

films = ['Pulp Fiction','The Lion King','Reservoir Dogs',
         'The Wolf of Wall Street','Jackie Brown','The Shawshank Redemption',
         'Django Unchained','The Godfather','Gone Girl','The Dark Knight']

def each_short_combination(films, max_length=50):
  # Go from 1 name up to all names; range(len(films)) would start with
  # the empty combination and never try the full list.
  for i in range(1, len(films) + 1):
    yielded_something = False
    for combination in itertools.combinations(films, i):
      output = ', '.join(combination)
      if len(output) < max_length:  # strictly shorter than the limit
        yield output
        yielded_something = True
    if not yielded_something:  # nothing yielded with i movie names?
      break  # no need to try longer combinations then

answers = list(each_short_combination(films))
answers.sort(key=len, reverse=True)  # longest first; ties keep generation order
answers = answers[:5]

for answer in answers:
  print(answer, len(answer))

This prints:

The Wolf of Wall Street, The Shawshank Redemption 49
Pulp Fiction, The Shawshank Redemption, Gone Girl 49
The Lion King, The Wolf of Wall Street, Gone Girl 49
Reservoir Dogs, Django Unchained, The Dark Knight 49
The Wolf of Wall Street, The Godfather, Gone Girl 49
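Note that the comparison above is strict, so a combination of exactly 50 characters is rejected. If the limit is meant to be inclusive, the comparison would become `<=`. The following is a minimal sketch of that variant; the function name `combinations_within` and the inclusive reading of the limit are assumptions, not part of the original answer.

```python
from itertools import combinations

films = ['Pulp Fiction', 'The Lion King', 'Reservoir Dogs',
         'The Wolf of Wall Street', 'Jackie Brown', 'The Shawshank Redemption',
         'Django Unchained', 'The Godfather', 'Gone Girl', 'The Dark Knight']

def combinations_within(names, max_length=50):
    """Yield every ', '-joined combination of at most max_length characters."""
    for size in range(1, len(names) + 1):
        found = False
        for combo in combinations(names, size):
            joined = ', '.join(combo)
            if len(joined) <= max_length:  # inclusive limit (assumption)
                found = True
                yield joined
        if not found:
            # No combination of this size fits, so no larger one can fit either
            break

best = sorted(combinations_within(films), key=len, reverse=True)[:5]
for text in best:
    print(text, len(text))
```

With the inclusive limit, three combinations of exactly 50 characters appear ahead of the 49-character ones.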

【Discussion】:

    【Solution 2】:

    Here is one way to do it, assuming you want to pick the longest ones:

    from itertools import chain, combinations
    
    # Itertools recipe
    def powerset(iterable):
        s = list(iterable)
        return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
    
    def get_longest_combinations(names, num, max_length):
        # Pair every combination with a size: its ', '-joined length plus one
        g = ((sum(map(len, c)) + 2 * len(c) - 1, c) for c in powerset(names))
        # Keep sizes up to max_length, i.e. joined strings strictly shorter than max_length
        g = ((s, c) for s, c in g if s <= max_length)
        # Pick the num longest ones, longest first
        longest = sorted(g)[:-num-1:-1]
        # Format
        return [', '.join(c) for _, c in longest]
    
    films = ['Pulp Fiction', 'The Lion King', 'Reservoir Dogs', 'The Wolf of Wall Street',
             'Jackie Brown', 'The Shawshank Redemption', 'Django Unchained', 'The Godfather',
             'Gone Girl', 'The Dark Knight']
    n = 50
    m = 5
    result = get_longest_combinations(films, m, n)
    print(*result, sep='\n')
    # The Wolf of Wall Street, The Shawshank Redemption
    # The Wolf of Wall Street, The Godfather, Gone Girl
    # The Lion King, The Wolf of Wall Street, Gone Girl
    # Reservoir Dogs, Django Unchained, The Dark Knight
    # Pulp Fiction, The Shawshank Redemption, Gone Girl
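The size used in the code above is computed without building the joined string. As a rough illustration of that arithmetic, the joined length is the sum of the name lengths plus one two-character separator between each pair of neighbours. The helper name `joined_length` is hypothetical; the expression `sum(map(len, c)) + 2 * len(c) - 1` in the answer is one more than this value.

```python
def joined_length(names, sep=', '):
    """Length of sep.join(names), computed without building the string."""
    if not names:
        return 0
    # One separator between each pair of neighbouring names
    return sum(map(len, names)) + len(sep) * (len(names) - 1)

combo = ('The Wolf of Wall Street', 'The Shawshank Redemption')
print(joined_length(combo), len(', '.join(combo)))
```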
    

    You could also write the selection like this, which keeps the same elements but in ascending order:

    longest = sorted(g)[-num:]
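To make the difference between the two slices concrete, here is a small sketch on made-up numbers (the list `values` is illustrative only). Both slices select the same `num` largest elements and differ only in order.

```python
values = [3, 1, 4, 1, 5, 9, 2, 6]
num = 3
ordered = sorted(values)

descending = ordered[:-num-1:-1]  # walk backwards from the end: largest first
ascending = ordered[-num:]        # last num elements: smallest of them first

print(descending)  # [9, 6, 5]
print(ascending)   # [5, 6, 9]
```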
    

    It will be faster if you use a heap to pick the longest elements:

    import heapq
    
    def max_n(it, n):
        h = []  # min-heap holding the n largest elements seen so far
        for elem in it:
            if len(h) < n:
                heapq.heappush(h, elem)
            elif h and elem > h[0]:
                # Swap out the smallest kept element for the larger one
                heapq.heapreplace(h, elem)
        return sorted(h, reverse=True)
    
    # ...
    longest = max_n(g, num)
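The standard library already offers this selection as `heapq.nlargest`, which returns the n largest items in descending order. A short sketch on a few hand-written `(size, combination)` pairs (the list `scored` is illustrative only):

```python
import heapq

# (joined length, combination) pairs, as produced by the generator expression above
scored = [
    (12, ('Pulp Fiction',)),
    (49, ('The Wolf of Wall Street', 'The Shawshank Redemption')),
    (9, ('Gone Girl',)),
    (26, ('Pulp Fiction', 'Jackie Brown')),
]

top_two = heapq.nlargest(2, scored)
for size, combo in top_two:
    print(', '.join(combo), size)
```

Tuples compare element by element, so ties on the size fall back to comparing the combinations themselves.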
    

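    If the number of names is large, the size of the power set (2^n) becomes too big. If you want the best combinations there is no real way to "solve" that, but you can shrink the search space somewhat by not exploring partial combinations that cannot succeed. You can do that with a recursive algorithm like this: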

    import heapq
    
    def get_longest_combinations(names, num, max_length):
        h = []
        _get_longest_combinations_rec(names, num, max_length, h, [], -2, 0)
        return [', '.join(c) for _, c in sorted(h, reverse=True)]
    
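    def _get_longest_combinations_rec(names, num, max_length, h, cur, cur_size, name_idx):
        # h is a min-heap of the num best (size, combination) pairs found so far
        if len(h) < num:
            heapq.heappush(h, (cur_size, tuple(cur)))
        elif h and cur_size > h[0][0]:
            # Heap is full: replace the shortest kept combination
            heapq.heapreplace(h, (cur_size, tuple(cur)))
        cur_size += 2  # room for the ', ' separator before the next name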
        for i in range(name_idx, len(names)):
            name = names[i]
            cur.append(name)
            cur_size += len(name)
            if cur_size < max_length:
                _get_longest_combinations_rec(
                    names, num, max_length, h, cur, cur_size, i + 1)
            cur_size -= len(name)
            cur.pop()
    
    films = ['Pulp Fiction', 'The Lion King', 'Reservoir Dogs', 'The Wolf of Wall Street',
             'Jackie Brown', 'The Shawshank Redemption', 'Django Unchained', 'The Godfather',
             'Gone Girl', 'The Dark Knight']
    n = 50
    m = 5
    result = get_longest_combinations(films, m, n)
    print(*result, sep='\n')
    # The Wolf of Wall Street, The Shawshank Redemption
    # The Wolf of Wall Street, The Godfather, Gone Girl
    # The Lion King, The Wolf of Wall Street, Gone Girl
    # Reservoir Dogs, Django Unchained, The Dark Knight
    # Pulp Fiction, The Shawshank Redemption, Gone Girl
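The pruning idea can be pushed a little further. The sketch below is a variant, not the answer's code: it assumes the names may be reordered, sorts them by length, and stops the loop at the first name that no longer fits, since every later name is at least as long. The function name `fitting_combinations` is hypothetical, and the names inside each result appear in length order rather than in the original list order.

```python
import heapq

films = ['Pulp Fiction', 'The Lion King', 'Reservoir Dogs', 'The Wolf of Wall Street',
         'Jackie Brown', 'The Shawshank Redemption', 'Django Unchained', 'The Godfather',
         'Gone Girl', 'The Dark Knight']

def fitting_combinations(names, max_length):
    """Yield (length, combo) for each non-empty combination shorter than max_length."""
    ordered = sorted(names, key=len)  # shortest first, so one failed fit ends the loop

    def extend(start, combo, size):
        for i in range(start, len(ordered)):
            # size is -2 for the empty combination, so the first name adds no separator
            new_size = size + 2 + len(ordered[i])
            if new_size >= max_length:
                break  # every later name is at least as long
            combo.append(ordered[i])
            yield new_size, tuple(combo)
            yield from extend(i + 1, combo, new_size)
            combo.pop()

    yield from extend(0, [], -2)

best = heapq.nlargest(5, fitting_combinations(films, 50))
for size, combo in best:
    print(', '.join(combo), size)
```

Like the answers above, this treats the limit as strict, so the longest results are 49 characters.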
    

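    【Discussion】:

    • @GJB I have rewritten it as a function, see if that helps. Also a small fix.
    • @Alfe Yes, return sorted(h, reverse=True) could be replaced with heapq.nlargest(len(h), h) if you prefer.
    • This solution also tries all combinations of five film names even when all combinations of four film names are already too long. That is not a problem in this case, but with a longer list of films it will certainly become a huge (and easily avoidable) problem. For a thousand films, this solution will not terminate in reasonable time.
    • @Alfe Yes, your solution optimizes for that somewhat, although it could prune more cases, e.g. if A, B is already too long I should not try A, B, C or A, B, C, D and so on. I have added a recursive algorithm for that.
    • I just wanted to make sure that a thousand film names still terminate in acceptable time. ABCD is only tested if other combinations of three were short enough. But you are right, in some corner cases my solution still wastes far more time than yours needs.
    【Solution 3】: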

    You are already using combinations, which is the perfect tool. You can reduce everything to a list comprehension:

    from itertools import combinations
    
    def combo(l):
        # range(1, len(l) + 1) covers every size from one name up to all of them
        result = sorted([", ".join(y) for i in range(1, len(l) + 1) for y in combinations(l, i) if len(", ".join(y)) < 50],
                        key=len, reverse=True)
        for i in result[:5]:  # print only the 5 longest results
            print(i, len(i))
    
    combo(films)  # films is the list from the question
    
    #The Wolf of Wall Street, The Shawshank Redemption 49
    #Pulp Fiction, The Shawshank Redemption, Gone Girl 49
    #The Lion King, The Wolf of Wall Street, Gone Girl 49
    #Reservoir Dogs, Django Unchained, The Dark Knight 49
    #The Wolf of Wall Street, The Godfather, Gone Girl 49
    

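    【Discussion】:

    • I did not downvote it (it is basically like my solution). The only aspect I find problematic is that your outer loop keeps going once the inner loop no longer produces anything. So when all four-film combinations are already too long, you still check the five-film combinations.
    • Maybe someone did not like the variable name l, which looks like 1 in many fonts and therefore, IMHO, should never be used. Or the other single-letter variables that make the code hard to understand. Or the long lines. Or printing instead of returning/yielding the results. But that is guessing. I like your solution the most.
    • Yes, I just wanted to make sure it is a correct answer, otherwise I would fix it. Otherwise I would have to delete it, even though it actually works.
    • This solution also tries all combinations of five film names even when all combinations of four film names are already too long. That is not a problem in this case, but with a longer list of films it will certainly become a huge (and easily avoidable) problem. For a thousand films, this solution will not terminate in reasonable time.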