【Question title】:How to split a list into sublists that begin with the delimiting character? [closed]
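【Posted】:2020-10-22 20:39:01
【Question description】:
  • I want to split a list into sublists that begin with the delimiter
    • The delimiter must be kept
    • The delimiter must be the first element of each sublist

Example: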

delimiter = "x" 
input = ["x","a","x","x",1,2,3,"a","a","x","e"]
output = [["x","a"], ["x"], ["x",1,2,3,"a","a"], ["x","e"]]

【Question comments】:

    Tags: python list split


    【Solution 1】:

    Step 1: find the indices at which variable (the delimiter, "x") occurs in the list:

    idx = [ix for ix, val in enumerate(input) if val==variable]
    

    Step 2: use list slicing to build the sublists:

    res = [input[i:j] for i,j in zip(idx, idx[1:]+[len(input)])]
    

    Output

    print(res)
    # [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
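
The two steps above can be wrapped in a single function. This is a minimal sketch rather than the answer author's code; the name `split_at_indices` is hypothetical. One consequence of slicing from the first delimiter index is that any elements before the first delimiter are dropped.

```python
def split_at_indices(items, delimiter):
    """Split items into sublists that each start with delimiter (hypothetical helper)."""
    # Step 1: positions of every delimiter occurrence.
    idx = [i for i, val in enumerate(items) if val == delimiter]
    # Step 2: slice from each delimiter up to the next one (or to the end of the list).
    return [items[i:j] for i, j in zip(idx, idx[1:] + [len(items)])]


data = ["x", "a", "x", "x", 1, 2, 3, "a", "a", "x", "e"]
print(split_at_indices(data, "x"))
# [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]

# Elements before the first delimiter are not part of any slice.
print(split_at_indices(["a", "b", "x", "c"], "x"))
# [['x', 'c']]
```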
    

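    【Comments】:

    • Do you think this could be done as a dictionary with "x" as the key?
    • What would the value for the key "x" be?
    • @RostyKoryaha Dictionary keys are unique, so there can be only one 'x' key. What would your expected output look like?
    • An empty string ""
    • Why would you want to do that? I don't understand the use case.
    【Solution 2】:

    You can use the following function to get the desired output.

    Try this: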

    def splitList(inputList, delim):
        finalList = []
        chunk = []
        for val in inputList:
            if val == delim:
                finalList.append(chunk)
                chunk = [delim]
            else:
                chunk.append(val)
        finalList.append(chunk)        
        return finalList[1:]
    
    
    my_input_list = ["x","a","x","x",1,2,3,"a","a","x","e"]
    my_output = splitList(my_input_list, "x")    #just call the function with input_list and delimiter to split onto.
    
    
    print(my_output)
    
    >>> [["x","a"],["x"],["x",1,2,3,"a","a"],["x","e"]]
    

    【Comments】:

      【Solution 3】:

      Using a stack

      variable = "x" 
      input1 = ["x","a","x","x",1,2,3,"a","a","x","e"]
      
      sol = []
      tmp = []
      for char in input1:
          if char==variable and tmp:
              sol.append(tmp)
              tmp = [char]
          else:
              tmp.append(char)
      if tmp:
          sol.append(tmp)
          
      print(sol)
      

      Output

       [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
      

      【Comments】:

        【Solution 4】:

        Using itertools.groupby

        For example:

        from itertools import groupby
        
        variable = "x" 
        data = ["x","a","x","x",1,2,3,"a","a","x","e"]
        output = []
        for k, v in groupby(data, lambda x: x==variable):   #--->[(True, ['x']), (False, ['a']), (True, ['x', 'x']), (False, [1, 2, 3, 'a', 'a']), (True, ['x']), (False, ['e'])]
            v = list(v)
            if k:
                for i in v:
                    output.append([i])
            else:
                output[-1].extend(v)
        print(output)
        

        Output:

        [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
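
The loop above reads `output[-1]` for every non-delimiter run, which raises an IndexError if the data does not start with the delimiter. The following sketch guards against that case. The function name `split_groupby` is hypothetical, and keeping the leading run as its own sublist is an assumption, not something the answer specifies.

```python
from itertools import groupby


def split_groupby(items, delimiter):
    """groupby-based split that tolerates a non-delimiter prefix (hypothetical helper)."""
    output = []
    for is_delim, group in groupby(items, lambda x: x == delimiter):
        group = list(group)
        if is_delim:
            # Each consecutive delimiter starts its own sublist.
            output.extend([d] for d in group)
        elif output:
            # A non-delimiter run belongs to the most recent delimiter.
            output[-1].extend(group)
        else:
            # Assumption: a prefix with no delimiter is kept as its own sublist.
            output.append(group)
    return output


print(split_groupby(["x", "a", "x", "x", 1, 2, 3, "a", "a", "x", "e"], "x"))
# [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
print(split_groupby(["a", "b", "x", "c"], "x"))
# [['a', 'b'], ['x', 'c']]
```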
        

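        【Comments】:

          【Solution 5】:
          • input is a built-in Python function; don't use it as a variable name
          • This solution also works when the first element of the list is not the delimiter
            • Given: ['a', 'b', 'c', 'x', 'a', 'x', 'x', 1, 2, 3, 'a', 'a', 'x', 'e', 'x']
            • Returns: [['a', 'b', 'c'], ['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e'], ['x']]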
          from typing import List  # for type annotations
          
          
          def sublist_by_delimiter(flat_list: list, delimiter: str) -> List[list]:
              result = list()  # main list
              chunk = list()  # inner list holding the current sublist
              len_flat_list = len(flat_list)
              for i, v in enumerate(flat_list, 1):  # iterate through flat_list, begin enumerating at 1
                  if v == delimiter and i != 1:  # a delimiter, except when it is the first element
                      result.append(chunk)  # append chunk to result
                      chunk = [v]  # create new chunk beginning with v
                      if i == len_flat_list:  # if the last value in the list is delimiter
                          result.append(chunk)
                  elif i == len_flat_list:  # the last value in the list
                      chunk.append(v)  # append it to chunk
                      result.append(chunk)  # append chunk to result
                  else:
                      chunk.append(v)  # append every other v to the current chunk
                      
              return result
                      
          
          t = ['x', 'a', 'x', 'x', 1, 2, 3, 'a', 'a', 'x', 'e', 'x']  # an extra x has been added at the end for testing
          delim = 'x'
          sublist_by_delimiter(t, delim)
          [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e'], ['x']]
          

          Using collections.defaultdict

          • As of Python 3.7, dicts are guaranteed to preserve insertion order, so dict.values() comes back in insertion order.
          • This solution is a good option for anyone who wants a dictionary of the fragments
            • Change return list(dd.values()) to return dd
          • This solution also works when the first element of the list is not the delimiter
            • Given: ['a', 'b', 'c', 'x', 'a', 'x', 'x', 1, 2, 3, 'a', 'a', 'x', 'e', 'x']
            • Returns: [['a', 'b', 'c'], ['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e'], ['x']]
          from collections import defaultdict
          
          def sublist_by_delimiter(flat_list: list, delimiter: str) -> List[list]:
              dd = defaultdict(list)
              counter = 0
              for v in flat_list:
                  if v == delimiter:
                      counter += 1
                      dd[counter].append(v)
                  else:
                      dd[counter].append(v)
              return list(dd.values())
          
          
          sublist_by_delimiter(t, 'x')
          [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e'], ['x']]
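
As the bullet about `return dd` suggests, the same loop can hand back the fragments as a dictionary. The sketch below uses the hypothetical name `fragments_by_index` and converts the defaultdict to a plain dict before returning, which is a design choice made here, not part of the answer. Key 0 is present only when the list has elements before the first delimiter.

```python
from collections import defaultdict
from typing import Dict


def fragments_by_index(flat_list: list, delimiter: str) -> Dict[int, list]:
    dd = defaultdict(list)
    counter = 0
    for v in flat_list:
        if v == delimiter:
            counter += 1  # each delimiter opens a new fragment
        dd[counter].append(v)
    return dict(dd)  # plain dict, so later lookups cannot create keys by accident


t = ['x', 'a', 'x', 'x', 1, 2, 3, 'a', 'a', 'x', 'e', 'x']
print(fragments_by_index(t, 'x'))
# {1: ['x', 'a'], 2: ['x'], 3: ['x', 1, 2, 3, 'a', 'a'], 4: ['x', 'e'], 5: ['x']}
```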
          

          Using dict

          • 3.61 s ± 9.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) for a list of 25M elements
            • defaultdict: 3.74 s ± 53.7 ms
          • As noted, if the first element is not the delimiter, this solution raises a KeyError
          def sublist_by_delimiter(flat_list: list, delimiter: str) -> List[list]:
              dd = dict()  # regular dict
              counter = 0
              for v in flat_list:
                  if v == delimiter:
                      counter += 1
                      if dd.get(counter) is None:
                          dd[counter] =  [v]
                  else:
                      dd[counter].append(v)
              return list(dd.values())
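
The KeyError mentioned above comes from `dd[counter].append(v)` running while `counter` is still 0 and no key exists yet. One possible way to avoid it with a regular dict is `dict.setdefault`. This is a sketch, not the benchmarked code, and `sublist_with_setdefault` is a hypothetical name.

```python
def sublist_with_setdefault(flat_list, delimiter):
    dd = {}
    counter = 0
    for v in flat_list:
        if v == delimiter:
            counter += 1
        # setdefault creates the list on first use, so a non-delimiter
        # prefix lands under key 0 instead of raising KeyError.
        dd.setdefault(counter, []).append(v)
    return list(dd.values())


print(sublist_with_setdefault(['a', 'b', 'x', 'c'], 'x'))
# [['a', 'b'], ['x', 'c']]
```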
          

          【Comments】:

            【Solution 6】:

            You can try this simple approach using re:

            import re
            variable = "x" 
            inp = ["x","a","x","x",1,2,3,"a","a","x","e"]
            st=''.join(list(map(str,inp)))
            regex=f'({variable}[^{variable}]*)'
            ls=[[k if not k.isdigit() else int(k) for k in l] for l in re.findall(regex,st)]
            print(ls)
            

            Output:

            ls
            [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
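
Because this approach goes through a joined string, it implicitly assumes every element is a single character. The sketch below wraps the same idea in a function (hypothetical name `split_via_regex`) to show what happens otherwise. The sample values are made up for illustration, and `re.escape` is added here as a precaution for delimiters that are regex metacharacters; it is not part of the answer.

```python
import re


def split_via_regex(items, delimiter):
    """Same idea as the answer, wrapped in a function (hypothetical name)."""
    joined = ''.join(map(str, items))
    d = re.escape(delimiter)  # precaution, not in the original answer
    pattern = f'({d}[^{d}]*)'
    return [[int(ch) if ch.isdigit() else ch for ch in chunk]
            for chunk in re.findall(pattern, joined)]


# Single-character elements round-trip as expected.
print(split_via_regex(["x", "a", "x", 1, 2], "x"))
# [['x', 'a'], ['x', 1, 2]]

# Multi-character elements are broken into characters: "ab" and 12 are not preserved.
print(split_via_regex(["x", "ab", 12], "x"))
# [['x', 'a', 'b', 1, 2]]
```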
            

            【Comments】:

              【Solution 7】:

              You can use two pointers; try this:

              var = "x"
              l = ["x", "a", "x", "x", 1, 2, 3, "a", "a", "x", "e"]
              r = []
              left, right = 0, 0
              while left < len(l)-1:
                  left = right
                  if l[left] == var:
                      right += 1
                      if right < len(l)-1:
                          while l[right] != var:
                              right += 1
                      else:
                          right = len(l)
                      r.append(l[left:right])
                  left += 1
              print(r)
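
The inner `while` above advances `right` without checking the list length, so it can run past the end when the last chunk holds more than one trailing non-delimiter element (for example, a list ending in 'x', 'e', 'e'). A bounds-checked two-pointer variant could look like the sketch below. The name `split_two_pointers` is hypothetical, and skipping any elements before the first delimiter is an assumption made here.

```python
def split_two_pointers(items, delimiter):
    result = []
    n = len(items)
    left = 0
    # Assumption: skip any prefix before the first delimiter.
    while left < n and items[left] != delimiter:
        left += 1
    while left < n:
        right = left + 1
        # Advance right to the next delimiter or to the end of the list.
        while right < n and items[right] != delimiter:
            right += 1
        result.append(items[left:right])
        left = right
    return result


print(split_two_pointers(["x", "a", "x", "x", 1, 2, 3, "a", "a", "x", "e"], "x"))
# [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
print(split_two_pointers(["x", "e", "e"], "x"))
# [['x', 'e', 'e']]
```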
              

              【Comments】:

                【Solution 8】:

                Try:

                indices = [i for i, v in enumerate(input) if v =='x']
                res = [input[i:j] for i, j in zip([0]+indices, indices+[None])][1:]
                res
                

                [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
                

                Using itertools.groupby

                from itertools import groupby
                
                
                class get_indices(object):
                    def __init__(self, value):
                        self.value = value
                        self.i = 0
                
                    def __call__(self, value):  # For masking
                        self.i += (value == self.value)
                        return self.i
                
                res = [list(g) for _, g in groupby(input, key=get_indices('x'))]
                res
                

                [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e']]
                

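                【Comments】:

                  【Solution 9】:

                  This post is a %%timeit comparison of the solutions available at the time of posting

                  • Thanks to everyone who answered; I appreciate the contributions. There are some good answers. Earlier today (2020-07-01) I was working on a similar problem, but with a list of lists in which the delimiter was at index 0 of intermittent lists, and I did not find it trivial to solve.
                  • With so many solutions, I thought a comparison would be helpful
                  • The sample data list l has 25_000_000 elements, and the first value is 'x'
                  • All functions in this test correctly return [['x', 'a'], ['x'], ['x', 1, 2, 3, 'a', 'a'], ['x', 'e'], ['x']] for ['x', 'a', 'x', 'x', 1, 2, 3, 'a', 'a', 'x', 'e', 'x']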

                  Test data

                  import random
                  
                  random.seed(25)
                  l = [random.choice(['x', 'a', 'e', 1, 2, 3]) for _ in range(25000000)]
                  l[0] = 'x'
                  print(f'Length of list l: {len(l)}')
                  print(f'First 10 values of list l: {l[:10]}')
                  
                  Length of list l: 25000000
                  First 10 values of list l: ['x', 'x', 'a', 'e', 3, 1, 'x', 'e', 'x', 'e']
                  

                  %%timeit tests

                  %%timeit
                  pygirl(l)
                  2.75 s ± 14.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  pygirl2(l)
                  9 s ± 79.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  trenton(l)
                  4.78 s ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  trenton2(l)
                  3.74 s ± 53.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  trenton3(l)
                  3.6 s ± 16.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  mrnobody33(l)
                  9.68 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  rakesh(l)
                  5.78 s ± 91.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  ansfourtytwo(l)
                  2.69 s ± 26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  sahasrara62(l)
                  2.63 s ± 27.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  prashant(l)
                  2.64 s ± 9.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
                  
                  %%timeit
                  kevin(l)
                  # results in
                  ---------------------------------------------------------------------------
                  IndexError                                Traceback (most recent call last)
                  <ipython-input-11-270e2f7daf8d> in kevin(test_list)
                      101             right += 1
                      102             if right < len(test_list)-1:
                  --> 103                 while test_list[right] != "x":
                      104                     right += 1
                      105             else:
                  
                  IndexError: list index out of range
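
`%%timeit` is an IPython/Jupyter cell magic. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The list here is far smaller than the 25M elements used above so that it runs quickly, only two of the approaches are included under the hypothetical names `by_indices` and `by_loop`, and absolute timings will differ from those reported.

```python
import random
import timeit


def by_indices(test_list, delim='x'):
    # Index-and-slice approach.
    idx = [i for i, v in enumerate(test_list) if v == delim]
    return [test_list[i:j] for i, j in zip(idx, idx[1:] + [len(test_list)])]


def by_loop(test_list, delim='x'):
    # Single-pass approach that accumulates the current chunk.
    sol, tmp = [], []
    for item in test_list:
        if item == delim and tmp:
            sol.append(tmp)
            tmp = [item]
        else:
            tmp.append(item)
    if tmp:
        sol.append(tmp)
    return sol


random.seed(25)
sample = [random.choice(['x', 'a', 'e', 1, 2, 3]) for _ in range(100_000)]
sample[0] = 'x'  # same convention as the test data: the list starts with the delimiter

# Both functions should agree before their timings are compared.
assert by_indices(sample) == by_loop(sample)

for func in (by_indices, by_loop):
    best = min(timeit.repeat(lambda: func(sample), number=5, repeat=3))
    print(f'{func.__name__}: {best / 5:.4f} s per call (best of 3)')
```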
                  

                  Functions

                  import re
                  from collections import defaultdict
                  from itertools import groupby
                  
                  
                  def pygirl(test_list):
                      indices = [i for i, v in enumerate(test_list) if v =='x']
                      return [test_list[i:j] for i, j in zip([0]+indices, indices+[None])][1:]
                  
                  
                  class get_indices(object):
                      def __init__(self, value):
                          self.value = value
                          self.i = 0
                  
                      def __call__(self, value):  # For masking
                          self.i += (value == self.value)
                          return self.i
                      
                      
                  def pygirl2(test_list):
                      return [list(g) for _, g in groupby(test_list, key=get_indices('x'))]
                  
                  
                  def trenton(test_list):
                      result = list()
                      chunk = list()
                      delimiter = 'x'
                      len_test_list = len(test_list)
                      for i, v in enumerate(test_list, 1):
                          if (v == delimiter) & (i != 1):
                              result.append(chunk)
                              chunk = [v]
                              if i == len_test_list:
                                  result.append(chunk)
                          elif (i == len_test_list):
                              chunk.append(v)
                              result.append(chunk)
                          else:
                              chunk.append(v)       
                      return result
                  
                  
                  def trenton2(test_list):
                      dd = defaultdict(list)  # defaultdict
                      delim = 'x'
                      counter = 0
                      for v in test_list:
                          if v == delim:
                              counter += 1
                              dd[counter].append(v)
                          else:
                              dd[counter].append(v)
                      return list(dd.values())
                  
                  
                  def trenton3(test_list):
                      dd = dict()  # regular dict
                      delim = 'x'
                      counter = 0
                      for v in test_list:
                          if v == delim:
                              counter += 1
                              if dd.get(counter) == None:
                                  dd[counter] = [v]
                          else:
                              dd[counter].append(v)
                          
                      return list(dd.values())
                  
                  
                  def mrnobody33(test_list):
                      variable = "x" 
                      st=''.join(list(map(str,test_list)))
                      regex=f'({variable}[^{variable}]*)'
                      return [[k if not k.isdigit() else int(k) for k in v] for v in re.findall(regex,st)]
                  
                  
                  def rakesh(test_list):
                      variable = "x" 
                      output = []
                      for k, v in groupby(test_list, lambda x: x==variable):
                          v = list(v)
                          if k:
                              for i in v:
                                  output.append([i])
                          else:
                              output[-1].extend(v)
                      return output
                  
                  
                  def ansfourtytwo(test_list):
                      variable = 'x'
                      idx = [ix for ix, val in enumerate(test_list) if val==variable]
                      return [test_list[i:j] for i,j in zip(idx, idx[1:]+[len(test_list)])]
                  
                  
                  def sahasrara62(test_list):
                      variable = "x" 
                      sol = []
                      tmp = []
                      for char in test_list:
                          if char==variable and tmp:
                              sol.append(tmp)
                              tmp = [char]
                          else:
                              tmp.append(char)
                      if tmp:
                          sol.append(tmp)
                      return sol
                  
                  
                  def prashant(test_list):
                      delim = 'x'
                      finalList = []
                      chunk = []
                      for val in test_list:
                          if val == delim:
                              finalList.append(chunk)
                              chunk = [delim]
                          else:
                              chunk.append(val)
                      finalList.append(chunk)        
                      return finalList[1:]
                  
                  
                  def kevin(test_list):
                      var = "x"
                      r = []
                      left, right = 0, 0
                      while left < len(test_list)-1:
                          left = right
                          if test_list[left] == var:
                              right += 1
                              if right < len(test_list)-1:
                                  while test_list[right] != "x":
                                      right += 1
                              else:
                                  right = len(test_list)
                              r.append(test_list[left:right])
                          left += 1
                      return r
                  

                  【Comments】:
