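Trying to index a list of strings and delete a string based on their index - Answers

【Question Title】: Trying to index a list of strings and delete a string based on their index
【Posted】: 2019-12-22 18:02:52
【Question】:

I have a list of lists (called copy), where the elements of each inner list are strings that describe a movie (as shown below):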

[['history', '1960', 'action'],
 ['1960', 'western', 'adventure'],
 ['3d', 'fantasy'],
 ['agent', 'action', 'adventure'], 
....]

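Some of these words are movie genres. For each list, I want to find the genre words (by checking whether they are in a set called set_genres), move them to the beginning of the list, and append the word "movie" after them. If a list contains more than one genre, I want to append "movie" only after the last genre. set_genres and the desired output are as follows: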

set_genres={'action',
 'adventure',
 'animation',
 'comedy',
 'crime',
 'documentary',
 'drama',
 'family',
 'fantasy',
 'foreign',
 'history',
 'horror',
 'music',
 'mystery',
 'romance',
 'science_fiction',
 'thriller',
 'tv_movie',
 'war',
 'western'}

#Output
[['history','action movie', '1960'],
 ['western','adventure movie', '1960'],
 ['fantasy movie','3d'],
 ['action', 'adventure movie', 'agent'], 
....]

The code I used to try to achieve this is as follows:

keys=[]
for list_top in copy:
        for idx, word in enumerate(list_top):
                if word in set_genres:
                        keys.append((idx,word))
        keys.sort(reverse=True)
        for idx, word in keys:
                del list_top[idx]
        for idx, word in keys:
                if idx==len(keys)-1:
                        list_top.insert(0,'{} movie'.format(word))
                else:
                        list_top.insert(0,word)

However, this does not work and I can't figure out why. It gives me the following error:

indexes=[]...
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in 
      8                         keys.sort(reverse=True)
      9                 for idx, word in keys:
---> 10                         del list_top[idx]
     11                 for idx, word in keys:
     12                         if idx==len(keys)-1:

IndexError: list assignment index out of range

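If anyone knows what might be going wrong, I'd appreciate your help!

【Question Comments】:

Please also post a sample of set_genres that matches the sample input list provided.
Don't modify a list you are iterating over.
@anky_91 I've added it above!
@DanielRoseman Can you tell me why? Also, copy is already a deep copy of the original list.
@J.Doe, the second item in the desired output is not sorted ['western','action movie', '1960'] - is this intentional? Does the order of the genres matter?

Tags: python string pandas list indexing

【Solution 1】:

An extended approach using sorted, with an optimized reverse traversal: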

genres_set = {'action', 'adventure', 'animation', 'comedy', 'crime', 'documentary', 'drama', 'family',
              'fantasy', 'foreign', 'history', 'horror', 'music', 'mystery', 'romance', 'science_fiction',
              'thriller', 'tv_movie', 'war', 'western'}
inp_list = [['history', '1960', 'action'],
            ['1960', 'western', 'adventure'],
            ['3d', 'fantasy'],
            ['agent', 'action', 'adventure']
            ]
genres_res = [sorted(lst, key=lambda x: x in genres_set, reverse=True) for lst in inp_list]
for lst in genres_res:
    for i, genre in enumerate(lst[::-1]):
        if genre in genres_set:
            lst[-i-1] += ' movie'   # updating the last genre in sublist
            break
print(genres_res)

Output:

[['history', 'action movie', '1960'], ['western', 'adventure movie', '1960'], ['fantasy movie', '3d'], ['action', 'adventure movie', 'agent']]

Alternatively, you can use a generator function:

def arrange_genres(inp_list):
    for lst in inp_list:
        lst = sorted(lst, key=lambda x: x in genres_set, reverse=True)
        for i, genre in enumerate(lst[::-1]):
            if genre in genres_set:
                lst[-i - 1] += ' movie'
                break
        yield lst

res = list(arrange_genres(inp_list))
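
Why the sorted call above keeps the genres in their original relative order may not be obvious. The sketch below uses a made-up sample list (words is hypothetical, not taken from the question) and relies on two documented Python behaviours: True sorts after False, and sorted stays stable even with reverse=True.

```python
genres = {'action', 'adventure', 'western'}
words = ['agent', 'action', '1960', 'adventure']  # hypothetical sample input

# The key is True for genre words and False for everything else.
# reverse=True moves the True group to the front; because the sort is stable,
# the words inside each group keep the order they had in the input.
result = sorted(words, key=lambda w: w in genres, reverse=True)
print(result)
```

Since the genres end up as one contiguous block at the front, the last genre is simply the last element of that block, which is what the reverse traversal looks for.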

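【Comments】:

【Solution 2】:

Since pandas is tagged, here is an approach using np and pd:

import numpy as np
import pandas as pd

df=pd.DataFrame(l) # assuming l is the list of lists from the question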

         0        1          2
0  history     1960     action
1     1960  western  adventure
2       3d  fantasy       None
3    agent   action  adventure

Conditions:

c1=df.ffill(axis=1).iloc[:,-1].isin(set_genres) #check if the last non-null element of each row is in set_genres
c2=df.eq(df.ffill(axis=1).iloc[:,-1],axis=0) #check where it matches the df elements
c3=df.isna() #check for None

Choices:

choice1=df.mask(c2,df.astype(str)+' movie') #where c2 is True, add ' movie' to the element
choice2=''

Then np.sort and np.select:

pd.DataFrame(np.sort(np.select([c1.to_numpy()[:,None]&c2.to_numpy(),c3],[choice1,choice2],default=df)).T[::-1].T)

               0                1       2
0        history     action movie    1960
1        western  adventure movie    1960
2  fantasy movie               3d        
3          agent  adventure movie  action

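【Comments】:

【Solution 3】:

You can use list comprehensions:

for i,list_top in enumerate(copy):
    temp = [x for x in list_top if x in set_genres]
    if temp:  # a list without any genre would raise IndexError on temp[-1]
        temp[-1]=temp[-1]+' movie'
    copy[i] = temp + [x for x in list_top if x not in set_genres]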

print(copy)

>>output
[['history', 'action movie', '1960'], ['western', 'adventure movie', '1960'], ['fantasy movie', '3d'], ['action', 'adventure movie', 'agent']]

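【Comments】:

【Solution 4】:

About your error:

You are modifying the list you are iterating over. When you do that, the list shrinks, so eventually an index falls outside the bounds of the list.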
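
The traceback in the question can be reproduced with a smaller sketch. One reading of the original code, not stated in the answer above, is that keys is created once outside the outer loop, so indexes collected for earlier sublists are reused on later, shorter ones. The function names below are hypothetical, and the second function is only one possible way to rebuild keys per sublist.

```python
def collect_and_delete_shared(lists, genres):
    """Mimics the question's loop: one `keys` list shared by all sublists."""
    keys = []
    for sub in lists:
        for idx, word in enumerate(sub):
            if word in genres:
                keys.append((idx, word))
        keys.sort(reverse=True)
        for idx, _ in keys:
            del sub[idx]  # may use an index recorded for an earlier sublist


def move_genres_to_front(lists, genres):
    """Variant that rebuilds `keys` for every sublist."""
    for sub in lists:
        keys = [(idx, word) for idx, word in enumerate(sub) if word in genres]
        for idx, _ in reversed(keys):  # delete from the back so indexes stay valid
            del sub[idx]
        found = [word for _, word in keys]
        if found:
            found[-1] += ' movie'  # only the last genre gets the suffix
        sub[:0] = found  # genres go to the front in their original order
    return lists


genres = {'action', 'adventure', 'history', 'western'}

try:
    collect_and_delete_shared(
        [['history', '1960', 'action'], ['1960', 'western', 'adventure']], genres)
    error = None
except IndexError as exc:
    error = exc  # raised while processing the second sublist

fixed = move_genres_to_front(
    [['history', '1960', 'action'], ['1960', 'western', 'adventure']], genres)
print(type(error).__name__, fixed)
```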

Here is what you need:

copy = [['history', '1960', 'action'],
 ['1960', 'western', 'adventure'],
 ['3d', 'fantasy'],
 ['agent', 'action', 'adventure']]

set_genres={'action',
 'adventure',
 'animation',
 'comedy',
 'crime',
 'documentary',
 'drama',
 'family',
 'fantasy',
 'foreign',
 'history',
 'horror',
 'music',
 'mystery',
 'romance',
 'science_fiction',
 'thriller',
 'tv_movie',
 'war',
 'western'}

for ind_copy, list_top in enumerate(copy):
  word_finded = False
  print(list(reversed(list_top)))
  for ind_list_top, word  in enumerate(list(reversed(list_top))):
      if not word_finded:
        if word in set_genres:
          list_top[len(list_top) - ind_list_top - 1] = '{} movie'.format(word)
          word_finded = True
  if word_finded:
    copy[ind_copy] = list_top

print(copy)

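【Comments】:

Thanks for your help! This does add the word "movie". However, when there are 2 genres, it adds "movie" to both of them instead of only to the last one. It also doesn't put the genres at the beginning of the list as I wanted.
@J.Doe Do you want the words that start with a letter to come first?
For the genre words, I want to move them to the beginning of the list. Then, if there is more than 1 genre, I want to append the word "movie" only to the last genre.
@J.Doe Changed as you described.

【Solution 5】:

Something like this: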

set_genres={'action',
 'adventure',
 'animation',
 'comedy',
 'crime',
 'documentary',
 'drama',
 'family',
 'fantasy',
 'foreign',
 'history',
 'horror',
 'music',
 'mystery',
 'romance',
 'science_fiction',
 'thriller',
 'tv_movie',
 'war',
 'western'}

base = [['history', '1960', 'action'],
 ['1960', 'western', 'adventure'],
 ['3d', 'fantasy'],
 ['agent', 'action', 'adventure']]

print(set_genres)
print(base)

for movie in base:
    for s in movie:
        if s not in set_genres:
            movie.remove(s)
            movie.append(s)


print(base)

Output:

[['history', 'action', '1960'], ['western', 'adventure', '1960'], ['fantasy', '3d'], ['action', 'adventure', 'agent']]
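
The remove/append loop above mutates movie while iterating over it. That happens to work for the sample input, but it can skip an element when two non-genre words sit next to each other. A minimal sketch with a made-up input (the sample list and both function names are hypothetical) compares it with iterating over a snapshot of the list. Neither version appends "movie".

```python
set_genres = {'action', 'adventure', 'fantasy'}


def push_back_in_place(movie):
    # Same loop as above: the list changes while the for loop walks it.
    for s in movie:
        if s not in set_genres:
            movie.remove(s)
            movie.append(s)
    return movie


def push_back_snapshot(movie):
    # Iterate over a copy so removals do not shift the loop position.
    for s in list(movie):
        if s not in set_genres:
            movie.remove(s)
            movie.append(s)
    return movie


skipped = push_back_in_place(['agent', '3d', 'action'])
ordered = push_back_snapshot(['agent', '3d', 'action'])
print(skipped)  # '3d' is never visited, so 'action' does not reach the front
print(ordered)
```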

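【Comments】:

Thanks! This does add the word "movie", but when there are two genres it adds "movie" to both of them instead of only to the last one. It also doesn't put the genres at the beginning of the list as I wanted.
Could you provide a sample input list containing the word "movie", to be more specific? Then I will update my code.

【Solution 6】:

A slight modification of @Дмитрий Сиденко's suggestion: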

for ind_copy, list_top in enumerate(copy):
   keys=[]
   for ind_list_top, word  in enumerate(list_top):
      if word in set_genres:
         keys.append(word)
         del list_top[ind_list_top]
   keys[-1] = '{} movie'.format(keys[-1])
   copy[ind_copy] = keys + list_top

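【Comments】:

With this I also get IndexError: list index out of range, and it points to the line 'keys[-1]='{} movie'.format(keys[-1])'.
Then we need to check whether keys is empty before appending "movie". Can you try this: `if len(keys) > 0: keys[-1] = '{} movie'.format(keys[-1])`, followed by `copy[ind_copy] = keys + list_top`?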
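
Two issues from this thread can be handled together: deleting from list_top inside the enumerate loop can skip the word right after a deleted genre, and keys[-1] fails when a sublist contains no genre. The sketch below is one possible arrangement, not the answerer's final code; rearrange is a hypothetical helper name and the inputs are made up.

```python
def rearrange(list_top, set_genres):
    # Split into two new lists instead of deleting, so no element is skipped.
    keys = [word for word in list_top if word in set_genres]
    rest = [word for word in list_top if word not in set_genres]
    if keys:  # the emptiness check suggested in the comment above
        keys[-1] = '{} movie'.format(keys[-1])
    return keys + rest


set_genres = {'action', 'adventure', 'fantasy'}
with_genres = rearrange(['agent', 'action', 'adventure'], set_genres)
without_genres = rearrange(['3d', '1080p'], set_genres)
print(with_genres, without_genres)
```

To apply it to the whole structure, something like `copy = [rearrange(lst, set_genres) for lst in copy]` should do.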