How can I build a list of all words that follow each word in a file?

【Question title】: How can I build a list of all words that follow each word in a file?
【Posted】: 2017-03-26 04:07:42
【Question description】:

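I'm trying to build a random sentence generator using a Markov chain, but I've run into a problem when building the list of words that follow each word in a file. The code I've been trying to use is: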

word_list = [spot+1 for spot in words if spot == word]

I've also tried variations such as:

word_list = [words[spot+1] for spot in words if spot == word]

But every time, I get the error:

TypeError: Can't convert 'int' object to str implicitly

How can I correctly add the words that follow a given word to a list? I feel like there's an obvious solution I'm not thinking of.

【Discussion on the question】:

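Is spot a string? If so, what are you trying to accomplish by adding 1 to it?
spot is a string, and I add 1 to it to get the word that comes right after it in the list.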
You're only telling it to add 1 to the string, not to its index in the list. So you'd have to write word_list = [words[words.index(spot) + 1] for spot in words if spot == word]
@n1c9 That won't work if word appears more than once in the input, because index(spot) will always return the index of the first occurrence.
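
To illustrate the point made in the comments: a string has no position of its own, so the index has to come from somewhere else. Here is a minimal sketch using enumerate(), which is not part of the original thread and assumes words is a plain list of strings (the sample list is borrowed from the answer below). Unlike index(), it handles repeated words.

```python
words = ['the', 'enemy', 'of', 'my', 'enemy', 'is', 'my', 'friend']
word = 'my'

# enumerate() yields (index, item) pairs, so i is an int and i + 1 is a
# valid list index. Iterating over words[:-1] skips the last word, which
# has no successor, so words[i + 1] can never raise IndexError.
word_list = [words[i + 1] for i, w in enumerate(words[:-1]) if w == word]

print(word_list)  # ['enemy', 'friend']
```

Each occurrence of word is matched at its own index, so both 'my' entries contribute their successor.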

Tags: python list python-3.x next

【Solution 1】:

The trick is to iterate over pairs, rather than over individual words:

words = ['the', 'enemy', 'of', 'my', 'enemy', 'is', 'my', 'friend']
word = 'my'

[next_word for this_word, next_word in zip(words, words[1:]) if this_word == word]

Result:

['enemy', 'friend']

This approach relies on Python's zip() function and slicing.

words[1:] is a copy of words with the first element left out:

>>> words[1:]
['enemy', 'of', 'my', 'enemy', 'is', 'my', 'friend']

...so that when you zip it with the original words, you get a list of pairs:

>>> list(zip(words, words[1:]))
[('the', 'enemy'),
 ('enemy', 'of'),
 ('of', 'my'),
 ('my', 'enemy'),
 ('enemy', 'is'),
 ('is', 'my'),
 ('my', 'friend')]
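
A small sketch of why the pairing lines up, assuming the same words list as above: zip() stops at the shorter of its inputs, so the final word never shows up as the first element of a pair. The islice() variant is an alternative not mentioned in the original answer; it may be worth considering for long inputs because it avoids the copy that words[1:] makes.

```python
from itertools import islice

words = ['the', 'enemy', 'of', 'my', 'enemy', 'is', 'my', 'friend']

# zip() stops as soon as the shorter input runs out, so the 8 words
# produce 7 pairs and the final word never appears as a first element.
pairs = list(zip(words, words[1:]))
print(len(words), len(pairs))  # 8 7
print(pairs[-1])               # ('my', 'friend')

# islice(words, 1, None) iterates from the second word onward without
# building a second list the way words[1:] does.
lazy_pairs = list(zip(words, islice(words, 1, None)))
print(lazy_pairs == pairs)     # True
```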

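Once you have that, your list comprehension only needs to return the second word of each pair whenever the first word is the one you're looking for: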

word = 'enemy'

[next_word for this_word, next_word in zip(words, words[1:]) if this_word == word]

Result:

['of', 'is']
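
Since the question asks about every word in the file rather than a single one, the same pairing can fill a dictionary in one pass. This is a sketch rather than part of the original answer; followers is a hypothetical name, and the sample list stands in for whatever the file is split into.

```python
from collections import defaultdict

words = ['the', 'enemy', 'of', 'my', 'enemy', 'is', 'my', 'friend']

# followers (a hypothetical name) maps each word to the list of words
# that come right after it, in order of appearance.
followers = defaultdict(list)
for this_word, next_word in zip(words, words[1:]):
    followers[this_word].append(next_word)

print(followers['my'])        # ['enemy', 'friend']
print(followers['enemy'])     # ['of', 'is']
print('friend' in followers)  # False
```

The last word has no entry because nothing follows it. For a Markov chain generator, random.choice(followers[word]) would pick a next word, with more frequent successors chosen more often because duplicates are kept in the lists.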

【Discussion】: