Lists of lists into lists of tuples: answers

【Question title】: Lists of lists into lists of tuples
【Posted】: 2021-08-15 08:12:41
【Question】:

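After reading in a large .txt file, how can I break the lines of the text file into overlapping tuples of 3? As shown below, I have already split each line on whitespace, which I think is correct.

For example, if the word list is "the quick brown fox jumps over the lazy dog" and n is 3, then the output should be [('the', 'quick', 'brown'), ('quick', 'brown', 'fox'), ('brown', 'fox', 'jumps'), ('fox', 'jumps', 'over'), ('jumps', 'over', 'the'), ('over', 'the', 'lazy'), ('the', 'lazy', 'dog')]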

TIA

n=3
word_list=[]   #Initialising to empty
filename = "filename.txt"
with open(filename,"r") as file_object:
    for line in file_object:  #for loop to read every line in .txt file
        word_list=line.split()  #splitting the line on whitespace
        new_list = [word_list[i:i+n] for i in range(0, len(word_list), n)]
        tuple(new_list)
        print(new_list)

【Comments on the question】:

Tags: file split tuples txt

【Solution 1】:

If you want to create n-grams, you can also use a package that already implements this quickly and efficiently, for example NLTK:

from nltk import ngrams
s = 'the quick brown fox jumps over the lazy dog'
# Use a separate name so the imported ngrams function is not shadowed
trigrams = ngrams(s.split(), 3)
for gram in trigrams:
    print(gram)

This produces the following output:

('the', 'quick', 'brown')
('quick', 'brown', 'fox')
('brown', 'fox', 'jumps')
('fox', 'jumps', 'over')
('jumps', 'over', 'the')
('over', 'the', 'lazy')
('the', 'lazy', 'dog')

【Comments】:

Unfortunately, I'm not allowed to import any packages :( It needs to be done manually, but thanks.
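
If imports are off the table, plain slicing is probably enough. The sketch below is one possible way to do it, not the only one: `sliding_tuples` is a hypothetical helper name, and it assumes `n` is at least 1. The loop in the question advances by `n` each time (`range(0, len(word_list), n)`), which gives non-overlapping chunks, and the result of `tuple(new_list)` is never stored. Stepping by 1 and converting each slice to a tuple gives the overlapping windows instead.

```python
def sliding_tuples(words, n):
    """Return every run of n consecutive words as a tuple (hypothetical helper)."""
    # The last valid start index is len(words) - n, so the range stops at
    # len(words) - n + 1. Inputs shorter than n give an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "the quick brown fox jumps over the lazy dog".split()
result = sliding_tuples(words, 3)
print(result)
```

To use this on a file, you could call `sliding_tuples(line.split(), n)` inside the existing `for line in file_object:` loop. Note that this builds the windows per line, so tuples will not span a line break.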