It looks like you don't really want skipgrams per se, but rather chunks of a fixed size. Try this:
from lazyme import per_chunk
tokens = "my name is John".split()
list(per_chunk(tokens, 2))
[out]:
[('my', 'name'), ('is', 'John')]
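If you'd rather not install lazyme, the same chunking can be sketched with the standard library alone. The helper name `chunks` is hypothetical, and padding a short final chunk with `None` is an assumption of this sketch rather than something stated above:

```python
from itertools import zip_longest

def chunks(iterable, n, fillvalue=None):
    """Yield consecutive, non-overlapping tuples of length n (hypothetical helper)."""
    # The same iterator object is repeated n times, so each output tuple
    # consumes n consecutive items from the input.
    args = [iter(iterable)] * n
    # zip_longest pads the final tuple with fillvalue if the input runs out.
    return zip_longest(*args, fillvalue=fillvalue)

tokens = "my name is John".split()
print(list(chunks(tokens, 2)))
# [('my', 'name'), ('is', 'John')]
print(list(chunks(tokens, 3)))
# [('my', 'name', 'is'), ('John', None, None)]
```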
If you want a rolling window, i.e. ngrams:
from lazyme import per_window
tokens = "my name is John".split()
list(per_window(tokens, 2))
[out]:
[('my', 'name'), ('name', 'is'), ('is', 'John')]
The same works with ngrams in NLTK:
from nltk import ngrams
tokens = "my name is John".split()
list(ngrams(tokens, 2))
[out]:
[('my', 'name'), ('name', 'is'), ('is', 'John')]
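For reference, a rolling window is easy to write by hand too. This is only a minimal sketch with a hypothetical helper name, `windows`; it materializes the input as a list, so it is not lazy the way a library version might be:

```python
def windows(iterable, n):
    """Yield overlapping tuples of n consecutive items (hypothetical helper)."""
    items = list(iterable)
    # Zip the sequence against copies of itself shifted by 0, 1, ..., n-1 positions;
    # zip stops at the shortest slice, so only full windows are produced.
    return zip(*(items[i:] for i in range(n)))

tokens = "my name is John".split()
print(list(windows(tokens, 2)))
# [('my', 'name'), ('name', 'is'), ('is', 'John')]
```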
If you want actual skipgrams, see "How to compute skipgrams in python?":
from nltk import skipgrams
tokens = "my name is John".split()
list(skipgrams(tokens, n=2, k=1))
[out]:
[('my', 'name'),
('my', 'is'),
('name', 'is'),
('name', 'John'),
('is', 'John')]
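To see where those pairs come from, here is a rough pure-Python sketch of the idea: each token is taken as the head, and the remaining n-1 positions are filled from the next n-1+k tokens, which allows at most k skips in total. The name `simple_skipgrams` is hypothetical and this is not NLTK's implementation; it reproduces the output above for this example, but I haven't checked it against NLTK on other inputs:

```python
from itertools import combinations

def simple_skipgrams(tokens, n, k):
    """Yield n-token tuples that skip at most k tokens in total (hypothetical helper)."""
    tokens = list(tokens)
    for i, head in enumerate(tokens):
        # Candidates for the remaining n-1 slots: the next n-1+k tokens after the head.
        tail = tokens[i + 1 : i + n + k]
        # combinations() preserves the original order, so each tuple reads left to right.
        for rest in combinations(tail, n - 1):
            yield (head, *rest)

tokens = "my name is John".split()
print(list(simple_skipgrams(tokens, n=2, k=1)))
# [('my', 'name'), ('my', 'is'), ('name', 'is'), ('name', 'John'), ('is', 'John')]
```

With k=0 no skips are allowed, so this reduces to plain ngrams.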