【Posted】: 2021-06-01 05:53:03
【Question】:
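I have a scipy sparse matrix with 116,098 sentences (rows) and 30,119 features (columns). I want to compute the Euclidean distance between every pair of sentences and print the 5 most similar sentences for each one.
I am using CountVectorizer to build the vocabulary and encode the words.
However, I am getting the error below. Please help; I have only just started implementing NLP in Python.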
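from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances
vectorizer = CountVectorizer(stop_words = 'english')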
features = vectorizer.fit_transform(corpus)
print(vectorizer.vocabulary_)
features.shape
(116098, 30119)
print(len(vectorizer.vocabulary_))
30119
for i in range(0,116098):
    for j in features:
        print(euclidean_distances(features[j],i))
IndexError Traceback (most recent call last)
<ipython-input-56-528966153c16> in <module>
      1 for i in range(0,116098):
      2     for j in features:
----> 3         print(euclidean_distances(features[j],i))
~\Anaconda3\lib\site-packages\scipy\sparse\_index.py in __getitem__(self, key)
33 """
34 def __getitem__(self, key):
---> 35 row, col = self._validate_indices(key)
36 # Dispatch to specialized methods.
37 if isinstance(row, INT_TYPES):
~\Anaconda3\lib\site-packages\scipy\sparse\_index.py in _validate_indices(self, key)
128 def _validate_indices(self, key):
129 M, N = self.shape
--> 130 row, col = _unpack_index(key)
131
132 if isintlike(row):
~\Anaconda3\lib\site-packages\scipy\sparse\_index.py in _unpack_index(index)
274 # not work because spmatrix.ndim is always 2.
275 raise IndexError(
--> 276 'Indexing with sparse matrices is not supported '
277 'except boolean indexing where matrix and index '
278 'are equal shapes.')
IndexError: Indexing with sparse matrices is not supported except boolean indexing where matrix and index are equal shapes.
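A likely reading of this traceback: iterating over a scipy sparse matrix yields its rows (each a 1 x N sparse matrix), not integer positions, so `features[j]` ends up indexing the matrix with another sparse matrix. In addition, `euclidean_distances` expects two 2-D inputs rather than a row and an integer. The sketch below reproduces the first point on a tiny made-up matrix and shows integer-based row access; the helper `row_distance` is a hypothetical name used only for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import euclidean_distances

# Tiny stand-in for the question's `features` matrix (values are made up).
features = csr_matrix(np.array([[1, 0, 2],
                                [0, 1, 0],
                                [1, 0, 1]]))

# Iterating over the sparse matrix yields rows, not integer indices.
first_row = next(iter(features))
row_shape = first_row.shape  # a 1 x 3 sparse row

def row_distance(matrix, i, j):
    """Euclidean distance between rows i and j, indexed by integer position."""
    # matrix[i] and matrix[j] are both 1 x N sparse rows, which is the
    # 2-D shape that euclidean_distances expects for each argument.
    return euclidean_distances(matrix[i], matrix[j])[0, 0]

d01 = row_distance(features, 0, 1)
print(row_shape, d01)
```

Looping over `range(features.shape[0])` for both indices avoids the IndexError, although with 116,098 rows a pure Python double loop would be very slow.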
【Discussion】:
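One possible approach, offered as a sketch rather than a definitive answer: a full 116,098 x 116,098 distance matrix would hold about 1.3e10 entries (roughly 100 GB as float64), so instead of materializing it, scikit-learn's `NearestNeighbors` can be asked for only the 5 closest rows per sentence. The small `corpus` below is made up for illustration, and `top_k_similar` is a hypothetical helper name; the real corpus from the question would be passed in its place.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import NearestNeighbors

# Made-up sentences standing in for the question's `corpus`.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "dogs chase cats in the garden",
    "stock markets fell sharply today",
    "stock markets rose sharply today",
    "the weather is sunny and warm",
    "the weather is rainy and cold",
]

def top_k_similar(sentences, k=5):
    """Return (distances, indices) of the k nearest sentences for each sentence."""
    vectorizer = CountVectorizer(stop_words="english")
    features = vectorizer.fit_transform(sentences)  # sparse count matrix
    # Brute-force search is used here because it accepts sparse input directly.
    nn = NearestNeighbors(n_neighbors=k, metric="euclidean", algorithm="brute")
    nn.fit(features)
    # Calling kneighbors() without arguments queries the fitted rows
    # themselves and leaves each row out of its own neighbor list.
    return nn.kneighbors()

distances, indices = top_k_similar(corpus, k=5)

for i, neighbors in enumerate(indices):
    print(corpus[i])
    for dist, j in zip(distances[i], neighbors):
        print(f"    {dist:.3f}  {corpus[j]}")
```

Smaller Euclidean distance means more similar here. Raw counts make long sentences look far apart from short ones, so whether Euclidean distance on counts is the right similarity measure for this data is a separate judgment call.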
Tags: python numpy machine-learning scipy nlp