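【Posted】:2019-04-10 02:26:27
【Question】:
I have a dataset with a single column in which each row holds a text description. I am trying to find the words whose tf-idf score is greater than some value n. However, the code gives me a matrix of scores. How do I sort and filter the scores and see the corresponding words?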
from sklearn.feature_extraction.text import TfidfVectorizer

# Keep only the Shiraz descriptions and lowercase them
tempdataFrame = wineData.loc[wineData.variety == 'Shiraz', 'description'].reset_index()
tempdataFrame['description'] = tempdataFrame['description'].str.lower()

# Fit TF-IDF; the result is a sparse matrix (rows = descriptions, columns = words)
tfidf = TfidfVectorizer(analyzer='word', stop_words='english')
score = tfidf.fit_transform(tempdataFrame['description'])
Sample Data:
description
This tremendous 100% varietal wine hails from Oakville and was aged over
three years in oak. Juicy red-cherry fruit and a compelling hint of caramel
greet the palate, framed by elegant, fine tannins and a subtle minty tone in
the background. Balanced and rewarding from start to finish, it has years
ahead of it to develop further nuance. Enjoy 2022–2030.
【Discussion】:
-
Could you add some sample data?