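【Posted】: 2020-04-10 01:31:41
【Question】:
I want to use tf-idf to find the sentences that occur most often in my dataframe. I have already done some preprocessing (tokenization and stop-word removal), so I now have two columns, text and Stopword: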
text                                                                 Stopword
bts jimin declared himself the worst player after his self sabotage  ['bts', 'jimin', 'declared', 'worst', 'player', 'self', 'sabotage']
bts ultra practical suga turned their game into an economy lesson    ['bts', 'ultra', 'practical', 'suga', 'turned', 'game', 'economy', 'lesson']
the mystery of bts sunflowers has finally been solved                ['mystery', 'bts', 'sunflowers', 'finally', 'solved']
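For reference, the preprocessing described above could be sketched roughly as follows. This is only an illustration: the `STOPWORDS` set and the `remove_stopwords` helper are hypothetical, and the set contains just the words that were dropped in the three example rows, not a real stop-word list.

```python
# Hypothetical stop-word set: only the words removed in the example rows above.
STOPWORDS = {"himself", "the", "after", "his", "their", "into", "an", "of", "has", "been"}


def remove_stopwords(text):
    """Split a sentence on whitespace and drop the stop words."""
    return [tok for tok in text.lower().split() if tok not in STOPWORDS]


texts = [
    "bts jimin declared himself the worst player after his self sabotage",
    "bts ultra practical suga turned their game into an economy lesson",
    "the mystery of bts sunflowers has finally been solved",
]

# One token list per sentence, matching the Stopword column.
tokens = [remove_stopwords(t) for t in texts]
for row in tokens:
    print(row)
```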
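From the Stopword column I want to build a dataframe with one row per sentence, where the columns are the words and the values are their tf-idf scores, like this: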
bts tf_idf
mystery tf_idf
suga tf_idf
jimin tf_idf
declared tf_idf
worst tf_idf
player tf_idf
self tf_idf
sabotage tf_idf
practical tf_idf
turned tf_idf
game tf_idf
economy tf_idf
lesson tf_idf
sunflowers tf_idf
finally tf_idf
solved tf_idf
Maybe someone here knows how to write this and can help me?
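One possible way to get such a table is sketched below, using only the standard library. It assumes the plain textbook definition (relative term frequency times `log(N / df)`); the function name `tfidf_rows` is made up for illustration, and library implementations may use a different weighting.

```python
import math
from collections import Counter

# Token lists copied from the Stopword column in the question.
docs = [
    ["bts", "jimin", "declared", "worst", "player", "self", "sabotage"],
    ["bts", "ultra", "practical", "suga", "turned", "game", "economy", "lesson"],
    ["mystery", "bts", "sunflowers", "finally", "solved"],
]


def tfidf_rows(token_lists):
    """Return the sorted vocabulary and one {word: tf-idf} dict per document."""
    n_docs = len(token_lists)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for tokens in token_lists for word in set(tokens))
    vocab = sorted(df)
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        row = {}
        for word in vocab:
            # Relative term frequency (0.0 for an empty token list).
            tf = counts[word] / len(tokens) if tokens else 0.0
            # Plain, unsmoothed inverse document frequency (an assumption).
            idf = math.log(n_docs / df[word])
            row[word] = tf * idf
        rows.append(row)
    return vocab, rows


vocab, rows = tfidf_rows(docs)
for word in vocab:
    print(word, [round(row[word], 3) for row in rows])
```

With this formula a word that occurs in every sentence, such as `bts`, scores 0 everywhere, because `log(3 / 3)` is 0. If pandas is available, `pd.DataFrame(rows)` should turn the result into a dataframe with one row per sentence and one column per word. scikit-learn's `TfidfVectorizer` is a common alternative; its defaults use smoothed idf and L2 normalization, so its numbers will differ from this sketch.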
【Discussion】:
Tags: python csv dataframe tokenize tf-idf