Sentence that appears the most using tfidf in my python dataframe: answer

【Question title】: sentence that appears the most using tfidf in my dataframe with python
【Posted】: 2020-04-10 01:31:41
【Question description】:

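I want to find the sentences that appear the most in my DataFrame using tf-idf. I did some preprocessing (tokenization and stopword removal), and now I have 2 columns (text and Stopword):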

text                                                                   Stopword
bts jimin declared himself the worst player after his self sabotage    ['bts', 'jimin', 'declared','worst', 'player', 'self', 'sabotage']
bts ultra practical suga turned their game into an economy lesson      ['bts', 'ultra', 'practical', 'suga', 'turned', 'game', 'economy', 'lesson']
the mystery of bts sunflowers has finally been solved                  ['mystery', 'bts', 'sunflowers', 'finally', 'solved']
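To make the question reproducible, here is a minimal sketch that rebuilds the three example rows. It assumes plain whitespace tokenization, and `STOPWORDS` is a hypothetical set containing only the words the Stopword column above drops; a real pipeline would use a full stopword list.

```python
import pandas as pd

# The three example sentences from the question
df = pd.DataFrame({"text": [
    "bts jimin declared himself the worst player after his self sabotage",
    "bts ultra practical suga turned their game into an economy lesson",
    "the mystery of bts sunflowers has finally been solved",
]})

# Hypothetical stopword subset: just the words missing from the Stopword column.
# A real pipeline would use a complete list instead.
STOPWORDS = {"the", "himself", "after", "his", "their", "into", "an",
             "of", "has", "been"}

# Whitespace tokenization, then stopword filtering: one token list per row
df["Stopword"] = df["text"].apply(
    lambda s: [w for w in s.split() if w not in STOPWORDS]
)
print(df["Stopword"].tolist())
```

If the DataFrame is instead read back from a CSV file, the Stopword column arrives as the string form of each list rather than an actual list; `df["Stopword"].apply(ast.literal_eval)` (after `import ast`) converts it back.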

From the Stopword column, I want a DataFrame that lists each word with its tf-idf value, like this:

bts           tf_idf
mystery       tf_idf
suga          tf_idf
jimin         tf_idf
declared      tf_idf
worst         tf_idf
player        tf_idf
self          tf_idf
sabotage      tf_idf
practical     tf_idf
turned        tf_idf
game          tf_idf
economy       tf_idf
lesson        tf_idf
sunflowers    tf_idf
finally       tf_idf
solved        tf_idf

Does anyone here know the code for this and can help me?

【Question discussion】:

Tags: python csv dataframe tokenize tf-idf

【Solution 1】:

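So it looks like there are many formulas for tf-idf. I'm not sure which one you want to use, but once you decide, I would do something like this: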

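import math
import pandas as pd

docs = df["Stopword"].tolist()  # one token list per sentence
n_docs = len(docs)

def tf_idf(word, doc):
    # One common choice: tf = count / doc length, idf = log(N / docs containing word).
    # Swap in whichever variant you settle on.
    tf = doc.count(word) / len(doc)
    n_containing = sum(1 for d in docs if word in d)
    return tf * math.log(n_docs / n_containing)

output = []
for _, row in df.iterrows():
    for word in row["Stopword"]:  # iterate over the tokens, not the row's columns
        output.append([word, tf_idf(word, row["Stopword"])])

output = pd.DataFrame(data=output, columns=["Word", "TF_IDF"])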
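A loop like the one above yields one row per token occurrence, so a word found in several sentences (such as `bts`) appears several times. Because the question asks for one score per word, here is a minimal vectorized sketch in pandas. It assumes the plain formulation tf(t, d) = (count of t in d) / (length of d) and idf(t) = log(N / n_t), where N is the number of sentences and n_t the number containing t. Taking the maximum score across sentences is an arbitrary choice; sum or mean are equally defensible.

```python
import math
import pandas as pd

# Sample input in the question's shape: one token list per row
df = pd.DataFrame({"Stopword": [
    ["bts", "jimin", "declared", "worst", "player", "self", "sabotage"],
    ["bts", "ultra", "practical", "suga", "turned", "game", "economy", "lesson"],
    ["mystery", "bts", "sunflowers", "finally", "solved"],
]})
n_docs = len(df)

# One row per (document, token) occurrence; the old index becomes the doc id
tokens = df["Stopword"].explode().rename("Word").reset_index()
tokens = tokens.rename(columns={"index": "doc"})

# tf: occurrences of the word in the document / document length
counts = tokens.groupby(["doc", "Word"]).size().rename("count").reset_index()
doc_len = tokens.groupby("doc").size()
counts["tf"] = counts["count"] / counts["doc"].map(doc_len)

# idf: log(N / number of documents that contain the word)
doc_freq = counts.groupby("Word")["doc"].nunique()
counts["idf"] = counts["Word"].map(lambda w: math.log(n_docs / doc_freq[w]))
counts["TF_IDF"] = counts["tf"] * counts["idf"]

# tf-idf is defined per (word, document); collapse to one score per word.
# Using max here is an assumption, not the only valid aggregation.
result = (counts.groupby("Word", as_index=False)["TF_IDF"].max()
          .sort_values("TF_IDF", ascending=False, ignore_index=True))
print(result)
```

With these three sentences, `bts` occurs in all of them, so its idf is log(3/3) = 0 and it gets the lowest score, while each token of the shortest sentence (`mystery`, `sunflowers`, `finally`, `solved`) scores ln(3)/5 ≈ 0.22. scikit-learn's `TfidfVectorizer` uses different defaults (raw counts, smoothed idf ln((1+N)/(1+n_t)) + 1, then L2 normalization of each row), so its numbers will not match these.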

【Discussion】: