【Posted】:2020-06-12 11:22:16
【Question】:
I am trying to use gensim Phrases on a column of a df. A sample df is given below.
col1 col2
1 "this is test1 and is used for test1"
2 "this is content of row which is second row"
3 "this is the third row"
I wrote a function to build bigrams:
from gensim.models.phrases import Phrases, Phraser

def bigrams(text):
    bigram = Phrases(text, min_count=1)
    bigram_mod = Phraser(bigram)
    return [bigram_mod[doc] for doc in text]
I tried both of the following:
df['col2'].apply(bigrams)
df['col2'].apply(lambda x: bigrams([x]))  # so that the text is enclosed in a list
But I get individual characters as output instead of bigrams. What am I missing here?
【Comments】:
Tags: python pandas gensim n-gram phrase