【Posted】: 2020-12-01 15:51:57
【Question】:
library(dplyr)
library(ggplot2)
library(stm)
library(janeaustenr)
library(tidytext)
library(quanteda)
testDfm <- gadarian$open.ended.response %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  dfm()
out <- convert(testDfm, to = "stm")
documents <- out$documents
vocab <- out$vocab
meta <- out$meta
topic_model <- stm(documents = out$documents, vocab = out$vocab, K = 5)
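The lines above fit a topic model with five topics.
How can I use tidytext to take each row of the input data gadarian, find out which topic that row is linked to, and add the topic to the input data as a new column?
Example of the expected output (column names):
"MetaID" "treatment" "pid_rep" "open.ended.response" "topic_number"
Updated code, as an attempt at producing the expected output: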
library(stm)
library(tidyr)
library(quanteda)
testDfm <- gadarian$open.ended.response %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  dfm()
out <- convert(testDfm, to = "stm")
documents <- out$documents
vocab <- out$vocab
meta <- out$meta
fittedModel <- stm(documents = out$documents, vocab = out$vocab, K = 5)
documentMatches <- findThoughts(fittedModel, texts = gadarian$open.ended.response, n = 1)
docTopics <- sapply(1:nrow(gadarian), function(docIndex) { names(documentMatches$index[documentMatches$index == docIndex][1]) })
gadarian$topic <- docTopics
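Note that findThoughts() with n = 1 returns only the single most representative document per topic, so the attempt above can label at most K rows. A minimal sketch of one possible alternative follows, assuming the goal is to give every row its highest-probability topic. It uses tidytext's tidy() with matrix = "gamma" to get per-document topic proportions. The names keptDfm, docTopics, and result are illustrative and not part of the original code, and dropping empty documents before convert() is my own assumption to keep rows aligned.

```r
library(stm)
library(quanteda)
library(tidytext)
library(dplyr)

# Build the dfm as in the question; docnames default to "text1", "text2", ...
testDfm <- gadarian$open.ended.response %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  dfm()

# Assumption: drop documents with no tokens explicitly, so that the documents
# passed to stm() line up one-to-one with docnames(keptDfm).
keptDfm <- dfm_subset(testDfm, ntoken(testDfm) > 0)
out <- convert(keptDfm, to = "stm")

fittedModel <- stm(documents = out$documents, vocab = out$vocab,
                   K = 5, verbose = FALSE)

# One row per (document, topic) pair with its proportion gamma;
# keep only the topic with the largest gamma for each document.
docTopics <- tidy(fittedModel, matrix = "gamma",
                  document_names = docnames(keptDfm)) %>%
  group_by(document) %>%
  slice_max(gamma, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(document, topic_number = topic)

# Join back onto the input rows via the document name.
# Rows whose response had no tokens would get NA as topic_number.
result <- gadarian %>%
  mutate(document = docnames(testDfm)) %>%
  left_join(docTopics, by = "document") %>%
  select(-document)

head(result)
```

Taking the single largest gamma discards the fact that stm is a mixed-membership model, so a document can load on several topics at once. If that matters, keeping the full gamma table and joining it in long form may be a better fit.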
【Comments】:
-
I don't think there is enough explanation of what you are trying to do.
-
So do you want to edit the testDfm data frame?
-
Which one is your input dataset? According to your code, testDfm is a single data frame, but the rest are lists.
-
fittedModel was not found.
-
By the way, so you want to add a column to gadarian$open.ended.response?