【Question title】: Add detected topics to input data
【Posted】: 2020-12-01 15:51:57
【Question description】:
library(dplyr)
library(ggplot2)
library(stm)
library(janeaustenr)
library(tidytext)

library(quanteda)
testDfm <- gadarian$open.ended.response %>%
    tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)  %>%
    dfm()
    
out <- convert(testDfm, to = "stm")
documents <- out$documents
vocab <- out$vocab
meta <- out$meta

topic_model<- stm(documents = out$documents, vocab = out$vocab, K = 5)

These lines fit the topic model.

How can I use tidytext to take each row of the input data gadarian, find out which topic it is linked to, and add that topic to the input data?

Example of the expected output (the columns of gadarian plus a topic_number column):

"MetaID" "treatment" "pid_rep"  "open.ended.response" "topic_number"

Updated code, as an example of the expected output:

library(stm)
library(tidyr)
library(quanteda)
testDfm <- gadarian$open.ended.response %>%
    tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)  %>%
    dfm()
    
out <- convert(testDfm, to = "stm")
documents <- out$documents
vocab <- out$vocab
meta <- out$meta

fittedModel <- stm(documents = out$documents, vocab = out$vocab, K = 5)

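# n = 1: only the single top document for each topic is returned
documentMatches <- findThoughts(fittedModel, texts = gadarian$open.ended.response, n = 1)
# For each row, look up which topic (if any) lists it as its top document; other rows get NA
docTopics <- sapply(1:nrow(gadarian), function(docIndex) { names(documentMatches$index[documentMatches$index == docIndex][1]) })
gadarian$topic <- docTopics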
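Because findThoughts() with n = 1 returns only the most representative document for each topic, this labels at most K = 5 rows and leaves every other row as NA. A minimal sketch of an alternative, assuming convert() dropped no empty documents (so row i of the fitted model's theta matrix of per-document topic proportions corresponds to row i of gadarian): give each row the topic with the largest proportion. The column name topic_number comes from the expected output; max.col() is base R.

```r
library(stm)       # stm(), plus the gadarian example data
library(quanteda)  # tokens(), dfm(), convert()

data("gadarian", package = "stm")

# Same preprocessing as in the question, written without the pipe;
# as.character() is only a safeguard in case the column is a factor
toks <- tokens(as.character(gadarian$open.ended.response),
               remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
out <- convert(dfm(toks), to = "stm")

# Assumption: no document was dropped, so model row i is gadarian row i
stopifnot(length(out$documents) == nrow(gadarian))

fittedModel <- stm(documents = out$documents, vocab = out$vocab,
                   K = 5, verbose = FALSE)

# theta: N x K matrix of per-document topic proportions (each row sums to 1).
# max.col() returns the column index of each row's maximum, i.e. the dominant topic.
gadarian$topic_number <- max.col(fittedModel$theta, ties.method = "first")

head(gadarian[, c("MetaID", "treatment", "pid_rep",
                  "open.ended.response", "topic_number")])
```

If the stopifnot() check fails, some responses became empty after preprocessing and were dropped, so the topics have to be matched back to the surviving rows instead of assigned positionally.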

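【Question discussion】:

  • I don't think there is enough explanation of what you are trying to do.
  • So you want to edit the testDfm data frame?
  • Which one is your input dataset? According to your code, testDfm is a single data frame, but the rest are lists.
  • fittedModel not found
  • By the way, so you want to add a column to gadarian$open.ended.response?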

Tags: r quanteda tidytext


【Solution 1】:
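install.packages("reshape2")
library(reshape2)
library(tidytext)  # tidy() methods for stm models
library(dplyr)
library(ggplot2)

# Per-topic term probabilities (beta), one row per topic-term pair
td_beta <- tidy(fittedModel)
td_beta
# Plot the 10 highest-beta terms of each topic
td_beta %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  ggplot(aes(term, beta)) +
  geom_col() +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()
# Per-document topic proportions (gamma), one row per document-topic pair
td_gamma <- tidy(fittedModel, matrix = "gamma",
                 document_names = rownames(gadarian))
td_gamma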
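td_gamma has one row per (document, topic) pair holding the topic proportion gamma, so the remaining step is to keep each document's highest-gamma topic and join it back onto gadarian. A minimal sketch, assuming one modeled document per row of gadarian in the same order (otherwise document_names = rownames(gadarian) would be misaligned); the names doc_topics, gadarian_topics and topic_number are illustrative, and slice_max() requires dplyr 1.0.0 or later.

```r
library(stm)
library(quanteda)
library(tidytext)  # tidy() method for STM objects
library(dplyr)

data("gadarian", package = "stm")
toks <- tokens(as.character(gadarian$open.ended.response),
               remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)
out <- convert(dfm(toks), to = "stm")
fittedModel <- stm(documents = out$documents, vocab = out$vocab,
                   K = 5, verbose = FALSE)

# Long format: one row per document-topic pair, with the proportion gamma
td_gamma <- tidy(fittedModel, matrix = "gamma",
                 document_names = rownames(gadarian))

# Keep only the most probable topic of each document (ties broken arbitrarily)
doc_topics <- td_gamma %>%
  group_by(document) %>%
  slice_max(gamma, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(document, topic_number = topic)

# Attach the label to the input data via the row names used as document ids
gadarian_topics <- gadarian %>%
  mutate(document = rownames(gadarian)) %>%
  left_join(doc_topics, by = "document") %>%
  select(MetaID, treatment, pid_rep, open.ended.response, topic_number)

head(gadarian_topics)
```

Joining by document id rather than relying on row order keeps the result correct even if td_gamma's rows come back in a different order.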
