【Question title】: R topicmodels tidytext - Latent Dirichlet Allocation (LDA): Error: binding not found: 'Var1'
【Posted】: 2017-09-17 17:19:55
【Question description】:

I am having a problem with my LDA model in R. Every time I call tidy() on my LDA_VEM object, I get the error "Error: binding not found: 'Var1'". Could you explain how to fix this? My code is below:

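library(dplyr)        # %>%, mutate, count, joins, tibble
library(tidyr)        # separate, unite
library(stringr)      # str_detect
library(tidytext)     # unnest_tokens, stop_words, cast_dtm, tidy
library(topicmodels)  # LDA

why <- read.csv("FakeDoc.csv", header = FALSE, na.strings = "")
why.char <- tibble(text = as.character(why$V1))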
why.char <- why.char %>%
  mutate(document = row_number())
why.tidy <- why.char %>%
  unnest_tokens(word, text)
why.tidy <- why.tidy %>%
  anti_join(stop_words)
why.tidy <- why.tidy %>%
  filter(!str_detect(word,"[0-9]"))

  #Frequency Table
why.doc <- why.tidy %>%
  count(document, word, sort = TRUE) %>%
  ungroup()
why.words <- why.doc %>%
  group_by(document) %>%
  summarize(total = sum(n))
why.ft <- left_join(why.doc, why.words)
grams1_united <- why.ft[c("document", "word", "total")] 

  #N-grams
tidy.n2 <- why.char %>%
  unnest_tokens(ngram, text, token = "ngrams", n=2)
tidy.n3 <- why.char %>%
  unnest_tokens(ngram, text, token = "ngrams", n=3)

tidy.n2 <- tidy.n2 %>%
  filter(!str_detect(ngram, "[0-9]"))
tidy.n3 <- tidy.n3 %>%
  filter(!str_detect(ngram, "[0-9]"))

tidy.n2 %>%
  count(ngram, sort = TRUE)
tidy.n3 %>%
  count(ngram, sort = TRUE)

grams2_seperated <- tidy.n2 %>%
  separate(ngram, c("word1", "word2"), sep = " ")
grams2_filtered <- grams2_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word)
gram2_counts <- grams2_filtered %>%
  count(word1, word2, sort = TRUE)
grams2_united <- grams2_filtered %>%
  unite(ngram, word1, word2, sep = " ")
grams2_united <- grams2_united %>%
  group_by(document) %>%
  count(ngram, sort = TRUE)
grams2_united

grams3_seperated <- tidy.n3 %>%
  separate(ngram, c("word1", "word2", "word3"), sep = " ")
grams3_filtered <- grams3_seperated %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  filter(!word3 %in% stop_words$word)
gram3_counts <- grams3_filtered %>%
  count(word1, word2, word3, sort = TRUE)
grams3_united <- grams3_filtered %>%
  unite(ngram, word1, word2, word3, sep = " ")
grams3_united <- grams3_united %>%
  group_by(document) %>%
  count(ngram, sort = TRUE)

colnames(grams2_united) <- c("document", "word", "total")
colnames(grams3_united) <- c("document", "word", "total")

  #DTM
grams1_united
grams2_united
grams3_united
detractorwhy.tots <- rbind.data.frame(grams1_united, grams2_united, grams3_united)
dtwtots <- as.data.frame(detractorwhy.tots)
dtw.dtm <- dtwtots %>%
  cast_dtm(document, word, total)
dtw_5lda <- LDA(dtw.dtm,control = list(alpha = 0.05), k = 5)
topics <- tidy(dtw_5lda)

【Comments】:

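  • That particular error ("Error: binding not found...") means that a column the tidy() function expects does not exist. I can't say I have seen it before, and I can't tell exactly what is going on from this much information.
  • Could you post a reproducible example? I know that is hard in this case, because data for topic modeling tends to be somewhat complex, but it would help us work out why you are seeing this. If you like, you can use the data we have for the topic modeling vignette and check whether you get the same error with it. Alternatively, compare the two document-term matrices, or the LDA outputs, and so on.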
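
The check suggested in the comment above can be sketched as follows. This assumes the vignette data referred to is the AssociatedPress document-term matrix that ships with topicmodels; the object names ap_lda and ap_topics, the choice of k = 2, and the seed are illustrative and not taken from the question.

```r
# Sketch: run tidy() on a known-good dataset to see whether the error
# depends on the data or on the R session.
library(topicmodels)
library(tidytext)

# AssociatedPress is a DocumentTermMatrix bundled with topicmodels
data("AssociatedPress", package = "topicmodels")

# Fit a small 2-topic model; the seed only makes repeated runs comparable
ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))

# Extract the per-topic word probabilities as a tidy table
ap_topics <- tidy(ap_lda, matrix = "beta")
head(ap_topics)
```

If this call fails with the same 'Var1' message, the data in FakeDoc.csv is probably not the cause, and the set of loaded packages is the next thing to look at.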

Tags: r text-mining lda topic-modeling topicmodels


【Solution 1】:

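I got the same error when running tidy on an LDA object. It eventually turned out to be some kind of conflict between the libraries I was loading: topicmodels vs. reshape2. Once I stopped loading the reshape2 library, the error went away.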
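
A minimal sketch of that workaround in an already running session, using only base R. The answer does not explain why the two packages clash, so this only shows how to see which names are masked and how to detach reshape2 if it is attached; the variable names masked and reshape2_attached are illustrative.

```r
# Sketch: inspect masking and detach reshape2 before calling tidy() again.

# List object names that are defined by more than one attached package
masked <- conflicts(detail = TRUE)
print(masked)

# Detach reshape2 only if it is currently on the search path
if ("package:reshape2" %in% search()) {
  detach("package:reshape2")
}

# Confirm that reshape2 is no longer attached
reshape2_attached <- "package:reshape2" %in% search()
print(reshape2_attached)

# After this, re-run the failing call from the question, for example:
# topics <- tidy(dtw_5lda)
```

In a fresh script the simpler route is the one the answer describes: leave out the library(reshape2) line altogether.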
