Can you print more than 11 covariates for summary.estimateEffect? Answers

【Question title】: Can you print more than 11 covariates for summary.estimateEffect?
【Posted】: 2021-05-26 21:13:42
【Question】:

I built an stm topic model, but I have a problem with summary.estimateEffect: my data cover about 150 days, yet it prints regression estimates for only 10 of them.

parlPrevFit<- stm(document = out$documents, vocab = out$vocab, K = 0, prevalence =~s(day),
                    max.em.its = 150, data = out$meta, init.type = "Spectral")

prep<- estimateEffect(c(14, 40, 5, 41)~s(day), parlPrevFit, meta = meta, uncertainty = "Global")

summary(prep, topics = c(14, 40, 5, 41))

Topic 14 coefficients: https://prnt.sc/105pg1a

Can anyone suggest how to print estimates for more than 10 days?

【Comments】:

Tags: r topic-modeling topicmodels

【Solution 1】:

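Instead of using summary(), whose printed output you can't control, load the tidytext package and use tidy().

Let's walk through an example in which we train a topic model on Jane Austen's novels, treating each chapter as a document: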

library(tidyverse)
library(tidytext)
library(stm)
#> stm v1.3.6 successfully loaded. See ?stm for help. 
#>  Papers, resources, and other materials at structuraltopicmodel.com
library(janeaustenr)

books <- austen_books() %>%
  group_by(book) %>%
  mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
  ungroup() %>%
  filter(chapter > 0) %>%
  unite(document, book, chapter, remove = FALSE)

austen_sparse <- books %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  count(document, word) %>%
  cast_sparse(document, word, n)
#> Joining, by = "word"

Let's train a topic model with 6 topics (there are 6 books):

topic_model <- stm(
  austen_sparse, 
  K = 6,
  init.type = "Spectral",
  verbose = FALSE
)

Let's build a dataset to use in estimateEffect():

chapters <- books %>%
  group_by(document) %>% 
  summarize(text = str_c(text, collapse = " ")) %>%
  ungroup() %>%
  inner_join(books %>%
               distinct(document, book))
#> Joining, by = "document"

chapters
#> # A tibble: 269 x 3
#>    document text                                                           book 
#>    <chr>    <chr>                                                          <fct>
#>  1 Emma_1   "CHAPTER I   Emma Woodhouse, handsome, clever, and rich, with… Emma 
#>  2 Emma_10  "CHAPTER X   Though now the middle of December, there had yet… Emma 
#>  3 Emma_11  "CHAPTER XI   Mr. Elton must now be left to himself. It was n… Emma 
#>  4 Emma_12  "CHAPTER XII   Mr. Knightley was to dine with them--rather ag… Emma 
#>  5 Emma_13  "CHAPTER XIII   There could hardly be a happier creature in t… Emma 
#>  6 Emma_14  "CHAPTER XIV   Some change of countenance was necessary for e… Emma 
#>  7 Emma_15  "CHAPTER XV   Mr. Woodhouse was soon ready for his tea; and w… Emma 
#>  8 Emma_16  "CHAPTER XVI   The hair was curled, and the maid sent away, a… Emma 
#>  9 Emma_17  "CHAPTER XVII   Mr. and Mrs. John Knightley were not detained… Emma 
#> 10 Emma_18  "CHAPTER XVIII   Mr. Frank Churchill did not come. When the t… Emma 
#> # … with 259 more rows

Now let's estimate regressions from our topic model for the first three topics, using the "chapters" document dataset:

effects <- estimateEffect(1:3 ~ book, topic_model, chapters)

summary(effects)
#> 
#> Call:
#> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)
#> 
#> 
#> Topic 1:
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.018033   0.023726   0.760    0.448    
#> bookPride & Prejudice  0.799555   0.037140  21.528   <2e-16 ***
#> bookMansfield Park    -0.006387   0.032662  -0.196    0.845    
#> bookEmma               0.003188   0.033393   0.095    0.924    
#> bookNorthanger Abbey   0.002535   0.039017   0.065    0.948    
#> bookPersuasion         0.025725   0.044281   0.581    0.562    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Topic 2:
#> 
#> Coefficients:
#>                        Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.015289   0.016478   0.928    0.354    
#> bookPride & Prejudice  0.001785   0.023489   0.076    0.939    
#> bookMansfield Park     0.001616   0.024664   0.066    0.948    
#> bookEmma               0.892516   0.037833  23.591   <2e-16 ***
#> bookNorthanger Abbey   0.006032   0.031530   0.191    0.848    
#> bookPersuasion        -0.001142   0.030052  -0.038    0.970    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> 
#> Topic 3:
#> 
#> Coefficients:
#>                         Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)            0.0196151  0.0225115   0.871   0.3844    
#> bookPride & Prejudice -0.0004909  0.0286302  -0.017   0.9863    
#> bookMansfield Park     0.0148960  0.0341272   0.436   0.6628    
#> bookEmma              -0.0004006  0.0301741  -0.013   0.9894    
#> bookNorthanger Abbey   0.8730570  0.0457994  19.063   <2e-16 ***
#> bookPersuasion         0.1030537  0.0495148   2.081   0.0384 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

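This example doesn't run into the printing limit you mention, but you can avoid any problem of that kind by using tidy() instead, which gives you the actual contents of the regressions: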

tidy(effects)
#> # A tibble: 18 x 6
#>    topic term                   estimate std.error statistic  p.value
#>    <int> <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
#>  1     1 (Intercept)            0.0179      0.0238    0.753  4.52e- 1
#>  2     1 bookPride & Prejudice  0.799       0.0373   21.4    1.09e-59
#>  3     1 bookMansfield Park    -0.00614     0.0325   -0.189  8.50e- 1
#>  4     1 bookEmma               0.00350     0.0336    0.104  9.17e- 1
#>  5     1 bookNorthanger Abbey   0.00323     0.0394    0.0820 9.35e- 1
#>  6     1 bookPersuasion         0.0253      0.0443    0.571  5.68e- 1
#>  7     2 (Intercept)            0.0153      0.0165    0.925  3.56e- 1
#>  8     2 bookPride & Prejudice  0.00165     0.0234    0.0707 9.44e- 1
#>  9     2 bookMansfield Park     0.00167     0.0246    0.0680 9.46e- 1
#> 10     2 bookEmma               0.892       0.0381   23.4    2.84e-66
#> 11     2 bookNorthanger Abbey   0.00606     0.0317    0.191  8.49e- 1
#> 12     2 bookPersuasion        -0.00107     0.0298   -0.0359 9.71e- 1
#> 13     3 (Intercept)            0.0197      0.0228    0.864  3.89e- 1
#> 14     3 bookPride & Prejudice -0.000835    0.0288   -0.0290 9.77e- 1
#> 15     3 bookMansfield Park     0.0147      0.0342    0.428  6.69e- 1
#> 16     3 bookEmma              -0.000707    0.0305   -0.0232 9.82e- 1
#> 17     3 bookNorthanger Abbey   0.873       0.0461   18.9    4.93e-51
#> 18     3 bookPersuasion         0.103       0.0496    2.08   3.85e- 2
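
Because tidy() returns an ordinary tibble, any truncation you see is just the tibble print method, which you can control. The sketch below uses a hypothetical stand-in tibble (made-up values, with the same column names as the tidy(effects) output above) rather than a fitted model, so it runs on its own; with real results you would use tidy(prep) in place of `tidied`.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for tidy(prep): 2 topics x 11 terms, made-up numbers.
# The term names are illustrative only.
tidied <- tibble(
  topic     = rep(c(14L, 40L), each = 11),
  term      = rep(c("(Intercept)", paste0("s(day)", 1:10)), times = 2),
  estimate  = seq(-0.1, 0.1, length.out = 22),
  std.error = rep(0.02, 22)
)

# Print every row instead of the default first 10
print(tidied, n = Inf)

# Keep only the coefficients for one topic
topic_14 <- filter(tidied, topic == 14)
print(topic_14, n = Inf)
```

From here the usual dplyr verbs apply, for example arrange() to sort by estimate or write.csv() to export the full table.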

Created on 2021-02-26 by the reprex package (v1.0.0)
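
A possible explanation for the count in the question, offered as an assumption rather than something confirmed in this thread: a spline term such as s(day) does not create one coefficient per day. It expands day into a small set of basis functions, and as far as I know stm's s() builds on splines::bs() with a default of roughly 10 degrees of freedom (check ?stm::s to confirm). That would give 10 spline coefficients plus an intercept, i.e. 11 rows, however many days there are. The sketch below shows the same behaviour with splines::bs() and lm() on simulated data; it does not use stm at all.

```r
library(splines)

set.seed(1)
day <- 1:150                                   # 150 distinct days
y   <- sin(day / 20) + rnorm(150, sd = 0.1)    # simulated response, illustration only

# A cubic B-spline basis with 10 degrees of freedom has 10 columns,
# regardless of how many distinct days there are.
basis <- bs(day, df = 10)
dim(basis)

# Regressing on that basis yields an intercept plus 10 spline coefficients.
fit <- lm(y ~ bs(day, df = 10))
length(coef(fit))
```

If this is what is happening, the individual spline coefficients are not per-day effects, so the fitted curve is usually more informative than the coefficient table, for instance via plot(prep, "day", method = "continuous").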

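【Discussion】:

Oh my god, this is pure gold. Honestly, I tried to run it the same way as in your "Game Is Afoot"! But I couldn't work out how to include not only days as a covariate but also the politicians' party affiliation, the way Molly E. Roberts does. Thank you so much, now I can finally finish my thesis.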