【Question title】: Can you print more than 11 covariates for summary.estimateEffect?
【Posted】: 2021-05-26 21:13:42
【Question description】:

I created an stm topic model, but I'm having a problem with summary.estimateEffect: I have about 150 days, but it only prints regression estimates for 10 of them.

parlPrevFit <- stm(documents = out$documents, vocab = out$vocab, K = 0, prevalence = ~ s(day),
                   max.em.its = 150, data = out$meta, init.type = "Spectral")

prep <- estimateEffect(c(14, 40, 5, 41) ~ s(day), parlPrevFit, metadata = meta, uncertainty = "Global")

summary(prep, topics = c(14, 40, 5, 41))

Topic 14 coefficients: https://prnt.sc/105pg1a

Can anyone suggest how to print estimates for more than 10 days?

【Question discussion】:

    Tags: r topic-modeling topicmodels


    【Solution 1】:

    Instead of using summary(), whose output you can't control, load the tidytext package and use tidy() instead.
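As a minimal sketch of that idea, assuming the example objects bundled with stm (`gadarian` and the pre-fitted `gadarianFit`, used here only as stand-ins for the question's `meta` and `parlPrevFit`), `tidy()` turns an `estimateEffect` result into an ordinary tibble. Keep in mind that a tibble with more than 20 rows only previews its first 10 when printed, so ask for every row explicitly or filter down to the topic you care about:

```r
library(stm)       # provides the example data gadarian and the fitted model gadarianFit
library(tidytext)  # provides the tidy() method for estimateEffect objects
library(dplyr)

# Stand-in for your own model: regress the first three topics on treatment
prep <- estimateEffect(1:3 ~ treatment, gadarianFit, metadata = gadarian)

# One row per topic x term, with estimate, std.error, statistic, p.value
coefs <- tidy(prep)

# Print every row rather than the default tibble preview
print(coefs, n = Inf)

# Or keep only the rows for a single topic
coefs %>% filter(topic == 2)
```

With the objects from the question, the equivalent would be `tidy(prep)` followed by `print(n = Inf)`.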

    Let's look at an example where we train a topic model on Jane Austen's novels, treating each chapter as a document:

    library(tidyverse)
    library(tidytext)
    library(stm)
    #> stm v1.3.6 successfully loaded. See ?stm for help. 
    #>  Papers, resources, and other materials at structuraltopicmodel.com
    library(janeaustenr)
    
    books <- austen_books() %>%
      group_by(book) %>%
      mutate(chapter = cumsum(str_detect(text, regex("^chapter ", ignore_case = TRUE)))) %>%
      ungroup() %>%
      filter(chapter > 0) %>%
      unite(document, book, chapter, remove = FALSE)
    
    austen_sparse <- books %>%
      unnest_tokens(word, text) %>%
      anti_join(stop_words) %>%
      count(document, word) %>%
      cast_sparse(document, word, n)
    #> Joining, by = "word"
    

    Let's train a topic model with 6 topics (there are 6 books):

    topic_model <- stm(
      austen_sparse, 
      K = 6,
      init.type = "Spectral",
      verbose = FALSE
    )
    

    Let's make a dataset to use in estimateEffect():

    chapters <- books %>%
      group_by(document) %>% 
      summarize(text = str_c(text, collapse = " ")) %>%
      ungroup() %>%
      inner_join(books %>%
                   distinct(document, book))
    #> Joining, by = "document"
    
    chapters
    #> # A tibble: 269 x 3
    #>    document text                                                           book 
    #>    <chr>    <chr>                                                          <fct>
    #>  1 Emma_1   "CHAPTER I   Emma Woodhouse, handsome, clever, and rich, with… Emma 
    #>  2 Emma_10  "CHAPTER X   Though now the middle of December, there had yet… Emma 
    #>  3 Emma_11  "CHAPTER XI   Mr. Elton must now be left to himself. It was n… Emma 
    #>  4 Emma_12  "CHAPTER XII   Mr. Knightley was to dine with them--rather ag… Emma 
    #>  5 Emma_13  "CHAPTER XIII   There could hardly be a happier creature in t… Emma 
    #>  6 Emma_14  "CHAPTER XIV   Some change of countenance was necessary for e… Emma 
    #>  7 Emma_15  "CHAPTER XV   Mr. Woodhouse was soon ready for his tea; and w… Emma 
    #>  8 Emma_16  "CHAPTER XVI   The hair was curled, and the maid sent away, a… Emma 
    #>  9 Emma_17  "CHAPTER XVII   Mr. and Mrs. John Knightley were not detained… Emma 
    #> 10 Emma_18  "CHAPTER XVIII   Mr. Frank Churchill did not come. When the t… Emma 
    #> # … with 259 more rows
    

    Now let's estimate regressions from our topic model for the first three topics, using our "chapters" dataset as the document metadata:

    effects <- estimateEffect(1:3 ~ book, topic_model, chapters)
    
    summary(effects)
    #> 
    #> Call:
    #> estimateEffect(formula = 1:3 ~ book, stmobj = topic_model, metadata = chapters)
    #> 
    #> 
    #> Topic 1:
    #> 
    #> Coefficients:
    #>                        Estimate Std. Error t value Pr(>|t|)    
    #> (Intercept)            0.018033   0.023726   0.760    0.448    
    #> bookPride & Prejudice  0.799555   0.037140  21.528   <2e-16 ***
    #> bookMansfield Park    -0.006387   0.032662  -0.196    0.845    
    #> bookEmma               0.003188   0.033393   0.095    0.924    
    #> bookNorthanger Abbey   0.002535   0.039017   0.065    0.948    
    #> bookPersuasion         0.025725   0.044281   0.581    0.562    
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> 
    #> Topic 2:
    #> 
    #> Coefficients:
    #>                        Estimate Std. Error t value Pr(>|t|)    
    #> (Intercept)            0.015289   0.016478   0.928    0.354    
    #> bookPride & Prejudice  0.001785   0.023489   0.076    0.939    
    #> bookMansfield Park     0.001616   0.024664   0.066    0.948    
    #> bookEmma               0.892516   0.037833  23.591   <2e-16 ***
    #> bookNorthanger Abbey   0.006032   0.031530   0.191    0.848    
    #> bookPersuasion        -0.001142   0.030052  -0.038    0.970    
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    #> 
    #> 
    #> Topic 3:
    #> 
    #> Coefficients:
    #>                         Estimate Std. Error t value Pr(>|t|)    
    #> (Intercept)            0.0196151  0.0225115   0.871   0.3844    
    #> bookPride & Prejudice -0.0004909  0.0286302  -0.017   0.9863    
    #> bookMansfield Park     0.0148960  0.0341272   0.436   0.6628    
    #> bookEmma              -0.0004006  0.0301741  -0.013   0.9894    
    #> bookNorthanger Abbey   0.8730570  0.0457994  19.063   <2e-16 ***
    #> bookPersuasion         0.1030537  0.0495148   2.081   0.0384 *  
    #> ---
    #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    

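    This example doesn't run into the printing limit you mention, but you can avoid any problem like that by using tidy() instead, which gives you the actual contents of the regression as a data frame: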

    tidy(effects)
    #> # A tibble: 18 x 6
    #>    topic term                   estimate std.error statistic  p.value
    #>    <int> <chr>                     <dbl>     <dbl>     <dbl>    <dbl>
    #>  1     1 (Intercept)            0.0179      0.0238    0.753  4.52e- 1
    #>  2     1 bookPride & Prejudice  0.799       0.0373   21.4    1.09e-59
    #>  3     1 bookMansfield Park    -0.00614     0.0325   -0.189  8.50e- 1
    #>  4     1 bookEmma               0.00350     0.0336    0.104  9.17e- 1
    #>  5     1 bookNorthanger Abbey   0.00323     0.0394    0.0820 9.35e- 1
    #>  6     1 bookPersuasion         0.0253      0.0443    0.571  5.68e- 1
    #>  7     2 (Intercept)            0.0153      0.0165    0.925  3.56e- 1
    #>  8     2 bookPride & Prejudice  0.00165     0.0234    0.0707 9.44e- 1
    #>  9     2 bookMansfield Park     0.00167     0.0246    0.0680 9.46e- 1
    #> 10     2 bookEmma               0.892       0.0381   23.4    2.84e-66
    #> 11     2 bookNorthanger Abbey   0.00606     0.0317    0.191  8.49e- 1
    #> 12     2 bookPersuasion        -0.00107     0.0298   -0.0359 9.71e- 1
    #> 13     3 (Intercept)            0.0197      0.0228    0.864  3.89e- 1
    #> 14     3 bookPride & Prejudice -0.000835    0.0288   -0.0290 9.77e- 1
    #> 15     3 bookMansfield Park     0.0147      0.0342    0.428  6.69e- 1
    #> 16     3 bookEmma              -0.000707    0.0305   -0.0232 9.82e- 1
    #> 17     3 bookNorthanger Abbey   0.873       0.0461   18.9    4.93e-51
    #> 18     3 bookPersuasion         0.103       0.0496    2.08   3.85e- 2
    

    Created on 2021-02-26 by the reprex package (v1.0.0)
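A note on the `s(day)` term in the question: stm's `s()` is documented as a wrapper around `splines::bs()` that defaults to a spline with 10 degrees of freedom. The regression therefore estimates one coefficient per spline basis column, not one per day, so an intercept plus 10 spline terms (11 rows per topic) is most likely what the output shows, whichever function is used to display it. A minimal sketch with a hypothetical 150-day vector standing in for the question's `day` column:

```r
library(splines)  # bs() ships with R's base splines package

# Hypothetical stand-in for the question's metadata: 150 distinct days
day <- 1:150

# B-spline basis with 10 degrees of freedom, matching the default documented
# for stm's s(); each column becomes one regression coefficient
basis <- bs(day, df = 10)

ncol(basis)  # 10 spline terms; with the intercept that gives 11 rows per topic
dim(basis)   # 150 rows (one per day) but only 10 columns
```

If the goal is an estimate at each day, one option is stm's plot method for `estimateEffect` objects, e.g. `plot(prep, covariate = "day", method = "continuous", topics = 14)`, which draws the fitted curve across the full range of `day`.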

    【Discussion】:

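    • Oh my god, this is pure gold. Honestly, I tried to run it the same way as in your Game Is Afoot post! But I couldn't figure out how to use not only the days as a covariate but also the politicians' party affiliation, the way Molly E. Roberts did. Thank you so much, now I can finally finish my thesis.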