如何在 glmnet 中绘制正确的标签？ [复制]答案

【问题标题】：how to plot the correct labels in glmnet? [duplicate]如何在 glmnet 中绘制正确的标签？ [复制]
【发布时间】：2018-11-26 03:53:38
【问题描述】：

考虑这个例子

library(dplyr)
library(tibble)
library(glmnet)
library(quanteda)

dtrain <- data_frame(text = c("Chinese Beijing Chinese",
                              "Chinese Chinese Shanghai",
                              "this is china",
                              "china is here",
                              'hello china',
                              "Chinese Beijing Chinese",
                              "Chinese Chinese Shanghai",
                              "this is china",
                              "china is here",
                              'hello china',
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              'japan'),
                     class = c(1, 1, 1, 1, 1,1,1,1,1,1,1,0,0,0,0,0,0,0,0))

我使用quanteda 从此数据帧中获取document term matrix

dtm <- quanteda::dfm(dtrain$text)
> dtm
Document-feature matrix of: 19 documents, 11 features (78.5% sparse).
19 x 11 sparse Matrix of class "dfm"
        features
docs     chinese beijing shanghai this is china here hello kyoto japan tokyo
  text1        2       1        0    0  0     0    0     0     0     0     0
  text2        2       0        1    0  0     0    0     0     0     0     0
  text3        0       0        0    1  1     1    0     0     0     0     0
  text4        0       0        0    0  1     1    1     0     0     0     0
  text5        0       0        0    0  0     1    0     1     0     0     0

我可以使用glmnet 轻松拟合lasso 回归：

fit <- glmnet(dtm, y = as.factor(dtrain$class), alpha = 1, family = 'binomial')

但是，绘制fit 并没有显示dtm 矩阵的标签（我只看到三条曲线）。这里有什么问题？

【问题讨论】：

标签： r glmnet

【解决方案1】：

据我了解，该图为您提供的是与重要单词相关的系数值。在您的情况下，单词 9-11，即京都、日本和东京（我可以从 dtm 表中看到）。这个普通的情节库没有我想你说你想做的。相反，您可以使用library(plotmo)，如下所示：

library(dplyr)
library(tibble)
library(glmnet)
library(quanteda)
library(plotmo)
dtrain <- data_frame(text = c("Chinese Beijing Chinese",
                              "Chinese Chinese Shanghai",
                              "this is china",
                              "china is here",
                              'hello china',
                              "Chinese Beijing Chinese",
                              "Chinese Chinese Shanghai",
                              "this is china",
                              "china is here",
                              'hello china',
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              "Kyoto Japan",
                              "Tokyo Japan Chinese",
                              'japan'),
                     class = c(1, 1, 1, 1, 1,1,1,1,1,1,1,0,0,0,0,0,0,0,0))


dtm <- quanteda::dfm(dtrain$text)
fit <- glmnet(dtm, y = as.factor(dtrain$class), alpha = 1, family = 'binomial')
plot_glmnet(fit, label=3)            # label the 3 biggest final coefs

干杯！

【讨论】：