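Here are a few ways to visualize the results of a logistic regression. The first, third and fourth are adapted from code in Chapter 5 of Gelman & Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models. The code below uses some functions from the arm package (which accompanies the book):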
library(arm)
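The snippets below refer to a fitted model called glm.fit and to the Smarket data, neither of which is created in this answer. A minimal setup sketch, assuming Smarket comes from the ISLR package and that glm.fit is a logistic regression on Lag1 through Lag5 plus Volume (an inference from the seven-column design matrices used in the curve() calls, not something stated here):

```r
# Assumption: Smarket is the data set shipped with the ISLR package
library(ISLR)

# Assumption: glm.fit has an intercept, five lag terms and Volume,
# which matches the 7 columns of cbind(1, x, 0, 0, 0, 0, <Volume>) used later
glm.fit <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
               data = Smarket, family = binomial)

# Coefficient order matters for the matrix products below
names(coef(glm.fit))
```

If your model has a different set or order of predictors, the cbind() columns in the plotting code need to be adjusted to match.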
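1) Plot the probability of Direction="Up" over the range of Lag 1 values, for three different values of Volume (the other Lag values are set to zero, but you can set them to other values if you prefer):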
# Function to jitter class category values: codes equal to 1 land just above 0,
# all other codes land just below 1
jitter.binary <- function(a, jitt=.05){
  ifelse(a-1==0, runif(length(a), 0, jitt), runif(length(a), 1-jitt, 1))
}
# Sequence of Lag1 values for plotting
x = seq(-5,5.7,length.out=100)
# Plot jittered Direction vs. Lag 1. This shows the actual distribution of the data.
with(Smarket, plot(Lag1, jitter.binary(as.numeric(Direction)), pch=16, cex=0.7,
                   ylab="Pr(Up)", xlab="Lag 1"))
# Add model prediction curves. These show the probability of Direction="Up" vs. Lag 1
# for three different fixed values of Volume: 1.48 (solid black), 0.36 (dashed red)
# and 3.15 (dashed blue). The four zeros hold the remaining Lag terms at zero.
curve(expr=invlogit(cbind(1, x, 0,0,0,0, 1.48) %*% coef(glm.fit)),
      from=-5, to=5.7, lwd=.5, add=TRUE)
curve(expr=invlogit(cbind(1, x, 0,0,0,0, 0.36) %*% coef(glm.fit)),
      from=-5, to=5.7, lwd=.5, add=TRUE, col="red", lty=2)
curve(expr=invlogit(cbind(1, x, 0,0,0,0, 3.15) %*% coef(glm.fit)),
      from=-5, to=5.7, lwd=.5, add=TRUE, col="blue", lty=2)
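The curves in item 1 build each row of the design matrix by hand and multiply it by the coefficient vector, which only works if the column order matches coef(). As a rough sketch of what that computes, the example below uses simulated stand-in data (the names d, lag1, volume and up are hypothetical, and base R's plogis() stands in for arm's invlogit()) and checks that the manual route agrees with predict() on a grid:

```r
# Simulated stand-in data; all names here are hypothetical
set.seed(1)
n <- 200
d <- data.frame(lag1 = rnorm(n), volume = runif(n, 0.4, 3))
d$up <- rbinom(n, 1, plogis(0.2 - 0.5 * d$lag1 + 0.3 * d$volume))
fit <- glm(up ~ lag1 + volume, data = d, family = binomial)

# Grid of lag1 values with volume held fixed at 1.5
grid <- data.frame(lag1 = seq(-3, 3, length.out = 50), volume = 1.5)

# Manual route: design matrix times coefficients, then inverse logit
p_manual <- as.vector(plogis(cbind(1, grid$lag1, grid$volume) %*% coef(fit)))

# predict() route: same probabilities, matched to predictors by name
p_predict <- as.vector(predict(fit, newdata = grid, type = "response"))

# Compare the two routes
all.equal(p_manual, p_predict)
```

The predict() route is less compact inside curve(), but it avoids silent errors if predictors are later added or reordered.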
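2) Plot Volume against Lag 1, color the points by the value of Direction, and add the decision boundary. To keep things conceptually simple, I fit a new regression with only two predictors, because the decision boundary for the original regression is a five-dimensional hyperplane. (For a model with many predictors, you can still draw a decision boundary in two dimensions by taking a two-dimensional slice through the multidimensional predictor space.)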
# New regression model
glm.fit2 = glm(Direction ~ Lag1 + Volume , data = Smarket, family=binomial)
# Probability of Direction="Up" for this model
Smarket$Pred2 = predict(glm.fit2, type="response")
# Set Prediction to "Up" for probability > 0.5; "Down" otherwise.
Smarket$PredCat2 = cut(Smarket$Pred2, c(0,0.5,1), include.lowest=TRUE, labels=c("Down","Up"))
# Plot Volume (y-axis) against Lag1 (x-axis), with color and point style
# based on the value of Direction
with(Smarket, plot(Lag1, Volume, pch=ifelse(Direction=="Down", 3, 1),
                   col=ifelse(Direction=="Down", "red", "blue"), cex=0.6))
# Add the decision boundary
curve(expr= -(cbind(1, x) %*% coef(glm.fit2)[1:2])/coef(glm.fit2)[3],
      from=-5, to=5.7, add=TRUE)
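The expression passed to curve() for the boundary comes from setting the predicted probability to 0.5, which is the same as setting the linear predictor to zero. Writing the coefficients of glm.fit2 as \beta_0 (intercept), \beta_1 (Lag1) and \beta_2 (Volume), symbols introduced here only for illustration, and solving for Volume:

```latex
\begin{aligned}
\operatorname{logit}^{-1}\!\left(\beta_0 + \beta_1\,\text{Lag1} + \beta_2\,\text{Volume}\right) &= 0.5 \\
\beta_0 + \beta_1\,\text{Lag1} + \beta_2\,\text{Volume} &= 0 \\
\text{Volume} &= -\frac{\beta_0 + \beta_1\,\text{Lag1}}{\beta_2}
\end{aligned}
```

In the code, cbind(1, x) %*% coef(glm.fit2)[1:2] is the numerator and coef(glm.fit2)[3] is the denominator.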
3) Plot the binned residual means against the predicted probability of "Up":
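# Predicted probabilities and residuals. Note that resid() on a glm returns deviance
# residuals by default; resid(glm.fit, type="response") gives observed minus fitted.
x = predict(glm.fit, type="response")
y = resid(glm.fit)
binnedplot(x, y, xlab="Pr(Up)")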
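To make the idea behind a binned residual plot concrete, here is a base-R sketch on simulated data. It is not arm's implementation: the bin count, the equal-size binning by rank, the use of response residuals and all variable names are my own choices for illustration.

```r
# Simulated stand-in data; all names here are hypothetical
set.seed(2)
n <- 500
x1 <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x1))
fit <- glm(y ~ x1, family = binomial)

p <- fitted(fit)                     # predicted probabilities
r <- resid(fit, type = "response")   # observed minus fitted

# Sort observations into roughly equal-sized bins by predicted probability
nbins <- floor(sqrt(n))
bin <- cut(rank(p, ties.method = "first"), breaks = nbins, labels = FALSE)

# Per-bin mean prediction, mean residual, and 2 standard errors of the mean
binned <- data.frame(
  p_mean = as.vector(tapply(p, bin, mean)),
  r_mean = as.vector(tapply(r, bin, mean)),
  se2    = as.vector(2 * tapply(r, bin, sd) / sqrt(tapply(r, bin, length)))
)

# Mean residual per bin, with +/- 2 SE bands for reference
plot(binned$p_mean, binned$r_mean, pch = 16,
     xlab = "Mean predicted probability", ylab = "Mean residual")
lines(binned$p_mean, binned$se2, col = "grey")
lines(binned$p_mean, -binned$se2, col = "grey")
abline(h = 0, lty = 2)
```

If the model fits reasonably well, most binned means should fall inside the grey bands with no obvious trend across the range of predicted probabilities.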
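4) Plot the coefficient values with confidence intervals. This isn't specific to classification; it's just a quick way to visualize model coefficients.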
coefplot(glm.fit)