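As far as I know, a linear regression model with only a categorical predictor cannot be drawn as a single fitted line. What you can do is plot each fitted point. Here I will use the iris dataset.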
library(tidyverse)
as_tibble(iris)
#> # A tibble: 150 x 5
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ... with 140 more rows
Consider the regression problem Petal.Width ~ Species.
iris %>%
ggplot() +
aes(x = Species, y = Petal.Width, colour = Species) +
geom_boxplot(show.legend = FALSE)
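From this box plot you can see how Petal.Width is distributed within each Species, and that it increases from setosa to versicolor to virginica. For a qualitative predictor, the variable is coded as: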
contrasts(iris$Species)
#> versicolor virginica
#> setosa 0 0
#> versicolor 1 0
#> virginica 0 1
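so that the model becomes
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$$
where
$$x_{i1} = \begin{cases} 1 & \text{if observation } i \text{ is versicolor} \\ 0 & \text{otherwise} \end{cases}$$
and
$$x_{i2} = \begin{cases} 1 & \text{if observation } i \text{ is virginica} \\ 0 & \text{otherwise} \end{cases}$$
Thus, each fitted value becomes
$$\hat{y}_i = \begin{cases} \hat{\beta}_0 & \text{if observation } i \text{ is setosa} \\ \hat{\beta}_0 + \hat{\beta}_1 & \text{if observation } i \text{ is versicolor} \\ \hat{\beta}_0 + \hat{\beta}_2 & \text{if observation } i \text{ is virginica} \end{cases}$$
using these estimates: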
lm(Petal.Width ~ Species, data = iris)
#>
#> Call:
#> lm(formula = Petal.Width ~ Species, data = iris)
#>
#> Coefficients:
#> (Intercept) Speciesversicolor Speciesvirginica
#> 0.246 1.080 1.780
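To make the link between the dummy coding and the fitted values concrete, here is a minimal sketch in base R. The object names `X`, `fit`, and `fitted_manual` are my own and not part of the original answer; the sketch builds the design matrix implied by `contrasts(iris$Species)` and multiplies it by the estimated coefficients.

```r
# Design matrix implied by the treatment contrasts shown above:
# an intercept column plus one 0/1 column per non-baseline species
X <- model.matrix(Petal.Width ~ Species, data = iris)

# Only three distinct rows exist, one per species
unique(X)

# Same model as in the answer
fit <- lm(Petal.Width ~ Species, data = iris)

# Fitted values computed by hand as X %*% beta-hat
fitted_manual <- drop(X %*% coef(fit))

# Should agree with what lm() reports
all.equal(unname(fitted_manual), unname(fitted(fit)))
```

Because the design matrix has only three distinct rows, the model can produce only three distinct fitted values, which is why there is no line to draw.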
As mentioned above, with these facts each fitted value can be drawn on the plot.
From lm():
iris %>%
select(Species, Petal.Width) %>% # just for clarity
mutate(pred = lm(Petal.Width ~ Species)$fitted.values) %>% # linear regression
ggplot() +
aes(x = Species, y = Petal.Width) +
geom_point() +
geom_point(aes(x = Species, y = pred), col = "red", size = 3) # fitted values
Also, note that each fitted value is the sample mean of the response within its category:
iris %>%
select(Species, Petal.Width) %>%
group_by(Species) %>% # for each category
mutate(pred = mean(Petal.Width)) %>% # sample mean of response in each category
ggplot() +
aes(x = Species, y = Petal.Width) +
geom_point() +
geom_point(aes(x = Species, y = pred), col = "red", size = 3)
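The two plots should look identical. A small numeric check of that claim, sketched in base R (the names `fit` and `group_mean` are mine, introduced only for illustration):

```r
# Same model as in the answer
fit <- lm(Petal.Width ~ Species, data = iris)

# ave() returns, for every row, the mean of the response within that row's
# group (its FUN argument defaults to mean)
group_mean <- ave(iris$Petal.Width, iris$Species)

# Compare the lm() fitted values with the per-species sample means
all.equal(unname(fitted(fit)), group_mean)

# Largest absolute difference, expected to be at floating-point noise level
max(abs(fitted(fit) - group_mean))
```

The reason is that, with one free parameter per category, the residual sum of squares splits into a separate sum for each species, and each of those sums is minimized by that species' sample mean.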