【Posted】: 2021-10-01 16:57:17
【Question】:
I have this sample dataset:
data.1 <- read.csv(text = "
country,year,response
Austria,2010,34378
Austria,2011,38123
Austria,2012,37126
Austria,2013,42027
Austria,2014,43832
Austria,2015,56895
Austria,2016,49791
Austria,2017,64467
Austria,2018,67620
Austria,2019,69210
Croatia,2010,56456
Croatia,2011,58896
Croatia,2012,54109
Croatia,2013,47156
Croatia,2014,47104
Croatia,2015,88867
Croatia,2016,78614
Croatia,2017,85133
Croatia,2018,77090
Croatia,2019,78330
France,2010,50939
France,2011,41571
France,2012,37367
France,2013,42999
France,2014,75789
France,2015,122529
France,2016,136518
France,2017,141829
France,2018,153850
France,2019,163800
")
I want to fit a loess function by country and get the predicted values for every year in the data frame I provided. The loess smoothing looks like this:
library(ggplot2)

ggplot(data.1, aes(x = year, y = response, color = country)) +
  geom_point(size = 3, alpha = 0.3) +
  # geom_line(aes(x = year, y = area_harvested_ha/1000), size = 0.5, alpha = 1) +
  geom_smooth(method = 'loess', span = 0.75, na.rm = TRUE, se = FALSE, size = 2)
Plot:
This is the code I tried in order to get the predictions:
library(dplyr)

data.1.with.pred <- data.1 %>%
  group_by(country) %>%
  arrange(country, year) %>%
  mutate(pred.response = stats::predict(stats::loess(response ~ year, span = .75, data = .),
                                        data.frame(year = seq(min(year), max(year), 1))))
I do get predictions in the data frame, but the grouping by country does not work.
The resulting plot is produced like this:
ggplot(data.1.with.pred, aes(x = year, y = pred.response, color = country)) +
  geom_point(aes(x = year, y = response), size = 3, alpha = 0.3) +
  # geom_line(aes(x = year, y = area_harvested_ha/1000), size = 0.5, alpha = 1) +
  geom_smooth(method = 'loess', span = 0.75, na.rm = TRUE, se = FALSE, size = 2)
The problem I am having is that the grouping by country fails. I took this approach from the answer here:
https://stackoverflow.com/a/53400029/4880334
Any suggestions would be much appreciated.
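【Comments】:
- Which part isn't working? If you want multiple loess curves in the plot, the issue is really how geom_smooth groups (or doesn't group) your data. It doesn't know that your data frame is grouped; once you get into ggplot territory, that grouping is essentially gone. Try adding group = country inside aes.
- If the problem is only about building a model independently for each group, see the "many models" approach advocated by the tidyverse folks, and questions such as stackoverflow.com/q/22713325/5325862, stackoverflow.com/q/58050279/5325862, and stackoverflow.com/q/38879849/5325862.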
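A minimal sketch of the per-group idea from the second comment, assuming dplyr 0.8.1 or later for group_modify. The helper name fit.one.country and the objects sample.data and sample.with.pred are hypothetical, and the data is just the Austria and Croatia rows from the question. This is one possible way to do it, not necessarily what the linked questions recommend. Here loess receives only the rows of the current group, whereas data = . inside mutate passes the whole data frame.

```r
library(dplyr)

# Austria and Croatia rows copied from the question's dataset.
sample.data <- data.frame(
  country  = rep(c("Austria", "Croatia"), each = 10),
  year     = rep(2010:2019, times = 2),
  response = c(34378, 38123, 37126, 42027, 43832,
               56895, 49791, 64467, 67620, 69210,
               56456, 58896, 54109, 47156, 47104,
               88867, 78614, 85133, 77090, 78330)
)

# Hypothetical helper: fit loess on one group's rows only and
# attach the fitted values at the observed years.
fit.one.country <- function(df, key) {
  fit <- stats::loess(response ~ year, data = df, span = 0.75)
  df$pred.response <- as.numeric(stats::predict(fit))
  df
}

# group_modify calls the helper once per country with that country's rows.
sample.with.pred <- sample.data %>%
  group_by(country) %>%
  arrange(country, year) %>%
  group_modify(fit.one.country) %>%
  ungroup()
```

Because pred.response is already smoothed within each country, it can be drawn directly with geom_line(aes(y = pred.response)) rather than being smoothed a second time by geom_smooth.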