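【Posted】: 2018-03-06 01:48:03
【Question】:
I need to create a frequency table from two columns of categorical variables, one holding 5-year age groups and the other health status (five levels), both from the brfss2013 dataset, from which I extracted the columns of interest: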
> hlthgrpq1 <- brfss2013 %>% select(genhlth, X_ageg5yr)
This yields a two-column data frame with 491775 observations of 2 variables.
genhlth X_ageg5yr
1 Fair Age 60 to 64
2 Good Age 50 to 54
3 Good Age 55 to 59
4 Very good Age 60 to 64
5 Good Age 65 to 69
I can produce a summary table with the 'by' function:
> by(hlthgrpq1$genhlth, hlthgrpq1$X_ageg5yr, summary)
hlthgrpq1$X_ageg5yr: Age 18 to 24
Excellent Very good Good Fair Poor NA's
6896 10266 7795 1873 303 69
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 25 to 29
Excellent Very good Good Fair Poor NA's
5779 8488 6521 1751 325 46
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 30 to 34
Excellent Very good Good Fair Poor NA's
6412 9958 7977 2295 496 75
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 35 to 39
Excellent Very good Good Fair Poor NA's
6366 10169 8236 2637 638 61
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 40 to 44
Excellent Very good Good Fair Poor NA's
6689 11130 9193 3334 1067 95
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 45 to 49
Excellent Very good Good Fair Poor NA's
7051 12278 10611 4343 1815 112
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 50 to 54
Excellent Very good Good Fair Poor NA's
8545 15254 13761 6354 3120 139
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 55 to 59
Excellent Very good Good Fair Poor NA's
8500 16759 15394 7643 3998 197
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 60 to 64
Excellent Very good Good Fair Poor NA's
8283 16825 16266 8101 3955 229
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 65 to 69
Excellent Very good Good Fair Poor NA's
7479 15764 15600 7749 3200 205
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 70 to 74
Excellent Very good Good Fair Poor NA's
5491 11943 13125 6491 2721 196
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 75 to 79
Excellent Very good Good Fair Poor NA's
3320 8501 10128 5545 2426 173
----------------------------------------------------------------------------------------------------------------
hlthgrpq1$X_ageg5yr: Age 80 or older
Excellent Very good Good Fair Poor NA's
3697 10285 14400 8116 3695 322
This is where I'm stuck. I have spent several hours trying to get to this:
[Image: results obtained via spreadsheet.]
Thanks for your help.
(This is for a specific assignment, so I can only use dplyr and ggplot2; no reshape2 or tidyr.)
【Comments】:
- Take a look at the dplyr verbs group_by() and summarise().
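The group_by() and summarise() suggestion could be sketched roughly as follows. Since brfss2013 is not loaded here, the sketch uses a small made-up data frame (named toy, a hypothetical stand-in for hlthgrpq1) with the same two column names; its values are illustrative only and are not BRFSS results. The dplyr pipeline gives one row per observed age group and health status pair (long format). For the wide age-by-health layout, base R's table() is one option that needs neither reshape2 nor tidyr.

```r
library(dplyr)

# Toy stand-in for hlthgrpq1 (assumption: made-up values, same column names)
toy <- data.frame(
  genhlth = factor(
    c("Fair", "Good", "Good", "Very good", "Good", "Fair"),
    levels = c("Excellent", "Very good", "Good", "Fair", "Poor")
  ),
  X_ageg5yr = factor(
    c("Age 60 to 64", "Age 50 to 54", "Age 55 to 59",
      "Age 60 to 64", "Age 50 to 54", "Age 60 to 64")
  )
)

# Long format: one row per observed (age group, health status) pair
freq_long <- toy %>%
  group_by(X_ageg5yr, genhlth) %>%
  summarise(n = n()) %>%
  ungroup()

# Wide format: age groups as rows, health status as columns (base R only)
freq_wide <- table(toy$X_ageg5yr, toy$genhlth, useNA = "ifany")

print(freq_long)
print(freq_wide)
```

On the real data, replacing toy with hlthgrpq1 should give the same counts as the by() output above, with missing genhlth values appearing as their own NA group in the long table and, because of useNA = "ifany", as an extra NA column in the wide one.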