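【Posted】:2020-08-08 03:02:14
【Question】:
I have a dataset containing the answers to a "select all that apply" question, with each possible answer in its own column. Suppose the question is "Which shirt colors would you accept?" The data looks like this: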
id  Q3_Red  Q3_Blue  Q3_Green  Q3_Purple
9
8                    Green     Purple
7                    Green
6   Red
5                              Purple
4           Blue
3           Blue               Purple
2   Red     Blue     Green
1   Red                        Purple
10  Red                        Purple
You can turn it into an actual data frame with:
tmp <- data.frame("id"        = c(009,008,007,006,005,004,003,002,001,010),
                  "Q3_Red"    = c("","","","Red","","","","Red","Red","Red"),
                  "Q3_Blue"   = c("","","","","","Blue","Blue","Blue","",""),
                  "Q3_Green"  = c("","Green","Green","","","","","Green","",""),
                  "Q3_Purple" = c("","Purple","","","Purple","","Purple","","Purple","Purple")
)
I want to summarize it with a count for each answer, e.g.:
Red 4
Blue 3
Green 3
Purple 5
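I can count each column separately with something like tmp %>% count(Q3_Red) and assemble the results into a data frame of their own, but it seems like there must be a way to do this in one go with a reshaping function. I've looked at gather() and spread(), but I can't figure out how to combine tidyr with count().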
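A minimal sketch of the one-step reshape being asked about, assuming tidyr >= 1.0.0 (where pivot_longer() supersedes gather()) and dplyr are installed: stack all answer columns into a single long column, drop the empty strings, then count() what remains. The intermediate column names question and answer are arbitrary choices for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE keeps the answers
# as character even on R versions older than 4.0.
tmp <- data.frame(
  id        = c(9, 8, 7, 6, 5, 4, 3, 2, 1, 10),
  Q3_Red    = c("", "", "", "Red", "", "", "", "Red", "Red", "Red"),
  Q3_Blue   = c("", "", "", "", "", "Blue", "Blue", "Blue", "", ""),
  Q3_Green  = c("", "Green", "Green", "", "", "", "", "Green", "", ""),
  Q3_Purple = c("", "Purple", "", "", "Purple", "", "Purple", "", "Purple", "Purple"),
  stringsAsFactors = FALSE
)

counts <- tmp %>%
  # One row per (id, answer column) pair; the column name goes to `question`,
  # the cell content (a color or "") goes to `answer`.
  pivot_longer(-id, names_to = "question", values_to = "answer") %>%
  # Unselected options are stored as empty strings, so drop them.
  filter(answer != "") %>%
  # Tally each remaining answer; the result has columns `answer` and `n`.
  count(answer)

print(counts)
```

Replacing the pivot_longer() call with gather(question, answer, -id) should give the same long table; pivot_longer() is simply the currently recommended verb. count() orders the rows alphabetically by answer; passing sort = TRUE orders them by frequency instead.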
【Discussion】:
-
The quick and dirty way is
colSums(tmp[,-1] != ""), but presumably someone will answer with a more proper tidyverse way -
@BenToh Thanks. I definitely want to use this project to get a handle on the Tidyverse.
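For comparison, the base R one-liner from the first comment, spelled out as a sketch. It assumes, as in tmp, that every column except the first holds answers and that unselected options are empty strings; the sub() call that strips the Q3_ prefix is an added convenience, not part of the comment.

```r
# Same data as in the question.
tmp <- data.frame(
  id        = c(9, 8, 7, 6, 5, 4, 3, 2, 1, 10),
  Q3_Red    = c("", "", "", "Red", "", "", "", "Red", "Red", "Red"),
  Q3_Blue   = c("", "", "", "", "", "Blue", "Blue", "Blue", "", ""),
  Q3_Green  = c("", "Green", "Green", "", "", "", "", "Green", "", ""),
  Q3_Purple = c("", "Purple", "", "", "Purple", "", "Purple", "", "Purple", "Purple"),
  stringsAsFactors = FALSE
)

# Comparing the answer columns (all but id) with "" yields a logical matrix;
# colSums() then counts the TRUE (non-empty) cells in each column.
answer_counts <- colSums(tmp[, -1] != "")

# Drop the "Q3_" prefix so the names match the answer labels.
names(answer_counts) <- sub("^Q3_", "", names(answer_counts))

print(answer_counts)
```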