In response to the OP's request in the comments...
library(tidyverse)
# Test data. 3 questions just to demonstrate the principle.
# levels=1:3 keeps factor() from failing if sample() happens to miss a value.
d <- tibble(
  Participant=1:10,
  Q1=factor(sample(1:3, 10, TRUE), levels=1:3, labels=c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")),
  Q2=factor(sample(1:3, 10, TRUE), levels=1:3, labels=c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")),
  Q3=factor(sample(1:3, 10, TRUE), levels=1:3, labels=c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)"))
)
Recoding the factors:
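# Recode untidy data
# Explicit levels keep both labels valid even if a column is all TRUE or all FALSE.
d %>% mutate(
  across(
    starts_with("Q"),
    function(x) factor(
      as.numeric(x) == 3,
      levels=c(FALSE, TRUE),
      labels=c("Do not completely agree (1&2)", "Completely agree (3)")
    )
  )
)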
# Tidy the data
dTidy <- d %>%
  pivot_longer(
    cols=starts_with("Q"),
    values_to="Response",
    names_to="Question"
  )
dTidy
# Recode tidy data
dTidy %>%
  mutate(
    Response=factor(
      as.numeric(Response) == 3,
      levels=c(FALSE, TRUE),  # explicit levels, so both labels always apply
      labels=c("Do not completely agree (1&2)", "Completely agree (3)")
    )
  )
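As an aside, the same collapse can be written by level name rather than by integer code. This is a minimal sketch using `forcats::fct_collapse()`, which is a swapped-in alternative and not the approach used above; the vector `x` and its values are made up for illustration.

```r
library(forcats)

# Hypothetical example factor with the same three labels as the test data.
labs <- c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")
x <- factor(c(1, 3, 2, 3), levels = 1:3, labels = labs)

# Merge the first two levels into one; rename the third.
collapsed <- fct_collapse(
  x,
  "Do not completely agree (1&2)" = c("Do not Agree (1)", "Somewhat Agree (2)"),
  "Completely agree (3)" = "Completely Agree (3)"
)
levels(collapsed)
```

Naming the levels avoids relying on the underlying integer codes, at the cost of repeating the label strings.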
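So far there's not much difference. The benefits of tidy data become more obvious when we try to do something with it. As a simple example, plot a bar chart of the responses to each question. The untidy data doesn't lend itself to this particularly well. Here is a simple attempt: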
# Plot untidy data
doPlots <- function(data) {
  print(data %>% ggplot() + geom_bar(aes(x=Q1)))
  print(data %>% ggplot() + geom_bar(aes(x=Q2)))
  print(data %>% ggplot() + geom_bar(aes(x=Q3)))
}
d %>% doPlots()
Anything else gets awkward. With tidy data, it's straightforward:
# Plot tidy data
dTidy %>%
  ggplot() +
  geom_bar(aes(x=Response)) +
  facet_grid(rows=vars(Question))
# Or
dTidy %>%
  ggplot() +
  geom_bar(aes(x=Response, fill=Question))
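The same pattern carries over to summaries as well as plots. As a minimal sketch (the toy tibble `dExample` and its values are made up for illustration, not taken from the question), a per-question frequency table is a single `count()` call once the data is in long format:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same wide shape as `d`.
dExample <- tibble(
  Participant = 1:4,
  Q1 = c(3, 1, 3, 2),
  Q2 = c(2, 2, 3, 3)
)

# Reshape to one row per participant and question, then tabulate.
counts <- dExample %>%
  pivot_longer(cols = starts_with("Q"), names_to = "Question", values_to = "Response") %>%
  count(Question, Response)
counts
```

With the wide layout, the equivalent would need one `count()` call per question column.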
Furthermore, suppose a different dataset arrives that contains more questions than the original.
# Now add another Question
d <- d %>% mutate(Q4=factor(sample(1:3, 10, TRUE), levels=1:3, labels=c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)")))
dTidy <- dTidy %>%
  bind_rows(
    tibble(
      Participant=1:10,
      Question="Q4",
      Response=factor(sample(1:3, 10, TRUE), levels=1:3, labels=c("Do not Agree (1)", "Somewhat Agree (2)", "Completely Agree (3)"))
    )
  )
The doPlots function needs to be rewritten: it ignores Q4.
d %>% doPlots()
But the tidy code is robust and needs no modification:
dTidy %>%
  ggplot() +
  geom_bar(aes(x=Response)) +
  facet_grid(rows=vars(Question))
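The reshaping step is robust in the same way: if new data arrives in the wide layout, the `starts_with("Q")` selector picks up the extra question column without any change to the code. A small sketch with made-up values (`dWide` and the helper `toLong` are hypothetical names used only for this illustration):

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data with two questions.
dWide <- tibble(Participant = 1:3, Q1 = c(1, 2, 3), Q2 = c(3, 3, 1))

# Hypothetical helper: reshape any wide table whose question columns start with "Q".
toLong <- function(data) {
  data %>%
    pivot_longer(cols = starts_with("Q"), names_to = "Question", values_to = "Response")
}

before <- toLong(dWide)
# Add a new question column; toLong() itself is unchanged.
after <- toLong(dWide %>% mutate(Q4 = c(2, 2, 3)))
unique(after$Question)
```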
In my opinion, using tidy data means that your code is