[Posted]: 2020-06-25 10:04:30
[Question]:
I have the following dataset:
df <- tribble(
  ~id,    ~name, ~day_1, ~day_2, ~day_3, ~day_4, ~rank,
  "101",  "a",    5,  2, 1, 8, '1',
  "202",  "b",    8,  4, 5, 5, '2',
  "303",  "c",   10,  6, 9, 6, '3',
  "404",  "d",   12,  8, 5, 7, '4',
  "505",  "e",   14, 10, 7, 9, '5',
  "607",  "f",    5,  2, 1, 8, '6',
  "707",  "g",    8,  4, 5, 5, '7',
  "808",  "h",   10,  6, 9, 6, '8',
  "909",  "k",   12,  8, 5, 7, '9',
  "1009", "l",   14, 10, 7, 9, '10',
)
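Thanks to @Edward, I created the top variable, grouped the data by top, and then took the median of every column whose name starts with day. The code is as follows: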
df %>%
  mutate(top = ifelse(rank <= 1, 1,
               ifelse(rank <= 3, 3,
               ifelse(rank <= 5, 5,
               ifelse(rank <= 7, 7,
               ifelse(rank <= 8, 8, 10)))))) %>%
  group_by(top) %>%
  summarize(day_1 = median(as.numeric(day_1), na.rm = TRUE),
            day_2 = median(as.numeric(day_2), na.rm = TRUE),
            day_3 = median(as.numeric(day_3), na.rm = TRUE),
            day_4 = median(as.numeric(day_4), na.rm = TRUE))
The result is:
# A tibble: 6 x 5
    top day_1 day_2 day_3 day_4
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1   5       2     1   8
2     3  10       6     7   6
3     5  13       9     6   8
4     7   6.5     3     3   6.5
5     8  10       6     9   6
6    10  12       8     5   7
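However, my real dataset has nearly 40 columns whose names start with day, so I would like to do this more efficiently with a function instead of writing out every column name as in summarize(day_1 = median(as.numeric(day_1), na.rm = TRUE)).
Any ideas?
[Comments]: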
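- Have a look at summarise_at, or at across in the development version of dplyr; also check stackoverflow.com/questions/9723208/…
- Thanks for the suggestion. I added this: ``` summarise_at(vars(starts_with('day')), median) ``` but it gives the following error: expecting a one sided formula, a function, or a function name. @arg0naut91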
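- For reference, here is a minimal sketch of the two approaches mentioned in the comments, assuming dplyr 1.0.0 or later (where across is available) and the sample data from the question. The grouping step is copied unchanged from the question; note that rank is stored as character there, so its comparisons are done on strings. The names grouped, res_across and res_at are only illustrative. As for the error above, one possible cause (not confirmed in this thread) is an object called median in the workspace that is not a function; passing a formula such as ~ median(.x, na.rm = TRUE) avoids relying on that name being looked up as a variable.

```r
library(dplyr)
library(tibble)

# Sample data from the question (rank is character, as in the original).
df <- tribble(
  ~id,    ~name, ~day_1, ~day_2, ~day_3, ~day_4, ~rank,
  "101",  "a",    5,  2, 1, 8, '1',
  "202",  "b",    8,  4, 5, 5, '2',
  "303",  "c",   10,  6, 9, 6, '3',
  "404",  "d",   12,  8, 5, 7, '4',
  "505",  "e",   14, 10, 7, 9, '5',
  "607",  "f",    5,  2, 1, 8, '6',
  "707",  "g",    8,  4, 5, 5, '7',
  "808",  "h",   10,  6, 9, 6, '8',
  "909",  "k",   12,  8, 5, 7, '9',
  "1009", "l",   14, 10, 7, 9, '10'
)

# Grouping step, unchanged from the question.
grouped <- df %>%
  mutate(top = ifelse(rank <= 1, 1,
               ifelse(rank <= 3, 3,
               ifelse(rank <= 5, 5,
               ifelse(rank <= 7, 7,
               ifelse(rank <= 8, 8, 10)))))) %>%
  group_by(top)

# Option 1: across() applies one function to every column selected
# by starts_with("day"), however many there are.
res_across <- grouped %>%
  summarise(across(starts_with("day"),
                   ~ median(as.numeric(.x), na.rm = TRUE)))

# Option 2: summarise_at() with a formula, so na.rm can be passed along.
res_at <- grouped %>%
  summarise_at(vars(starts_with("day")),
               ~ median(as.numeric(.), na.rm = TRUE))

print(res_across)
```

Both calls select columns by prefix, so the same code should cover the roughly 40 day columns of the real dataset without listing them; across is the form dplyr currently recommends, while summarise_at still works in existing code.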