We can use summarise_all, assuming each column has exactly one non-NA element per Date:
library(dplyr)
df %>%
  group_by(Date) %>%
  summarise_all(na.omit)
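If a group can hold several non-NA elements, and in some cases only NA, either create a list column or paste the values together. The list-column version:
df %>%
  group_by(Date) %>%
  # .[n() + 1] indexes one past the end of the group, which yields an NA of the column's type
  summarise_at(vars(-group_cols()), ~ list(if(all(is.na(.))) .[n() + 1] else .[!is.na(.)]))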
# A tibble: 3 x 4
# Date AAPL MSFT NASDAQ
# <chr> <list> <list> <list>
#1 1.1.19 <chr [1]> <chr [1]> <chr [1]>
#2 2.1.19 <chr [1]> <chr [1]> <chr [2]>
#3 3.1.19 <chr [1]> <chr [1]> <chr [2]>
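The paste alternative mentioned above is not shown, so here is a minimal sketch of it. It assumes the same sample data as in the Data section; `collapse_non_na` is a hypothetical helper name introduced only for this illustration, and the `", "` separator is an arbitrary choice.

```r
library(dplyr)

# Same sample data as in the Data section of this answer
df <- data.frame(
  Date = c("1.1.19", "2.1.19", "3.1.19", "1.1.19", "2.1.19", "3.1.19"),
  AAPL = c(NA, "2%", "3%", NA, NA, NA),
  MSFT = c(NA, NA, NA, NA, "4%", "5%"),
  NASDAQ = c(NA, "5%", "6%", NA, "5%", "6%"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the non-NA values of a group into one string,
# or return NA when the group contains nothing but NA
collapse_non_na <- function(x, sep = ", ") {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else paste(x, collapse = sep)
}

out <- df %>%
  group_by(Date) %>%
  summarise_at(vars(-group_cols()), collapse_non_na)

# Duplicates are kept, so NASDAQ for 2.1.19 becomes "5%, 5%"
out
```

Unlike the list column, this keeps every column as plain character, at the cost of having to split the strings again if the individual values are needed later.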
Also, if some elements are duplicated, take the unique values, assuming each group has at most one distinct non-NA value per column:
df %>%
  group_by(Date) %>%
  summarise_at(vars(-group_cols()), ~ if(all(is.na(.))) .[n() + 1] else unique(.[!is.na(.)]))
# A tibble: 3 x 4
# Date AAPL MSFT NASDAQ
# <chr> <chr> <chr> <chr>
#1 1.1.19 <NA> <NA> <NA>
#2 2.1.19 2% 4% 5%
#3 3.1.19 3% 5% 6%
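The unique approach relies on each group having at most one distinct non-NA value per column. One way to check that assumption before collapsing might look like the sketch below; it assumes the same sample data as in the Data section, and `counts` is just an illustrative name.

```r
library(dplyr)

# Same sample data as in the Data section of this answer
df <- data.frame(
  Date = c("1.1.19", "2.1.19", "3.1.19", "1.1.19", "2.1.19", "3.1.19"),
  AAPL = c(NA, "2%", "3%", NA, NA, NA),
  MSFT = c(NA, NA, NA, NA, "4%", "5%"),
  NASDAQ = c(NA, "5%", "6%", NA, "5%", "6%"),
  stringsAsFactors = FALSE
)

# Count the distinct non-NA values per Date and column
counts <- df %>%
  group_by(Date) %>%
  summarise_at(vars(-group_cols()), ~ n_distinct(., na.rm = TRUE))

# TRUE when no group has more than one distinct non-NA value in any column
all(as.matrix(counts[, -1]) <= 1)
```

If the check comes back FALSE, the list-column or paste variants are the safer choice, since they do not need to reduce each group to a single value.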
Or apply distinct first and then do the grouped operation, keeping the first non-NA element of each column:
distinct(df) %>%
  group_by(Date) %>%
  summarise_at(vars(-group_cols()), ~ .[!is.na(.)][1])
# A tibble: 3 x 4
# Date AAPL MSFT NASDAQ
# <chr> <chr> <chr> <chr>
#1 1.1.19 <NA> <NA> <NA>
#2 2.1.19 2% 4% 5%
#3 3.1.19 3% 5% 6%
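In dplyr 1.0.0 and later the scoped verbs such as summarise_at are superseded by across. A rough equivalent of the distinct version in that style could look like this, assuming dplyr >= 1.0.0 is installed and the same sample data as in the Data section:

```r
library(dplyr)

# Same sample data as in the Data section of this answer
df <- data.frame(
  Date = c("1.1.19", "2.1.19", "3.1.19", "1.1.19", "2.1.19", "3.1.19"),
  AAPL = c(NA, "2%", "3%", NA, NA, NA),
  MSFT = c(NA, NA, NA, NA, "4%", "5%"),
  NASDAQ = c(NA, "5%", "6%", NA, "5%", "6%"),
  stringsAsFactors = FALSE
)

out <- df %>%
  distinct() %>%
  group_by(Date) %>%
  # everything() skips the grouping column inside across();
  # indexing an empty vector with [1] gives NA for all-NA groups
  summarise(across(everything(), ~ .x[!is.na(.x)][1]))

out
```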
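Or, in the devel version of dplyr, we can use condense (an experimental function at the time of writing, so it may not be available in released versions):
df %>%
  group_by(Date) %>%
  condense(data = across(everything(), ~ .[!is.na(.)]))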
# A tibble: 3 x 2
# Rowwise: Date
# Date data
# <chr> <list>
#1 1.1.19 <tibble [0 × 3]>
#2 2.1.19 <tibble [2 × 3]>
#3 3.1.19 <tibble [2 × 3]>
Data
df <- structure(list(Date = c("1.1.19", "2.1.19", "3.1.19", "1.1.19",
"2.1.19", "3.1.19"), AAPL = c(NA, "2%", "3%", NA, NA, NA), MSFT = c(NA,
NA, NA, NA, "4%", "5%"), NASDAQ = c(NA, "5%", "6%", NA, "5%",
"6%")), class = "data.frame", row.names = c(NA, -6L))