【Posted】: 2017-11-08 23:32:31
【Problem description】:
df <- data.frame(id = c(1, 2, 3, 3, 3, 4), gender = c("Male", "Female", "Both", "Male", "Female", "Female"))
ids <- unique(df$id)
> df
  id gender
1  1   Male
2  2 Female
3  3   Both
4  3   Male
5  3 Female
6  4 Female
For each unique id, if the corresponding genders include all of Both, Male, and Female, I need to delete the row corresponding to Both. In other words, my desired output is:
> df
  id gender
1  1   Male
2  2 Female
3  3   Male
4  3 Female
5  4 Female
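I tried writing a loop that:
1. Subsets df by id and stores each subset in a list named sub
2. Within each sub, checks whether gender contains "Both", "Male", and "Female"; if so, deletes the rows where gender = "Both"
3. Recombines the data.frame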
However, the code below does not actually work, and it is very clumsy... Is there a simpler way to do this with group_by in dplyr?
sub <- list()
for(i in 1:length(ids)){
  sub[[i]] <- subset(df, id %in% ids[i])
  if(all(grepl(sub[[i]]$gender, c("Both", "Male", "Female")))){
    sub[[i]] <- sub[[i]][-which(sub[[i]]$gender == "Both"), ]
  }else sub[[i]] = sub[[i]]
}
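One likely reason the loop above has no effect: grepl() takes a single pattern as its first argument, so passing the whole gender vector does not test whether all three values are present, and the subsets are never recombined. A minimal base R sketch of the three steps, assuming gender is stored as character and using a hypothetical helper named drop_both (not part of the original code), might look like this:

```r
df <- data.frame(
  id = c(1, 2, 3, 3, 3, 4),
  gender = c("Male", "Female", "Both", "Male", "Female", "Female"),
  stringsAsFactors = FALSE  # assumption: keep gender as character
)

# Step 1: one data frame per id.
sub <- split(df, df$id)

# Step 2: hypothetical helper that drops "Both" rows only when the
# group contains all three values.
drop_both <- function(d) {
  if (all(c("Both", "Male", "Female") %in% d$gender)) {
    d <- d[d$gender != "Both", , drop = FALSE]
  }
  d
}

# Step 3: recombine the pieces and reset the row names.
result <- do.call(rbind, lapply(sub, drop_both))
rownames(result) <- NULL
```

Here %in% checks set membership directly, which avoids pattern matching altogether.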
【Discussion】:
- If an id has gender Both, does that id always also have Male and Female?
- No, not necessarily.
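Since an id with Both does not always also have Male and Female, a grouped filter has to check for those two values before dropping the Both row. The following is a sketch of one possible dplyr approach, assuming dplyr is installed and gender is stored as character. It is not necessarily the only or the best way.

```r
library(dplyr)

df <- data.frame(
  id = c(1, 2, 3, 3, 3, 4),
  gender = c("Male", "Female", "Both", "Male", "Female", "Female"),
  stringsAsFactors = FALSE  # assumption: keep gender as character
)

result <- df %>%
  group_by(id) %>%
  # Drop a row only if it is "Both" AND its id group also has
  # both "Male" and "Female"; every other row is kept.
  filter(!(gender == "Both" & all(c("Male", "Female") %in% gender))) %>%
  ungroup()
```

With this condition, an id whose only gender is Both keeps its row, because the all() check is FALSE for that group.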