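Posted: 2019-08-30 09:38:05
Question:
I want to update values in df1 with the matching values from df2, but only where df1 holds NA or zero. I can do this for a single column with data.table or dplyr, but I can't automate it across all columns.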
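The "NA or zero" rule can be written as one vectorised predicate that works on both numeric and character columns. This is a minimal sketch; the helper name `needs_update` is hypothetical and does not appear in the question:

```r
# Hypothetical helper: TRUE where a cell should be overwritten (NA or zero).
needs_update <- function(x) is.na(x) | x == 0

# For NA cells, `x == 0` is NA, but TRUE | NA is TRUE, so NAs are still flagged.
# For character columns, 0 is coerced to "0" before the comparison.
needs_update(c(0, 2, NA))       # TRUE FALSE TRUE
needs_update(c("a", NA, "0"))   # FALSE TRUE TRUE
```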
#data.table
df1 <- data.frame(x1=1:4, x2=c('a','b', NA, 'd'), x3=c(0,0,2,2), stringsAsFactors=FALSE)
df2 <- data.frame(x1=2:3, x2=c("zz", "qq"),x3=6:7, stringsAsFactors=FALSE)
require(data.table)
setDT(df1); setDT(df2)
df1[df2, on = .(x1), x2 := ifelse(is.na(x2) | x2 == 0 ,i.x2,x2)]
#dplyr
require(dplyr)
inner_join(df1, df2, by = c("x1" = "x1")) %>%
  transmute(x1 = x1,
            x2 = ifelse(is.na(x2.x) | x2.x == 0, x2.y, x2.x),
            x3 = ifelse(is.na(x3.x) | x3.x == 0, x3.y, x3.x))
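With dplyr I can at least get the expected output by adding each column by hand, but the real data frame has far too many columns for that. So I want to loop over the columns instead.
What I have tried: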
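One way to loop over the columns without any packages is to look up the matching df2 row once and then patch every shared non-key column by name. This is a sketch under assumptions: the helper `fill_from` is hypothetical, and it mimics the update-join behaviour of the data.table line above, where df1 rows with no match in df2 (such as x1 == 1) keep their original values:

```r
df1 <- data.frame(x1 = 1:4, x2 = c("a", "b", NA, "d"), x3 = c(0, 0, 2, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x1 = 2:3, x2 = c("zz", "qq"), x3 = 6:7,
                  stringsAsFactors = FALSE)

# Hypothetical helper: fill NA/0 cells of `target` from `source`, matching rows
# on `key`. Rows of `target` without a match in `source` are left untouched.
fill_from <- function(target, source, key) {
  idx <- match(target[[key]], source[[key]])  # source row for each target row (NA if none)
  hit <- !is.na(idx)
  # Only columns present in both tables, excluding the join key.
  for (col in setdiff(intersect(names(target), names(source)), key)) {
    old <- target[[col]]
    new <- source[[col]][idx]
    replace_it <- hit & (is.na(old) | old == 0)
    old[replace_it] <- new[replace_it]
    target[[col]] <- old
  }
  target
}

res <- fill_from(df1, df2, "x1")
res
```

Because the column names come from `intersect()`, the same call works unchanged however many columns the real tables have.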
# dplyr + apply
inner_join(df1,df2,by = c("x1" = "x1")) %>%
cbind(.$x1,
apply(.[-1],2, function(cname) ifelse(is.na(cname) | cname == 'b',paste(cname, ".x", collapse = ""),paste(cname, ".y", collapse = "")))
)
# data.table with for
for (cname in names(df1)[!names(df1) %in% c("x1")]) {
df1[i = df2, on = .(x1), j = cname := {function (x) ifelse(is.na(x) | x == 'b',i.x,x)} (cname)
, with = FALSE]
}
# data.table + lapply
df1[i = df2, on = .(x1) ,names(df1)[!names(df1) %in% c("x1")] := lapply(df1[,names(df1)[!names(df1) %in% c("x1")],with=FALSE],
function(x) ifelse(is.na(x) | x == 0,df2.x,df1.x))]
Comments:
-
It would help if you shared the expected output. If no update is possible, should row 1 stay 0?
-
@sindri_baldur I think the expected output is
inner_join(df1,df2,by = c("x1" = "x1")) %>% transmute(x1 = x1, x2 =ifelse(is.na(x2.x) | x2.x == 0,x2.y,x2.x), x3 =ifelse(is.na(x3.x) | x3.x == 0,x3.y,x3.x)), but without writing it out by hand for every column.
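That inner-join output can be produced generically with base R `merge()`, whose default `suffixes = c(".x", ".y")` matches the names used in the dplyr version. A minimal sketch, assuming every non-key column shared by df1 and df2 should be patched; the loop builds each `.x`/`.y` pair by name instead of listing them:

```r
df1 <- data.frame(x1 = 1:4, x2 = c("a", "b", NA, "d"), x3 = c(0, 0, 2, 2),
                  stringsAsFactors = FALSE)
df2 <- data.frame(x1 = 2:3, x2 = c("zz", "qq"), x3 = 6:7,
                  stringsAsFactors = FALSE)

# Inner join on x1 (merge's default); shared columns get ".x" (df1) and ".y" (df2).
joined <- merge(df1, df2, by = "x1", suffixes = c(".x", ".y"))

value_cols <- setdiff(intersect(names(df1), names(df2)), "x1")
out <- joined["x1"]
for (col in value_cols) {
  old <- joined[[paste0(col, ".x")]]
  new <- joined[[paste0(col, ".y")]]
  # Same rule as the hand-written transmute(): take df2's value only for NA/0.
  out[[col]] <- ifelse(is.na(old) | old == 0, new, old)
}
out
```

Unlike the update-join sketch, this keeps only the rows of df1 that have a match in df2, as `inner_join()` does.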
Tags: r dplyr data.table