【Question title】: Matching data replacement in R
【Posted】: 2019-09-07 14:59:56
【Question】:

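I have two datasets with similar dimensions and similar column names. The goal is to check whether NA values exist in one of the datasets and, where they do, replace them with the corresponding values from the other dataset, as in the example below.

I tried to solve this with nested for loops, but it did not work.

df is a new data frame created with NAs.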

loop =  for (a in 1:nrow(data1)) {
       for (b in 1:ncol(data1)) {
       for (c in 1:nrow(data2)) {
       for (d in 1:ncol(data2)) {
       for (x in 1:nrow(df))    {
       for (y in 1:ncol(df))    {
       df[x,y]<- ifelse(data1[a,b] != "NA", data1[a,b], data2[c,d])
       return(df)
}
}    
}   
}  
} 
}

Example

# The first data frame 
structure(list(age = c(23, 22, 21, 20), gender = c("M", "F", 
NA, "F")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))
#     age gender
# 1    23 M     
# 2    22 F     
# 3    21 NA    
# 4    20 F     
# The second data frame 
structure(list(age = c(23, 22, 21, 20), gender = c("M", "F", 
"M", "F")), row.names = c(NA, -4L), class = c("tbl_df", "tbl", 
"data.frame"))
#     age gender
# 1    23 M     
# 2    22 F     
# 3    21 M     
# 4    20 F

Expected output

Age   Gender
23    M
22    F
21    M
20    F

【Comments】:

Tags: r merge match

【Solution 1】:

You can try this:

library(tibble)

df1 <- tibble(age = c(23,22,21,20), 
             gender = c("M", "F", NA, "F"))

# -------------------------------------------------------------------------
#> df1
# # A tibble: 4 x 2
#     age gender
#     <dbl> <chr> 
# 1    23 M     
# 2    22 F     
# 3    21 NA    
# 4    20 F     

# -------------------------------------------------------------------------

df2 <- tibble(age = c(23,22,21,20), 
             gender = c("M", "F", "M", "F"))

# -------------------------------------------------------------------------
#> df2
# # A tibble: 4 x 2
#     age gender
#     <dbl> <chr> 
# 1    23 M     
# 2    22 F     
# 3    21 M     
# 4    20 F     
# -------------------------------------------------------------------------
# find the positions of the NAs in the gender column of df1
df1.na <- is.na(df1$gender)
#> df1.na
# [1] FALSE FALSE  TRUE FALSE
# -------------------------------------------------------------------------


# use the values in df2 to replace the NAs in df1 (position-based: both data frames must have the same row order)
df1$gender[df1.na] <- df2$gender[df1.na]
df1

# -------------------------------------------------------------------------
#> df1
# # A tibble: 4 x 2
#     age gender
#     <dbl> <chr> 
# 1    23 M     
# 2    22 F     
# 3    21 M     
# 4    20 F     
# -------------------------------------------------------------------------

【Comments】:

Thanks for the answer. But what if I have 100 columns?
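
For many columns, the same position-based replacement can be repeated over every column. The following is only a minimal base R sketch, assuming both data frames have identical column names and the same row order; `fill_na_from` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper (not from any package): fill each NA in `target`
# with the value at the same row and column of `source`.
fill_na_from <- function(target, source) {
  # Assumption: same shape, same column names, same row order.
  stopifnot(identical(dim(target), dim(source)),
            identical(names(target), names(source)))
  for (col in names(target)) {
    missing <- is.na(target[[col]])
    target[[col]][missing] <- source[[col]][missing]
  }
  target
}

df1 <- data.frame(age = c(23, 22, 21, 20),
                  gender = c("M", "F", NA, "F"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(age = c(23, 22, 21, 20),
                  gender = c("M", "F", "M", "F"),
                  stringsAsFactors = FALSE)

result <- fill_na_from(df1, df2)
print(result)
```

The loop runs once per column, so 100 columns need no extra code. Values that are NA in both data frames stay NA.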

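【Solution 2】:

This can be done with the natural_join function from the rqdatatable library. The function does need an index to merge on, so we need to create one.

Creating a reproducible example will help others help you. Here I have created two simple data frames that should cover most cases of your problem.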

# Create example data
tbl1 <- 
  data.frame(
    w = c(1, 2, 3, 4),
    x = c(1, 2, 3, NA),
    y = c(1, 2, 3, 4),
    z = c(1, NA, NA, NA)
  )

tbl2 <-
  data.frame(
    w = c(9, 9, 9, 9), # check value doesn't overwrite value
    x = c(1, 2, 3, 4), # check NA gets filled in
    y = c(1, 2, 3, NA), # check NA doesn't overwrite value
    z = c(9, NA, NA, NA) # check NA in both stays NA
  )

# Create join index 
tbl1$indx <- seq_len(nrow(tbl1))
tbl2$indx <- seq_len(nrow(tbl2))

# Use natural_join 
library("rqdatatable")
natural_join(tbl1, tbl2, by = "indx")
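
To show why a join index matters, here is a rough base R sketch of key-based filling that does not rely on row order. It is not how natural_join is implemented; it only uses `match()` on the `indx` column to line up the rows, and it shuffles the second table on purpose (an assumption added for illustration).

```r
# Same example data as above, with the join index included up front.
tbl1 <- data.frame(indx = 1:4,
                   w = c(1, 2, 3, 4),
                   x = c(1, 2, 3, NA),
                   y = c(1, 2, 3, 4),
                   z = c(1, NA, NA, NA))
tbl2 <- data.frame(indx = 1:4,
                   w = c(9, 9, 9, 9),
                   x = c(1, 2, 3, 4),
                   y = c(1, 2, 3, NA),
                   z = c(9, NA, NA, NA))

# Shuffle tbl2 (illustration only) so that row positions no longer line up.
tbl2 <- tbl2[c(3, 1, 4, 2), ]

# For each row of tbl1, find the row of tbl2 that has the same key.
pos <- match(tbl1$indx, tbl2$indx)

filled <- tbl1
for (col in setdiff(names(tbl1), "indx")) {
  missing <- is.na(filled[[col]])
  # Take replacement values from the matching tbl2 rows only.
  filled[[col]][missing] <- tbl2[[col]][pos[missing]]
}
print(filled)
```

Existing values in tbl1 are never overwritten, the NA in x is filled from tbl2, and the NAs in z that are missing in both tables remain NA.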

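【Comments】:

Thanks for the edit. However, this code does not work for my data. It also replaced several columns with just a single value. Can you explain why that happens?
Sorry, your explanation of the problem is not clear enough for me to help. If you provide sample data for which it does not work, I can take another look.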