Matching values of two data frames based on multiple conditions in R: answers

【Question title】: Matching values of two data frames based on multiple conditions in R
【Posted】: 2021-12-25 09:26:01
【Question】:

I have two data sets:

cycle <- c(160, 160, 150, 158, 180) 
split1 <- c(2, 2,4, 6, 8) 
split2 <- c(10,10, 12, 14, 16) 
df1 <- data.frame(cycle, split1, split2) 
df1
  cycle split1 split2
1   160      2     10
2   160      2     10
3   150      4     12
4   158      6     14
5   180      8     16

cycle <- c(160,150,190,180,161,150,140,179)
split1 <- c(2,4,12,8,2,4,32,8)
split2 <- c(10, 12, 18, 16, 10, 12, 21, 16)
df2 <- data.frame(cycle, split1, split2)
df2
  cycle split1 split2
1   160      2     10
2   150      4     12
3   190     12     18
4   180      8     16
5   161      2     10
6   150      4     12
7   140     32     21
8   179      8     16

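I want to match the values of df1 and df2 and label the rows of df2 according to two conditions:

1- If the values of all three columns (i.e. cycle, split1 and split2) are exactly the same, label the row "Same"; otherwise "Different".

2- If only the cycle value differs between df1 and df2, by +1 or -1, and the remaining values in the row are identical, label the row "Same"; otherwise "Different".

The output should look like this: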

  cycle split1 split2      Type
1   160      2     10      Same
2   150      4     12      Same
3   190     12     18 Different
4   180      8     16      Same
5   161      2     10      Same
6   150      4     12      Same
7   140     32     21 Different
8   179      8     16      Same

I managed to implement the first condition as follows:

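library(dplyr)

# Build one key per row; the "_" separator keeps e.g. (16, 2, 10) and (1, 62, 10) distinct
df1 <- df1 %>% mutate(key = paste(cycle, split1, split2, sep = "_"))
df2 <- df2 %>% mutate(key = paste(cycle, split1, split2, sep = "_"))
# A df2 row is 'same' only if its exact key also occurs in df1
df2 %>% mutate(Type = ifelse(key %in% df1$key, 'same', 'different')) %>%
  select(-key)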

  cycle split1 split2      Type
1   160      2     10      same
2   150      4     12      same
3   190     12     18 different
4   180      8     16      same
5   161      2     10 different
6   150      4     12      same
7   140     32     21 different
8   179      8     16 different

But I am having trouble implementing the second one.

Any idea how to do this efficiently?

Thank you in advance.

【Comments】:

Tags: r dataframe if-statement dplyr tidyverse

【Solution 1】:

Starting from your original df1 and df2 (without the generated key column), you could use

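library(dplyr)

df2 %>% 
  mutate(rn = row_number()) %>%   # remember the original df2 row
  # pair each df2 row with the df1 rows that share split1 and split2
  left_join(df1, by = c("split1", "split2"), suffix = c("", ".y")) %>% 
  mutate(
    # cycles within 1 of each other count as "same"; no df1 match (NA) is "different"
    type = coalesce(
      ifelse(abs(cycle - cycle.y) <= 1, "same", "different"), 
      "different")
    ) %>% 
  group_by(rn) %>% 
  distinct() %>%                  # drop rows repeated by identical df1 duplicates
  ungroup() %>% 
  select(-rn, -cycle.y)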

# A tibble: 8 x 4
  cycle split1 split2 type     
  <dbl>  <dbl>  <dbl> <chr>    
1   160      2     10 same     
2   150      4     12 same     
3   190     12     18 different
4   180      8     16 same     
5   161      2     10 same     
6   150      4     12 same     
7   140     32     21 different
8   179      8     16 same
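
As a cross-check on the join-based code above, the same two rules can also be sketched in base R by extending the key idea from the question. This is not the answerer's method: it assumes the cycle values are whole numbers, so that "differs by +1 or -1" means exactly one of three possible cycle values, and the helper `make_key` is a hypothetical name introduced only for this illustration.

```r
df1 <- data.frame(cycle  = c(160, 160, 150, 158, 180),
                  split1 = c(2, 2, 4, 6, 8),
                  split2 = c(10, 10, 12, 14, 16))
df2 <- data.frame(cycle  = c(160, 150, 190, 180, 161, 150, 140, 179),
                  split1 = c(2, 4, 12, 8, 2, 4, 32, 8),
                  split2 = c(10, 12, 18, 16, 10, 12, 21, 16))

# Hypothetical helper: one key string per row, with a separator between fields
make_key <- function(cycle, split1, split2) {
  paste(cycle, split1, split2, sep = "_")
}

# Assumption: cycles are whole numbers, so the offsets -1, 0, +1 cover both rules.
# Each df1 row contributes three keys: cycle - 1, cycle, cycle + 1.
df1_keys <- unlist(lapply(c(-1, 0, 1), function(d) {
  make_key(df1$cycle + d, df1$split1, df1$split2)
}))

# A df2 row is "Same" if its key is among the expanded df1 keys
df2$Type <- ifelse(make_key(df2$cycle, df2$split1, df2$split2) %in% df1_keys,
                   "Same", "Different")
df2
```

Because this only tests membership with `%in%`, duplicated rows in df1 cannot add rows to df2.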

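【Comments】:

There is one problem. In the real data, df1 has duplicate values, which produce duplicate rows in df2 that I don't want. I want df2 to stay the same size.
Can you give an example of this problem and explain how it should be handled?
For example, if df1 is: cycle
Not sure if this is what you are looking for: try replacing df1 in the code above with df1 %>% distinct().
It gives the error "Problem adding computed columns in `distinct()`. x Problem with `mutate()` input `..1`."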
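
The thread does not show the data that triggered this error, so the following is only a sketch of one way to keep df2 at its original number of rows whatever duplicates df1 contains. Instead of relying on `distinct()` after the join, it collapses every original df2 row to a single label with `summarise()` and `any()`. The name `result` is introduced here for illustration; the data are the example frames from the question.

```r
library(dplyr)

df1 <- data.frame(cycle  = c(160, 160, 150, 158, 180),
                  split1 = c(2, 2, 4, 6, 8),
                  split2 = c(10, 10, 12, 14, 16))
df2 <- data.frame(cycle  = c(160, 150, 190, 180, 161, 150, 140, 179),
                  split1 = c(2, 4, 12, 8, 2, 4, 32, 8),
                  split2 = c(10, 12, 18, 16, 10, 12, 21, 16))

result <- df2 %>%
  mutate(rn = row_number()) %>%              # id of the original df2 row
  left_join(distinct(df1), by = c("split1", "split2"), suffix = c("", ".y")) %>%
  group_by(rn, cycle, split1, split2) %>%    # one group per original df2 row
  summarise(
    # "same" if at least one matching df1 row has a cycle within 1;
    # rows without any df1 match have cycle.y = NA and become "different"
    type = if (any(abs(cycle - cycle.y) <= 1, na.rm = TRUE)) "same" else "different",
    .groups = "drop"
  ) %>%
  select(-rn)

result
```

Since each group yields exactly one summary row, `result` has as many rows as df2 even when several df1 rows share the same split1 and split2 but have different cycle values. With real data, dplyr 1.1.0 and later may warn about a many-to-many relationship in the join; that warning does not change the result here.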