Logically compare columns in a data frame that contain NA? (Answers)

[Question title]: Logically compare columns in a data frame that contain NA?
[Posted]: 2021-09-13 14:53:37
[Question description]:

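Being relatively new to R, I discovered today that 1 == NA returns NA rather than the FALSE I expected.

I looked to this question for help, which is good but is geared toward vectors, whereas I am working with data frames.

Here is a simplified example of what I am working with: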

library(tidyverse)

df_example <- expand_grid(shpmt = c(1:3), stoptype = c("P", "D"))
df_example$metgoal.ref <- c(1,NA,0,0,1,NA)
df_example$metgoal.tri <- c(1,NA,0,1,NA,1)

> df_example

# A tibble: 6 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 P                  1           1
2     1 D                 NA          NA
3     2 P                  0           0
4     2 D                  0           1
5     3 P                  1          NA
6     3 D                 NA           1

My goal is to see every instance where .ref and .tri are not the same, including NAs. I first tried a simple (I thought) inequality:

> filter(df_example, metgoal.ref != metgoal.tri)  #Returns only inequalities without NAs.
# A tibble: 1 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     2 D                  0           1
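
The result above follows from how R compares missing values: any comparison involving NA yields NA, and filter() keeps only rows where the condition is TRUE. A minimal base R sketch of the underlying logical vector, using two plain vectors (ref and tri are illustrative names, not from the original post) that mirror the example columns:

```r
# Illustrative vectors mirroring metgoal.ref and metgoal.tri from the example
ref <- c(1, NA, 0, 0, 1, NA)
tri <- c(1, NA, 0, 1, NA, 1)

# Element-wise inequality: any pair involving NA yields NA, not TRUE or FALSE
neq <- ref != tri
print(neq)

# Positions where the test is definitely TRUE (which() skips NA)
print(which(neq))
```

Only the fourth position is TRUE, which matches the single row returned by the filter() call.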

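Initially I did not realize I was missing the NAs, but I now know I can find them with is.na(). This matters because the construct below catches an NA in either of the two columns (which is why that question was not much help to me, since it deals with vectors). The slight downside is that it also returns the instances where both columns are NA (I mainly care about whether they differ):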

> filter(df_example, is.na(metgoal.ref != metgoal.tri))

# A tibble: 3 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 D                 NA          NA   #Not ideal -- I want only columns that disagree.
2     3 P                  1          NA
3     3 D                 NA           1

If I put the two constructs together, I get what I want, except for the double-NA row in row 1:

> filter(df_example, is.na(metgoal.ref != metgoal.tri) | (metgoal.ref != metgoal.tri))

# A tibble: 4 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 D                 NA          NA   #Still not ideal
2     2 D                  0           1
3     3 P                  1          NA
4     3 D                 NA           1

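But that is a lot to type and maintain for what I think of as just an inequality. I have many more of these to do for other sets of columns, and I have other conditions to add: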

> filter(df_example, (is.na(metgoal.ref != metgoal.tri) | (metgoal.ref != metgoal.tri)) & stoptype == "D")  

#Added another condition, increasing complexity.

# A tibble: 3 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 D                 NA          NA
2     2 D                  0           1
3     3 D                 NA           1

I thought the identical() function might help, but if it can, I am using it wrong and need help:

> filter(df_example, !identical(df_example$metgoal.ref, df_example$metgoal.tri))  

#This does not work at all -- probably using it wrong.

# A tibble: 6 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     1 P                  1           1
2     1 D                 NA          NA
3     2 P                  0           0
4     2 D                  0           1
5     3 P                  1          NA
6     3 D                 NA           1

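The other strategy I have seen for this situation is to replace the NAs with something that can be tested for inequality the way I originally tried: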

df_example2 <- df_example %>% 
  replace_na(list(metgoal.ref = 9, metgoal.tri = 9))  #Arbitrarily choosing 9 as replacement value

filter(df_example2, metgoal.ref != metgoal.tri)

# A tibble: 3 x 4
  shpmt stoptype metgoal.ref metgoal.tri
  <int> <chr>          <dbl>       <dbl>
1     2 D                  0           1
2     3 P                  1           9
3     3 D                  9           1

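I suspect this last solution is the best I can get, but I want to do aggregation and summary statistics on the metgoal.* columns, and for those it is probably appropriate that they retain the NAs. I can go back to df_example as it was before the conversion, but I keep thinking there is a better solution, and finding it would improve my learning.

Thanks in advance for any advice you can offer.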

[Question discussion]:

Tags: r dataframe na logical-operators

[Solution 1]:

This seems to work. Please let me know what you think!

mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri)

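You can then use this to filter your original data. I added ! because you are interested in the rows where these fields are not identical.

In the tidyverse it might look like this: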

filter(df_example, !mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri))

Or in base R:

df_example[!mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri),]
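
To illustrate why the element-wise call behaves differently from the whole-column identical() attempt in the question, here is a small base R sketch. The vectors ref and tri are illustrative stand-ins for the two columns, not names from the original post.

```r
# Illustrative vectors mirroring metgoal.ref and metgoal.tri from the question
ref <- c(1, NA, 0, 0, 1, NA)
tri <- c(1, NA, 0, 1, NA, 1)

# identical() on whole vectors returns a single TRUE/FALSE for the pair,
# so negating it gives one value that applies to every row at once
whole <- identical(ref, tri)
print(whole)

# mapply() calls identical() once per pair of elements;
# NA compared with NA counts as identical, NA compared with a value does not
same <- mapply(identical, ref, tri)

# Positions where the two vectors differ, including NA versus a value
differs <- !same
print(which(differs))
```

One caveat worth keeping in mind: identical() is also strict about type, so comparing an integer column with a double column this way would report every row as different.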

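[Discussion]:

I did try that earlier today, and it works well inside filter(), even when combined with additional conditions. I tend to wait a few days to see whether any other answers turn up before accepting one, but could you edit your answer to show it with the filter construct? filter(df_example, !mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri)) gives exactly the result I was after.
For anyone else who comes this way, this is how I handled the additional criteria: filter(df_example, !mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri) & stoptype == "D").
I have updated my answer per your suggestion. Thanks.
Thank you for your answer. I really appreciate the help.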
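
Since the question mentions repeating this comparison for other sets of columns, one possible way to cut down on typing is to wrap the answer's expression in a small helper. This is only a sketch: not_identical is a hypothetical name that does not appear in the thread, and the data frame is rebuilt here with base R so the example runs without the tidyverse.

```r
# Rebuild the example data with base R only (same row order as expand_grid)
df_example <- data.frame(
  shpmt = rep(1:3, each = 2),
  stoptype = rep(c("P", "D"), times = 3),
  metgoal.ref = c(1, NA, 0, 0, 1, NA),
  metgoal.tri = c(1, NA, 0, 1, NA, 1)
)

# Hypothetical helper: TRUE where x and y differ, treating NA vs NA as equal
not_identical <- function(x, y) !mapply(identical, x, y)

# Same condition as in the comment above, written with base subsetting
keep <- not_identical(df_example$metgoal.ref, df_example$metgoal.tri) &
  df_example$stoptype == "D"
result <- df_example[keep, ]
print(result)
```

With the example data this keeps the two "D" rows whose values differ (shipments 2 and 3) and leaves out the row where both columns are NA.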