【Question title】: if-statement in R: "missing value" error despite existing value
【Posted】: 2019-04-26 16:18:58
【Question】:

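An if statement returns a "missing value" error even though a perfectly valid value is present.

I want to write a simple script that deletes rows from a dataset if one of their entries contains a certain tag. I assign an indicator variable in a new column (containsMR), then use a for loop to iterate over the rows. If the indicator is TRUE, the row should be deleted.

So far, the indicator is assigned correctly, which is fine. The interesting part: inside the loop's if statement, reading the value seems to fail, because it returns "Error in if (data$containsMR[i]) { : missing value where TRUE/FALSE needed".

This surprises me, given that the indicator variable is assigned correctly (and completely). Stranger still: some, but not all, rows with a positive indicator are deleted (checked via printed output and table(data$containsMR)).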

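Now the really strange thing: if I run the same loop again, it deletes the remaining rows (as it should) but returns the same error. So in theory I could just run the loop twice, ignore the error, and get the result I want. That is really not the point of what I am doing.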

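Attempted fixes:
  • changing the for loop to a while loop
  • changing the indicator (and the if statement) to integers (0, 1)
  • running the script in both RStudio and the R console
  • changing variable names and including/excluding definitions (e.g. adding the proxy variable row_number instead of calling it inline)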

# Script to delete all rows containing "MR" in column "EXAM_CODE"

# import file
data <- read.csv("C:\\ScriptingTest\\ablations 0114.csv")

# add indicator column
for (i in 1:nrow(data)){
    data$containsMR[i] <- ifelse(grepl("MR", toString(data$EXAM_CODE[i])), TRUE, FALSE)
}

# remove rows with positive indicator
row_number <- nrow(data)
for (i in 1:row_number){
    if (data$containsMR[i]){
        data <- data[-c(i),]
    }
}

# export csv
write.csv(data, "C:\\ScriptingTest\\export.csv")

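【Comments】:

  • Think about it: you change the number of rows in data during the loop, but the length of the loop is fixed. What happens when data has 100 rows but your for loop still has to reach 105?
  • Also, you can do this in one line, e.g. data[data$containsMR > 0,].
  • ... or rather data[!data$containsMR,], I guess, since it is actually a boolean.
  • Deletion and indexing are not the problem (as far as I can tell); deleting row 3 leaves rows 1;2;4;[...]
  • It is definitely a problem. You end up indexing a row that no longer exists, because you have run past the end of the data frame.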
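
The two effects the commenters describe can be reproduced on a plain vector. The following is a minimal sketch, assuming a made-up logical vector `flag` in place of data$containsMR (the name and values are illustrative, not taken from the question's data). It shows that reading past the end yields NA rather than an error, that if () then fails on that NA, and that deleting element i moves the next element into position i, which a forward loop never revisits. The shift is one plausible reason why some flagged rows survive the first pass.

```r
# Hypothetical indicator vector standing in for data$containsMR
flag <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

# Indexing past the end of a vector does not fail; it returns NA
beyond_end <- flag[length(flag) + 1]
print(beyond_end)

# if () cannot branch on NA, which is what raises the error;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  {
    if (beyond_end) "deleted" else "kept"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Deleting element 2 shifts the former third element into position 2,
# so a forward loop that moves on to index 3 never inspects it
shrunk <- flag[-2]
print(shrunk[2])
```

The exact wording of the captured message depends on the R locale, which is why the asker's console may show a translated version.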

Tags: r if-statement


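【Solution 1】:

To illustrate that the problem is modifying the size of the object inside the for loop that iterates over it, see the following example: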

n <- nrow(mtcars)

for (i in 1:n){
  cat("\n mtcars currently has",nrow(mtcars),"rows;","accessing row",i)
  if (mtcars$cyl[i] == 4){
    mtcars <- mtcars[-i,]
  }
}

> mtcars currently has 32 rows; accessing row 1
 mtcars currently has 32 rows; accessing row 2
 mtcars currently has 32 rows; accessing row 3
 mtcars currently has 31 rows; accessing row 4
 mtcars currently has 31 rows; accessing row 5
 mtcars currently has 31 rows; accessing row 6
 mtcars currently has 31 rows; accessing row 7
 mtcars currently has 30 rows; accessing row 8
 mtcars currently has 30 rows; accessing row 9
 mtcars currently has 30 rows; accessing row 10
 mtcars currently has 30 rows; accessing row 11
 mtcars currently has 30 rows; accessing row 12
 mtcars currently has 30 rows; accessing row 13
 mtcars currently has 30 rows; accessing row 14
 mtcars currently has 30 rows; accessing row 15
 mtcars currently has 30 rows; accessing row 16
 mtcars currently has 29 rows; accessing row 17
 mtcars currently has 28 rows; accessing row 18
 mtcars currently has 28 rows; accessing row 19
 mtcars currently has 28 rows; accessing row 20
 mtcars currently has 28 rows; accessing row 21
 mtcars currently has 28 rows; accessing row 22
 mtcars currently has 27 rows; accessing row 23
 mtcars currently has 26 rows; accessing row 24
 mtcars currently has 26 rows; accessing row 25
 mtcars currently has 26 rows; accessing row 26
 mtcars currently has 25 rows; accessing row 27
Error in if (mtcars$cyl[i] == 4) { : 
  missing value where TRUE/FALSE needed
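
If a loop is still wanted, one common workaround (not part of the original answer) is to visit the rows from last to first, so that each deletion only shifts rows that have already been checked. A minimal sketch, assuming a working copy named `kept` (a hypothetical name) of the built-in mtcars dataset:

```r
# Work on a copy so the built-in mtcars dataset stays untouched
kept <- mtcars

# Visit rows from last to first: removing row i only shifts rows that
# have already been checked, so i never runs past the end
for (i in rev(seq_len(nrow(kept)))) {
  if (kept$cyl[i] == 4) {
    kept <- kept[-i, ]
  }
}

print(nrow(kept))          # rows remaining after the loop
print(any(kept$cyl == 4))  # whether any four-cylinder rows survived
```

Using seq_len() instead of 1:n also keeps the loop from running at all when the data frame has zero rows.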

【Comments】:

  • You're right. I guess that explains the problems I ran into earlier.


【Solution 2】:

You can simplify this to

newdata <-  data[!grepl("MR", data$EXAM_CODE),]
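
To see what that one-liner does without the original CSV, here is a small sketch on a made-up data frame. The ID column and the EXAM_CODE values are hypothetical stand-ins, and `has_mr` is an illustrative intermediate name.

```r
# Made-up stand-in for the CSV; the EXAM_CODE values are hypothetical
data <- data.frame(
  ID = 1:5,
  EXAM_CODE = c("CT01", "MR02", "XR03", NA, "MR04"),
  stringsAsFactors = FALSE
)

# grepl() is vectorized: it returns one logical per row, and an NA
# input yields FALSE rather than NA
has_mr <- grepl("MR", data$EXAM_CODE)

# Keep only the rows whose code does not contain "MR"
newdata <- data[!has_mr, ]
print(newdata)
```

Because the whole filter is computed before any row is removed, no index can go stale, which is why this form avoids the error entirely. If "MR" should be matched as a literal string rather than a regular expression, grepl() also accepts fixed = TRUE.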

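【Comments】:

  • This works well and is much more elegant than my code. Thanks! So far I have only used R for statistical analysis, but I guess I should catch up on the fundamentals.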