Dichotomous variable based on conditionals in R: answers

[Question title]: Dichotomous variable based on conditionals in R
[Posted]: 2021-10-08 23:35:24
[Question description]:

I have a data frame similar to this:

date <- as.Date(c('2010-11-1','2010-11-2','2010-11-3','2010-11-4','2010-11-5','2010-11-6','2010-11-7','2010-11-8','2010-11-9','2010-11-10'))
precipitation <- c(0, 11, 12,3,0,0,0,7,9,10)
snowheight <- c(5,7,56,32, 11, 24, 70,8, 13, 11)
temperature <- c(-5, -2, 0, 0.4, -1, 5,6,4, 9, 10)
df <- data.frame(date, precipitation, snowheight, temperature)

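I am trying to create a dichotomous variable (0 and 1) for each data sample based on the following conditions:

If snowheight > 10, we proceed to the conditions below. Otherwise NA is assigned to the dichotomous variable.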
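If precipitation <= 0, we assign 0.
If precipitation > 0 and temperature > 0, we assign 1.
If precipitation > 0 and temperature <= 0, we assign 0.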

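I find ifelse easy to use with nested conditions, but since the conditions partially overlap, that won't work here. The next thing that came to mind was a for loop that checks each row. This is what I came up with: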

for (i in df){
  if (snowheight > 10 && rain > 0){
    if (temperature > 0){
      df$dicht <- 1
    } else {
      df$dicht <- 0
    }
  } else {
    df$dicht <- NA
  }
}

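When I run the code like this, I get a variable "dicht" filled entirely with NA. I think I can see what kind of mistake this is, but I don't know how to fix it. Written this way, the whole "dicht" variable seems to be assigned the value instead of the indexed row. So I tried this: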

for (i in df){
  if (snowheight > 10 && rain > 0){
    if (temperature > 0){
      df$dicht [i] <- 1
    } else {
      df$dicht [i] <- 0
    }
  } else {
    df$dicht [i] <- NA
  }
}

However, I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "dicht", value = c(NA, NA, NA, NA, : replacement has 14923 rows, data has 10

Any help is appreciated. Thanks in advance.

[Question discussion]:

Tags: r for-loop if-statement conditional-statements

[Solution 1]:

We can use vectorized operations:

df <- transform(df, dicht = +(snowheight > 10 & 
        precipitation > 0 & temperature > 0))
df$dicht[df$snowheight <=10] <- NA
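For anyone wondering what the `+( ... )` does: the unary plus coerces the logical vector to 1/0. Here is a minimal sketch that checks the idea against the sample data from the question. It leaves out `transform()` and the `date` column, and applies the snowheight rule only in the second step, so it is an equivalent rewrite rather than the exact code above.

```r
# Sample data from the question (date column omitted)
precipitation <- c(0, 11, 12, 3, 0, 0, 0, 7, 9, 10)
snowheight    <- c(5, 7, 56, 32, 11, 24, 70, 8, 13, 11)
temperature   <- c(-5, -2, 0, 0.4, -1, 5, 6, 4, 9, 10)
df <- data.frame(precipitation, snowheight, temperature)

# TRUE where precipitation and temperature are both above 0;
# unary + turns TRUE/FALSE into 1/0
df$dicht <- +(df$precipitation > 0 & df$temperature > 0)

# Rows with snowheight of 10 or less are not classified at all
df$dicht[df$snowheight <= 10] <- NA

print(df$dicht)  # values: NA NA 0 1 0 0 0 NA 1 1
```

Rows 5 to 7 have no precipitation and get 0 here, which matches the rule list in the question.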

The loop in the OP's code should run over the row indices rather than over "df":

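df$dicht <- NA                      # start with NA in every row
for (i in seq_len(nrow(df))) {      # iterate over row indices, not over df
  if (df$snowheight[i] > 10 && df$precipitation[i] > 0) {
    if (df$temperature[i] > 0) {
      df$dicht[i] <- 1
    } else {
      df$dicht[i] <- 0
    }
  } else {
    # as in the OP's loop, rows without precipitation also end up NA here
    df$dicht[i] <- NA
  }
}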
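A small sketch of why `for (i in df)` goes wrong, assuming the failure comes from the `date` column being used as an index (the 14923 in the error message matches the day number of 2010-11-10, but that reading is an inference, not something stated in the thread). The toy data frame below is a cut-down version of the one in the question.

```r
# Cut-down version of the question's data: 4 rows, 3 columns
date <- as.Date(c("2010-11-01", "2010-11-02", "2010-11-03", "2010-11-10"))
df <- data.frame(date, snowheight = c(5, 7, 56, 11), temperature = c(-5, -2, 0, 10))

# for (i in df) walks over the columns, so the body runs ncol(df) times
n_iter <- 0
for (i in df) {
  n_iter <- n_iter + 1
}
n_iter                 # 3, one pass per column

# Row indices are what the loop actually needs
seq_len(nrow(df))      # 1 2 3 4

# A Date is stored as days since 1970-01-01, so as an index it is huge
idx <- as.numeric(df$date)
max(idx)               # 14923 for 2010-11-10

# Assigning at those positions grows a vector far beyond nrow(df)
x <- NULL
x[idx] <- NA
length(x)              # 14923
```

Separately, `snowheight > 10 && rain > 0` without `[i]` hands whole vectors to `&&`, which newer R versions reject with an error, so indexing each column with `[i]` matters as well.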

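[Discussion]:

Thanks. Both solutions work: yours and the upgrade of mine. I need to read more about the "transform" function to fully understand how it works, because I'm not sure yet. That's base R, right? Thanks
@ZorinIvanov Both are base R
Thanks. If some of the condition variables occasionally contain missing values (NA), how should I adjust the code?
@ZorinIvanov In that case, you can add a condition such as !is.na(df$snowheight[i]) && so that an NA element yields FALSE rather than NA
Actually, if some condition variables contain NA, I would prefer to get NA in the dichotomous variable so the row is excluded from the statistics, just as when the snow height is below 10.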
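One way to get that behaviour is sketched below. It is a strict reading of the last comment (any NA in the three condition columns gives NA), built on `complete.cases()` from base R. The data is hypothetical: a few rows in the spirit of the question with NAs put in by hand.

```r
# Hypothetical data with missing values added for illustration
df <- data.frame(
  precipitation = c(0, 11, NA, 3, 0, 12),
  snowheight    = c(5, NA, 56, 32, 11, 56),
  temperature   = c(-5, -2, 0, 0.4, NA, 0)
)

cols <- c("precipitation", "snowheight", "temperature")

# TRUE only for rows where none of the three condition columns is NA
complete <- complete.cases(df[cols])

# Rows to classify: complete and enough snow.
# FALSE & NA is FALSE, so this vector never contains NA itself.
classify <- complete & df$snowheight > 10

# Everything starts as NA; only classifiable rows get 1 or 0
df$dicht <- NA_integer_
df$dicht[classify] <- as.integer(
  df$precipitation[classify] > 0 & df$temperature[classify] > 0
)

print(df$dicht)  # values: NA NA NA 1 NA 0
```

Note that row 5 (no precipitation, temperature missing) comes out as NA under this strict rule, even though the precipitation alone would be enough to assign 0. If that case should give 0, the temperature check would need to be restricted to rows with precipitation > 0.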

[Solution 2]:

This looks like a good fit for dplyr::case_when():

library(dplyr)

df %>% mutate(dicht = case_when(snowheight <=10 ~ NA_real_,
                                precipitation <= 0 ~ 0,
                                precipitation > 0 & temperature > 0 ~ 1,
                                precipitation > 0 & temperature <= 0 ~ 0))

[Discussion]:

When I try your code, I get the following error: "Problem with mutate() input dicht. x must be a logical vector, not a double vector. i Input dicht is @987654326@."
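A likely cause, assuming the code was run with a plain `NA` in place of `NA_real_` (the comment does not show the exact call): dplyr releases before 1.1.0 required every right-hand side in `case_when()` to have the same type, and a bare `NA` is logical while 0 and 1 are double. The sketch below only uses base R to show the types involved; it does not call dplyr.

```r
# A bare NA is logical; the typed variants match numeric results
na_plain   <- typeof(NA)           # "logical"
na_real    <- typeof(NA_real_)     # "double"
na_integer <- typeof(NA_integer_)  # "integer"

# The literals 0 and 1 used in the answer are double
zero_type <- typeof(0)             # "double"

# Base R silently coerces a mixed vector to the wider type,
# which is why the mismatch goes unnoticed outside strict functions
mixed_type <- typeof(c(NA, 0, 1))  # "double"

print(c(na_plain, na_real, na_integer, zero_type, mixed_type))
```

So keeping `NA_real_` next to 0 and 1 (or `NA_integer_` next to 0L and 1L) avoids the type mismatch.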