Issue with local variables in an R custom function: answers

【Question title】: Issue with local variables in R custom function
【Posted】: 2018-06-28 13:58:07
【Question】:

I have a dataset:

>view(interval)
#   V1 V2 V3 ID
# 1 NA 1  2  1
# 2 2  2  3  2
# 3 3  NA 1  3
# 4 4  2  2  4
# 5 NA 5  1  5

>dput(interval)
structure(list(V1 = c(NA, 2, 3, 4, NA),
V2 = c(1, 2, NA, 2, 5),
V3 = c(2, 3, 1, 2, 1), ID = 1:5), row.names = c(NA, -5L), class = "data.frame")

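For each row, I want to extract the previous non-NA value (or the next one, if the NA is in the first row) and store it as a local variable inside a custom function, because I have to perform other operations on each row based on this value (it should change for each row the function is applied to). I wrote this function to print the local variable, but when I apply it, the output is not what I want: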

myFunction<- function(x){
              position <- as.data.frame(which(is.na(interval), arr.ind=TRUE))
              tempVar <- ifelse(interval$ID == 1, interval[position$row+1,
                         position$col], interval[position$row-1, position$col])
              return(tempVar)
}

I was expecting to get something like this:

# [1]    2
# [2]    2
# [3]    4

But I get something rather messy.

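【Comments】:

Could you provide interval using the dput function, so that it is easier for the SO community to help you?
I don't understand how your expected output would be usable. If the function returns a vector of (in this case) three values, how do you know what to do with them without re-determining the indices they refer to? How do you know that the first 2 refers to the NA in the first row, the second 2 to the one in the third row, and so on? I think I understand what you are asking for, but not what you need or plan to do with it.
@r2evans Well, I'm not interested in that particular output. I want to display the values of the three desired local variables, one for each iteration of my function.
What happens when you have two NAs stacked?
@r2evans There are no consecutive NAs in the real dataset; in any case, if that happens I can edit my function to get the first available non-NA value.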

Tags: r function dataframe custom-function

【Solution 1】:

Here is attempt 1:

dat <- read.table(header=TRUE, text='
V1 V2 V3 ID
NA 1  2  1
2  2  3  2
3  NA 1  3
4  2  2  4
NA 5  1  5')
myfunc1 <- function(x) {
  ind <- which(is.na(x), arr.ind=TRUE)
  # since it appears you want them in row-first sorted order
  # (drop = FALSE keeps ind a matrix when there is only one NA)
  ind <- ind[order(ind[,1], ind[,2]), , drop = FALSE]
  # catch first-row NA
  ind[,1] <- ifelse(ind[,1] == 1L, 2L, ind[,1] - 1L)
  x[ind]
}
myfunc1(dat)
# [1] 2 2 4
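
For readers unfamiliar with the indexing used in myfunc1, here is a minimal sketch of the two pieces it relies on: which(..., arr.ind = TRUE) returns one (row, col) pair per NA, and a two-column matrix of such pairs can be used to pick out individual cells. The toy matrix m is made up for this illustration and is not part of the question's data.

```r
# Hypothetical toy matrix, used only for illustration
m <- matrix(c(NA, 2, 3,
              1, NA, 5), nrow = 3)
# m:
#      [,1] [,2]
# [1,]   NA    1
# [2,]    2   NA
# [3,]    3    5

# One (row, col) pair per NA cell; the columns are named "row" and "col"
ind <- which(is.na(m), arr.ind = TRUE)

# Point each pair at the row above, or at the row below for a first-row NA
ind[, "row"] <- ifelse(ind[, "row"] == 1L, 2L, ind[, "row"] - 1L)

# Indexing with a two-column matrix picks one cell per row of ind
vals <- m[ind]
vals
# [1] 2 1
```

The same indexing works on a data frame such as dat, because a numeric matrix index on a data frame is applied to its matrix form.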

The problem is when there is a second, "stacked" NA:

dat2 <- dat
dat2[2,1] <- NA
dat2
#   V1 V2 V3 ID
# 1 NA  1  2  1
# 2 NA  2  3  2
# 3  3 NA  1  3
# 4  4  2  2  4
# 5 NA  5  1  5
myfunc1(dat2)
# [1] NA NA  2  4

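One fix/safeguard for this is to use zoo::na.locf, which applies "last observation carried forward". Since the top row is a special case, we do it twice, the second time in reverse. This gives us "the next non-NA value in the column (up or down, depending)".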

library(zoo)
myfunc2 <- function(x) {
  ind <- which(is.na(x), arr.ind=TRUE)
  # since it appears you want them in row-first sorted order
  # (drop = FALSE keeps ind a matrix when there is only one NA)
  ind <- ind[order(ind[,1], ind[,2]), , drop = FALSE]
  # this is to guard against stacked NA
  x <- apply(x, 2, zoo::na.locf, na.rm = FALSE)
  # this special-case is when there are one or more NAs at the top of a column
  x <- apply(x, 2, zoo::na.locf, fromLast = TRUE, na.rm = FALSE)
  x[ind]
}
myfunc2(dat2)
# [1] 3 3 2 4
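
One of the comments asks how the returned values can be matched back to the cells they came from. A possible variation, sketched here under the assumption that returning a data frame is acceptable, keeps the row and column of each NA next to its fill value. The names myfunc3, fill_down and fill_up are made up for this illustration; fill_down is a base-R stand-in for zoo::na.locf(..., na.rm = FALSE), not the zoo implementation itself. The data frame is rebuilt inline so the block runs on its own.

```r
# Same data as dat2 above, rebuilt so this sketch is self-contained
dat2 <- data.frame(V1 = c(NA, NA, 3, 4, NA),
                   V2 = c(1, 2, NA, 2, 5),
                   V3 = c(2, 3, 1, 2, 1),
                   ID = 1:5)

# Hypothetical helper: carry the last non-NA value forward (leading NAs stay NA)
fill_down <- function(v) {
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))  # position of last non-NA so far
  idx[idx == 0L] <- NA                               # nothing seen yet
  v[idx]
}

# Hypothetical helper: the same fill, applied from the bottom up
fill_up <- function(v) rev(fill_down(rev(v)))

myfunc3 <- function(x) {
  ind <- which(is.na(x), arr.ind = TRUE)
  ind <- ind[order(ind[, 1], ind[, 2]), , drop = FALSE]
  filled <- x
  # fill downwards first, then upwards to cover NAs at the top of a column
  filled[] <- lapply(x, function(v) fill_up(fill_down(v)))
  data.frame(row = unname(ind[, 1]),
             col = names(x)[ind[, 2]],
             value = unname(filled[ind]),
             row.names = NULL)
}

res <- myfunc3(dat2)
res
#   row col value
# 1   1  V1     3
# 2   2  V1     3
# 3   3  V2     2
# 4   5  V1     4
```

The value column matches the output of myfunc2(dat2); the extra columns only record which cell each value belongs to.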
