Subset only possible combinations from an R data.frame

【Question title】: Subset only possible combinations from an R data.frame
【Posted】: 2020-02-22 09:02:31
【Question】:

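The function foo forces subset to always include every value of time in any subsetting.

For example, if I only want the prof == 1 subset of dat, foo also adds time==1; time==2; time==3; time==4 to that subsetting.

But sometimes adding certain values of time (in this example, time==1 and time==4) causes subset to throw an error, because there is no data for such subsets.

How can I filter out such errors in the output, i.e. get output only for the possible subsettings (here time == 2 and 3)?

Note: the data is a toy example; a functional solution is appreciated.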

# data.frame:
dat <- data.frame(time = c(1,3,2,4), prof = c(2,1,1,2)) 

# Function:
foo <- function(data, mod){

    tim <- sort(unique(data$time))

    s <- substitute(mod)
    G <- lapply(tim, function(x) bquote(.(s) & time == .(x)))

    lapply(1:length(G), function(i) subset(data, G[[i]]))
}
# EXAMPLE OF USE:
foo(dat, prof == 1) # Error in subset(data, G[[i]]) : 'subset' must be logical

# DESIRED OUTPUT:
[[1]]
  time prof
1    2    1

[[2]]
  time prof
1    3    1

【Comments】:

Tags: r function loops dataframe subset

【Solution 1】:

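The error you get is 'subset' must be logical. subset evaluates its second argument inside the data frame, and here that argument, G[[i]], evaluates to the unevaluated call object produced by bquote rather than to a logical vector. Wrapping G[[i]] in eval makes the call itself get evaluated, which should work: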

dat <- data.frame(time = c(1,3,2,4), prof = c(2,1,1,2))

foo <- function(data, mod){

    tim <- sort(unique(data$time))

    # Capture the condition unevaluated, then build one call per time value
    s <- substitute(mod)
    G <- lapply(tim, function(x) bquote(.(s) & time == .(x)))

    # eval() turns each call into the logical vector that subset() expects
    lapply(seq_along(G), function(i) subset(data, eval(G[[i]]))) # <- Use `eval`
}

foo(dat, prof == 1)

Output:

[[1]]
[1] time prof
<0 Zeilen> (oder row.names mit Länge 0)

[[2]]
  time prof
3    2    1

[[3]]
  time prof
2    3    1

[[4]]
[1] time prof
<0 Zeilen> (oder row.names mit Länge 0)

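<0 Zeilen> (oder row.names mit Länge 0) is the German-locale form of <0 rows> (or 0-length row.names); it just says that the data frame has 0 rows. Simply subset the output list to keep the data frames you want.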
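To make the diagnosis at the top of this answer concrete, here is a minimal sketch that inspects one condition outside of foo. The name cond is hypothetical and stands in for a single element of G; the value 2 is just one of the time values.

```r
dat <- data.frame(time = c(1, 3, 2, 4), prof = c(2, 1, 1, 2))

# `cond` is a hypothetical name for one element of G in foo()
cond <- bquote(prof == 1 & time == .(2))

class(cond)       # "call": unevaluated code, not TRUE/FALSE values
is.logical(cond)  # FALSE, which is what subset() complains about

# Evaluating the call with the data frame as environment gives a logical vector
keep <- eval(cond, dat)
is.logical(keep)  # TRUE

dat[keep, ]       # the single row with time == 2 and prof == 1
```

Once the call has been evaluated, the result is an ordinary logical index, so it can be used with subset or with plain bracket indexing.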

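I should also point out that your function does essentially the same thing as dat[dat$prof == 1,], because you test the prof condition against every value of time (that expression returns one data frame rather than a list, but that is a fairly minor detail). I am not sure what you have planned, but I thought I should mention it.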
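A rough way to check that equivalence, assuming the toy data from the question: build the per-time pieces directly (without foo, to keep the sketch self-contained), stack them, and compare with the one-line subset. The names direct, pieces and stacked are illustrative only.

```r
dat <- data.frame(time = c(1, 3, 2, 4), prof = c(2, 1, 1, 2))

# One data frame with all prof == 1 rows, in original row order
direct <- dat[dat$prof == 1, ]

# The per-time pieces, including the empty ones for time 1 and 4
pieces <- lapply(sort(unique(dat$time)),
                 function(t) dat[dat$prof == 1 & dat$time == t, ])

# Stacking the pieces gives the same rows, only ordered by time
stacked <- do.call(rbind, pieces)

nrow(stacked) == nrow(direct)
identical(stacked$time, sort(direct$time))
```

The list form only adds value if each time value really needs to be handled as its own data frame.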

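【Comments】:

You could add one more line to the function: res <- lapply(.); return(res[lapply(res, nrow) > 0]).
@jay.sf It would be easier to drop the function altogether and just do dat[dat$prof == 1,]. But I assumed the OP might have plans that are not immediately obvious, which is why I only pointed out the error in the function instead of improving it.
Yes, I meant that with this addition we would get the OP's desired output.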
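Putting the answer and the comment together, one possible version of the function looks like the sketch below. It uses vapply instead of the lapply comparison from the comment, which is a substitution on my part so that the row counts come back as a plain integer vector. The name res comes from the comment.

```r
dat <- data.frame(time = c(1, 3, 2, 4), prof = c(2, 1, 1, 2))

foo <- function(data, mod) {
  tim <- sort(unique(data$time))

  # Capture the condition unevaluated and combine it with each time value
  s <- substitute(mod)
  G <- lapply(tim, function(x) bquote(.(s) & time == .(x)))

  # Evaluate each combined condition inside subset()
  res <- lapply(seq_along(G), function(i) subset(data, eval(G[[i]])))

  # Keep only the subsets that contain at least one row
  res[vapply(res, nrow, integer(1)) > 0]
}

out <- foo(dat, prof == 1)
out
```

With the toy data this should leave two one-row data frames, for time 2 and time 3, which matches the desired output apart from the row names.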

【Solution 2】:

Subset the data frame, then split it by the time values:

subset_df <- function(df, prof_no){
  split(df[df$prof == prof_no,], df[df$prof == prof_no, "time"])
}

Usage:

subset_df(dat, 1)

Data used:

dat <- data.frame(time = c(1,3,2,4), prof = c(2,1,1,2))
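As a sketch of what this approach returns: split groups only by the time values that are actually present in the filtered rows, so no empty data frames appear. The variant below filters once into a hypothetical intermediate, sub, instead of repeating the condition.

```r
dat <- data.frame(time = c(1, 3, 2, 4), prof = c(2, 1, 1, 2))

subset_df <- function(df, prof_no) {
  sub <- df[df$prof == prof_no, ]  # filter once, then reuse
  split(sub, sub$time)             # one data frame per remaining time value
}

out <- subset_df(dat, 1)
names(out)  # "2" "3"
out
```

The result is a list named by time value; unname(out) would give an unnamed list like the one in the question's desired output.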

【Comments】: