Dynamically subsetting rows in R data.table [duplicate]: answers

【Question title】: Dynamically subsetting rows in R data.table [duplicate]
【Posted】: 2020-06-27 03:52:22
【Question description】:

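I have a dataset whose number of columns depends on an input parameter. For example, if the input parameter is K=3, I will have the columns members1, members2 and members3. I store the column names in a vector: memberCols <- c(paste0("members" , 1:K)).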

        policy quotationYear   members1    members2             member1FLAG           member2FLAG 
1: G000809-000          2016      -0.83        0.08                       0                     0
2: G002417-000          2016      -0.62       -0.38    growth out of bounds                     0
3: G005213-000          2016      -0.66       -0.56    growth out of bounds  growth out of bounds
4: G001719-000          2017      19.00        0.00    growth out of bounds                     0
5: G002337-000          2017      -0.86       -0.21                       0                     0
6: G002337-000          2017       6.67        0.25    growth out of bounds                     0

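I am interested in looking at only some of these rows. Specifically, I want to see those that contain "out of bounds" in memberCols. So far, I know how to subset the columns dynamically.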

growthCols <- paste0("growthMembers", 1:K, "FLAG")
BToutliersGrowth <- BTplan[, .SD, .SDcols = c(growthCols, memberCols)]

How can I subset the data.table so that only the rows containing "growth out of bounds" in growthCols are kept?

Data:

data <- structure(list(policy = c("G000809-000", "G002417-000", "G005213-000", 
"G001719-000", "G002337-000", "G002337-000"), quotationYear = c(2016, 
2016, 2016, 2017, 2017, 2017), members1 = c(-0.83, 
-0.62, -0.66, 19, -0.86, 6.67), members2 = c(0.08, -0.38, 
-0.56, 0, -0.21, 0.25), growthMembers1FLAG = c("growth out of bounds", 
"growth out of bounds", "growth out of bounds", "growth out of bounds", 
"growth out of bounds", "growth out of bounds"), growthMembers2FLAG = c("0", 
"0", "growth out of bounds", "0", "0", "0")), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L))

【Question discussion】:

Tags: r

【Solution 1】:

You can try:

library(data.table)
data[data[, rowSums(.SD == 'growth out of bounds') > 0, .SDcols = growthCols]]
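
To make the idea above concrete, here is a minimal sketch on made-up data. The table `dt`, its values, and the names `flag`, `keepRowSums` and `keepReduce` are assumptions for illustration and are not from the question. It shows the `rowSums` approach with `na.rm = TRUE` added (so a missing flag counts as "no match"), plus one possible alternative that ORs together a logical vector per flag column.

```r
library(data.table)

# Toy data mirroring the question's layout (values are illustrative only).
dt <- data.table(
  policy = c("A", "B", "C", "D"),
  members1 = c(-0.83, -0.62, 19, 0.10),
  growthMembers1FLAG = c("0", "growth out of bounds", "growth out of bounds", NA),
  growthMembers2FLAG = c("0", "0", "growth out of bounds", "0")
)

K <- 2
growthCols <- paste0("growthMembers", 1:K, "FLAG")
flag <- "growth out of bounds"

# Approach from the answer: count matching cells per row over the flag columns.
# na.rm = TRUE is an addition here so that an NA flag does not make the row's sum NA.
keepRowSums <- dt[, rowSums(.SD == flag, na.rm = TRUE) > 0, .SDcols = growthCols]
resRowSums <- dt[keepRowSums]

# Alternative sketch: build one logical vector per flag column and OR them together.
# %in% never returns NA, so missing flags are treated as non-matches.
keepReduce <- dt[, Reduce(`|`, lapply(.SD, function(x) x %in% flag)), .SDcols = growthCols]
resReduce <- dt[keepReduce]
```

Both variants keep a row when at least one of the columns named in `growthCols` holds the flag text. Which one reads better is largely a matter of taste.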

【Discussion】:

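I'm afraid this isn't working: I get 0 rows even though there are out-of-bounds observations. Is there an alternative?
For the data you shared, I get all rows selected, because every row has "growth out of bounds". For which data do you get 0 rows?
I was subsetting on the wrong columns, sorry!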
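
One plausible way to end up with 0 rows and no error is to pass the numeric value columns instead of the character flag columns, since a number compared with the flag text never matches. The thread does not say which columns were actually used, so the following is only a hedged diagnostic sketch on made-up data. The names `dt`, `flag`, `hitsMember`, `hitsGrowth` and `missingCols` are assumptions for illustration.

```r
library(data.table)

# Toy data (illustrative only): one numeric column, one character flag column.
dt <- data.table(
  members1 = c(-0.83, 19),
  growthMembers1FLAG = c("0", "growth out of bounds")
)
flag <- "growth out of bounds"

memberCols <- "members1"            # numeric values, never equal to the flag text
growthCols <- "growthMembers1FLAG"  # character flags

# Count flagged cells per candidate column; all zeros suggests the wrong columns.
hitsMember <- dt[, sapply(.SD, function(x) sum(x %in% flag)), .SDcols = memberCols]
hitsGrowth <- dt[, sapply(.SD, function(x) sum(x %in% flag)), .SDcols = growthCols]

# Requested column names that are absent from the table (empty here).
missingCols <- setdiff(c(memberCols, growthCols), names(dt))
```

Checking the per-column counts before subsetting makes it easier to tell "no flagged rows" apart from "wrong columns selected".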