【Question title】: Dynamically subsetting rows in R data.table [duplicate]
【Posted】: 2020-06-27 03:52:22
【Question description】:

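I have a dataset whose number of columns depends on an input parameter. For example, if the input parameter is K=3, I will have the columns members1, members2 and members3. I store the column names in a vector: memberCols <- paste0("members", 1:K)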

        policy quotationYear   members1    members2             member1FLAG           member2FLAG 
1: G000809-000          2016      -0.83        0.08                       0                     0
2: G002417-000          2016      -0.62       -0.38    growth out of bounds                     0
3: G005213-000          2016      -0.66       -0.56    growth out of bounds  growth out of bounds
4: G001719-000          2017      19.00        0.00    growth out of bounds                     0
5: G002337-000          2017      -0.86       -0.21                       0                     0
6: G002337-000          2017       6.67        0.25    growth out of bounds                     0

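I'm only interested in looking at some of these rows. Specifically, I want to see the rows in which one of the FLAG columns contains "out of bounds". So far, I know how to subset the columns dynamically: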

    growthCols <- paste0("growthMembers", 1:K, "FLAG")
    BToutliersGrowth <- BTplan[, .SD, .SDcols = c(growthCols, memberCols)]
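A minimal sketch of the dynamic column selection above, assuming K = 2 and a hypothetical `BTplan` built from the first three rows of the printed table, with its flag columns named as in `growthCols`. The `..cols` form is an equivalent way to select columns from a character vector.

```r
library(data.table)

K <- 2  # assumed: two member/flag column pairs, as in the posted data
memberCols <- paste0("members", 1:K)                # "members1" "members2"
growthCols <- paste0("growthMembers", 1:K, "FLAG")  # "growthMembers1FLAG" "growthMembers2FLAG"

# Hypothetical table modelled on the first three rows of the printed output
BTplan <- data.table(
  policy = c("G000809-000", "G002417-000", "G005213-000"),
  members1 = c(-0.83, -0.62, -0.66),
  members2 = c(0.08, -0.38, -0.56),
  growthMembers1FLAG = c("0", "growth out of bounds", "growth out of bounds"),
  growthMembers2FLAG = c("0", "0", "growth out of bounds")
)

# .SDcols takes the character vector directly; columns come back in that order
BToutliersGrowth <- BTplan[, .SD, .SDcols = c(growthCols, memberCols)]

# Equivalent selection using the ".." prefix on a variable holding column names
cols <- c(growthCols, memberCols)
BToutliersGrowth2 <- BTplan[, ..cols]
```

Both forms only pick columns; every row of `BTplan` is still present, which is why a separate row filter is needed.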

How can I subset the data.table so that only the rows where at least one of the growthCols contains "growth out of bounds" are kept?

Data:

data <- structure(list(policy = c("G000809-000", "G002417-000", "G005213-000", 
"G001719-000", "G002337-000", "G002337-000"), quotationYear = c(2016, 
2016, 2016, 2017, 2017, 2017), members1 = c(-0.83, 
-0.62, -0.66, 19, -0.86, 6.67), members2 = c(0.08, -0.38, 
-0.56, 0, -0.21, 0.25), growthMembers1FLAG = c("growth out of bounds", 
"growth out of bounds", "growth out of bounds", "growth out of bounds", 
"growth out of bounds", "growth out of bounds"), growthMembers2FLAG = c("0", 
"0", "growth out of bounds", "0", "0", "0")), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L))

【Question discussion】:

    Tags: r


    【Solution 1】:

    You can try:

    library(data.table)
    # Keep the rows where at least one growthCols column equals "growth out of bounds"
    data[data[, rowSums(.SD == 'growth out of bounds') > 0, .SDcols = growthCols]]
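The same row filter can be sketched without the intermediate logical matrix that `rowSums(.SD == ...)` builds: `lapply` compares each flag column on its own and `Reduce("|", ...)` ORs the results into one logical vector, which is then used as the row index. Swapping `==` for `grepl(..., fixed = TRUE)` matches the substring "out of bounds" instead of the exact flag text. The three-row table is hypothetical, modelled on the printed output.

```r
library(data.table)

growthCols <- c("growthMembers1FLAG", "growthMembers2FLAG")

# Hypothetical table modelled on the first three rows of the printed output
data <- data.table(
  policy = c("G000809-000", "G002417-000", "G005213-000"),
  growthMembers1FLAG = c("0", "growth out of bounds", "growth out of bounds"),
  growthMembers2FLAG = c("0", "0", "growth out of bounds")
)

# Exact match: TRUE for a row if any flag column equals the flag text
exactMask <- data[, Reduce(`|`, lapply(.SD, `==`, "growth out of bounds")),
                  .SDcols = growthCols]
exact <- data[exactMask]

# Substring match: TRUE for a row if any flag column contains "out of bounds"
partialMask <- data[, Reduce(`|`, lapply(.SD, grepl, pattern = "out of bounds",
                                         fixed = TRUE)),
                    .SDcols = growthCols]
partial <- data[partialMask]
```

With this table both filters drop the first row, whose flags are both "0", and keep the other two.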
    

    【Discussion】:

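    • I'm afraid this doesn't work: I get 0 rows even though there are out-of-bounds observations. Is there an alternative?
    • With the data you shared, every row is selected, because every row has "growth out of bounds". With which data do you get 0 rows?
    • I was subsetting on the wrong columns, sorry!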
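Since the 0-row result came from subsetting on the wrong columns, a quick check before filtering can catch that kind of mistake. A minimal sketch, where `check_flag_cols()` is a hypothetical helper (not part of data.table): it stops if a name is missing or if a column is not character. A numeric column such as `members1` compared with a string is simply FALSE on every row, so it would otherwise yield 0 rows silently.

```r
library(data.table)

# Hypothetical helper: verify that the columns to be filtered exist and hold text
check_flag_cols <- function(dt, cols) {
  missing <- setdiff(cols, names(dt))
  if (length(missing) > 0) {
    stop("columns not found: ", paste(missing, collapse = ", "))
  }
  # A numeric column compared with a string is FALSE everywhere, so reject it
  is_chr <- vapply(cols, function(col) is.character(dt[[col]]), logical(1))
  not_chr <- cols[!is_chr]
  if (length(not_chr) > 0) {
    stop("columns are not character: ", paste(not_chr, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical two-row table for the demonstration
data <- data.table(
  members1 = c(-0.83, -0.62),
  growthMembers1FLAG = c("0", "growth out of bounds")
)

ok <- check_flag_cols(data, "growthMembers1FLAG")  # passes
wrongType <- tryCatch(check_flag_cols(data, "members1"), error = conditionMessage)
wrongName <- tryCatch(check_flag_cols(data, "member1FLAG"), error = conditionMessage)
```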