【Question title】: R data.table rule with column name as a string
【Posted】: 2016-12-22 08:56:48
【Question description】:

I have a data.table that looks like this:

>DT
   ID Year Value ABC_1 ABC_2 ABC_3
1:  3 2015     5     0     1     0
2:  4 2015     2     1     0     1
3:  5 2015     1     0     1     1

What I want to do for each ABC_... column is:

> unique(DT[Year == 2015 & ABC_1 == 1, .(Year = Year, ABC = ABC_1, N = .N, MEAN = mean(Value))])
   Year ABC N MEAN
1: 2015   1 1    2
> unique(DT[Year == 2015 & ABC_2 == 1, .(Year = Year, ABC = ABC_2, N = .N, MEAN = mean(Value))])
   Year ABC N MEAN
1: 2015   1 2    3
> unique(DT[Year == 2015 & ABC_3 == 1, .(Year = Year, ABC = ABC_3, N = .N, MEAN = mean(Value))])
   Year ABC N MEAN
1: 2015   1 2  1.5

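I have more than 20 ABC_... columns, so I want to put this statement in a for loop. My problem is that the selection/rule needs the column name. It does not work when the name is supplied as a string: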

> abc_name <- names(DT)[names(DT) %like% 'ABC']
> abc_name
[1] "ABC_1" "ABC_2" "ABC_3"
> abc_row<- data.table(Year=0, ABC=0, N=0, MEAN=0)
> for (i in 1: length(abc_name)){
+   
+   temp_row <- unique(DT[Year == 2015 & abc_name[i] == 1, .(Year = Year, ABC = abc_name[i], N = .N, MEAN = mean(Value))])
+   abc_row <- rbind(abc_row, temp_row)
+ }
> abc_row
   Year ABC N MEAN
1:    0   0 0    0

temp_row is empty... When I change abc_name[i] to ABC_1, it works:

> abc_name <- names(DT)[names(DT) %like% 'ABC']
> abc_name
[1] "ABC_1" "ABC_2" "ABC_3"
> abc_row<- data.table(Year=0, ABC=0, N=0, MEAN=0)
> for (i in 1: length(abc_name)){
+ 
+   temp_row <- unique(DT[Year == 2015 & ABC_1 == 1, .(Year = Year, ABC = ABC_1, N = .N, MEAN = mean(Value))])
+   abc_row <- rbind(abc_row, temp_row)
+ }
> abc_row
   Year ABC N MEAN
1:    0   0 0    0
2: 2015   1 1    2
3: 2015   1 1    2
4: 2015   1 1    2

How can I use abc_name in the for loop so that my script works? I hope my question is clear and that someone can help me.

【Question discussion】:

    Tags: r dataframe data.table


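    【Solution 1】:

    Loop over the vector of names ('abc_name') with lapply, apply the logic from the OP's post using get to fetch the values of the named column, and then rbind the list elements.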

    lst <- lapply(abc_name, function(nm)
              unique(DT[Year == 2015 & get(nm) == 1,
              .(Year = Year, ABC = get(nm), N = .N, MEAN = mean(Value))]))
    
    rbindlist(lst)
    #   Year ABC N MEAN
    #1: 2015   1 1  2.0
    #2: 2015   1 2  3.0
    #3: 2015   1 2  1.5
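
The answer above does not spell out why the loop in the question returns nothing, so here is a minimal sketch of the likely cause, assuming the sample data can be rebuilt from the table printed in the question. Inside `DT[...]`, `abc_name[i]` is just a character string, so `abc_name[i] == 1` compares the text "ABC_1" with 1 and is FALSE for every row. `get()` turns the string into the column it names. The `ABC_col` column name is hypothetical, chosen here only to record which column each result row came from.

```r
library(data.table)

# Sample data reconstructed from the table printed in the question.
DT <- data.table(
  ID    = c(3, 4, 5),
  Year  = c(2015, 2015, 2015),
  Value = c(5, 2, 1),
  ABC_1 = c(0, 1, 0),
  ABC_2 = c(1, 0, 1),
  ABC_3 = c(0, 1, 1)
)
abc_name <- grep("^ABC_", names(DT), value = TRUE)

# A character string compared with 1 is coerced to "ABC_1" == "1",
# which is FALSE, so no row passes the filter in the question's loop.
string_cmp <- abc_name[1] == 1

# get() resolves the string to the column of that name inside DT[...].
lst <- lapply(abc_name, function(nm)
  DT[Year == 2015 & get(nm) == 1, .(N = .N, MEAN = mean(Value)), by = Year])

# Naming the list lets rbindlist() record the source column in 'ABC_col'
# ('ABC_col' is a hypothetical column name chosen for this sketch).
res <- rbindlist(setNames(lst, abc_name), idcol = "ABC_col")
print(res)
```

Grouping by Year here replaces the `unique()` call from the question. Whether to keep the source column name in the result is a design choice; the outputs shown in the answer drop it.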
    

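    Another option is to melt the 'wide' format into 'long' format, group by 'variable' and 'Year', specify the logical index in 'i' (value == 1), and summarise the dataset.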

    melt(DT, measure = abc_name)[value==1, .(ABC=1, N= .N, 
         MEAN= mean(Value)), .(variable, Year)][, variable := NULL][]
    #   Year ABC N MEAN
    #1: 2015   1 1  2.0
    #2: 2015   1 2  3.0
    #3: 2015   1 2  1.5
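
To make the reshaping step easier to follow, the sketch below splits the melt approach into two stages and keeps the `variable` column so each result row shows which ABC_... column it summarises. It assumes the same sample data as the question; `long` and `res` are illustrative names, not part of the original answer.

```r
library(data.table)

# Sample data reconstructed from the table printed in the question.
DT <- data.table(
  ID    = c(3, 4, 5),
  Year  = c(2015, 2015, 2015),
  Value = c(5, 2, 1),
  ABC_1 = c(0, 1, 0),
  ABC_2 = c(1, 0, 1),
  ABC_3 = c(0, 1, 1)
)
abc_name <- grep("^ABC_", names(DT), value = TRUE)

# Stage 1: one row per (ID, ABC column). The column name goes into
# 'variable' and the 0/1 flag into 'value' (lower case, distinct from 'Value').
long <- melt(DT, measure.vars = abc_name)

# Stage 2: keep flagged rows, then count and average within each group.
res <- long[value == 1, .(N = .N, MEAN = mean(Value)), by = .(variable, Year)]
print(res)
```

Because the grouping is done in a single pass, no explicit loop over the 20+ columns is needed.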
    
