【Question title】: R: function or similar to sum up the number of non-NA values for columns that contain specific characters in a large data set [duplicate]
【Posted】: 2021-03-15 11:57:56
【Question】:

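I have a large data set (907 x 1855). I need to count how many follow-ups each patient had. A follow-up column contains 1, 2 or NA, and a follow-up counts as having taken place when the corresponding column is !is.na().

There are up to 20 follow-ups. As you can see, each follow-up has _vX appended as a suffix, where X is the follow-up number.

As a result, follow-up no. 20 has the very inconvenient RedCap auto-generated column name p$fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20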

> head(p)
  fu_location fu_location_v2 fu_location_v2_v3 fu_location_v2_v3_v4    ...
1           1              1                 1                    1    ...
2           2              2                 1                    2    ...
3           1              1                 1                    2    ...
4           2              2                 2                    2    ...

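For each row, I need to count the number of !is.na() values across the columns whose names contain "fu_location". I tried mutate(n_fu = sum(!is.na(contains("fu_location")))), but it did not work.

Preferably, the solution would be in dplyr. Perhaps a function?

Expected output: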

> head(p)
  fu_location fu_location_v2 fu_location_v2_v3 fu_location_v2_v3_v4    n_fu
1           1              1                 1                    1       8
2           2              2                 1                    2      20
3           1              1                 1                    2       4
4           2              2                 2                    2       4

  

Data

p <- structure(list(fu_location = c(1L, 2L, 1L, 2L), fu_location_v2 = c(1L, 
2L, 1L, 2L), fu_location_v2_v3 = c(1L, 1L, 1L, 2L), fu_location_v2_v3_v4 = c(1L, 
2L, 2L, 2L), fu_location_v2_v3_v4_v5 = c(2L, 2L, NA, NA), fu_location_v2_v3_v4_v5_v6 = c(1L, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7 = c(2L, 1L, NA, NA
), fu_location_v2_v3_v4_v5_v6_v7_v8 = c(1L, 2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 = c(NA, 
2L, NA, NA)), row.names = c(NA, -4L), class = "data.frame")

【Comments】:

    Tags: r function dataframe dplyr


    【Solution 1】:

    Use rowSums:

    library(dplyr)
    p %>% mutate(n_fu =  rowSums(!is.na(select(., contains('fu_location')))))
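
A row-wise variant that stays closer to the sum(!is.na(...)) attempt in the question could look like the sketch below. It assumes dplyr 1.0.0 or later (for rowwise() and c_across()), and the `toy` data frame is a made-up stand-in, not the original data. On wide data it will likely be slower than rowSums.

```r
library(dplyr)

# Hypothetical toy data: two follow-up columns plus an unrelated id column
toy <- data.frame(
  id             = 1:3,
  fu_location    = c(1L, 2L, NA),
  fu_location_v2 = c(1L, NA, NA)
)

res <- toy %>%
  rowwise() %>%                                        # evaluate mutate() one row at a time
  mutate(n_fu = sum(!is.na(c_across(contains("fu_location"))))) %>%
  ungroup()                                            # drop the row-wise grouping again

res$n_fu
```

Here c_across() gathers the selected columns of the current row into one vector, so sum() counts the non-NA entries of that row only.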
    

    Or in base R:

    p$n_fu <- rowSums(!is.na(p[grep('fu_location', names(p))]))
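
Since the question asks about a function, the base R idea can be wrapped in a small helper. This is only a sketch: `count_non_na` and the `toy` data are hypothetical names introduced here. It matches on the column-name prefix with startsWith() rather than grep(), on the assumption that all follow-up columns begin with "fu_location", so that other columns which merely contain that string are not counted.

```r
# Hypothetical helper: per-row count of non-NA values across all columns
# whose names start with `prefix`
count_non_na <- function(df, prefix) {
  cols <- startsWith(names(df), prefix)
  # drop = FALSE keeps a data frame even if only one column matches
  rowSums(!is.na(df[, cols, drop = FALSE]))
}

# Made-up toy data; the last column contains "fu_location" but does not
# start with it, so it is ignored by the prefix match
toy <- data.frame(
  id                    = 1:3,
  fu_location           = c(1L, 2L, NA),
  fu_location_v2        = c(1L, NA, NA),
  last_fu_location_note = c("a", "b", "c")
)

toy$n_fu <- count_non_na(toy, "fu_location")
toy$n_fu
```

With the data from the question, the equivalent call would be p$n_fu <- count_non_na(p, "fu_location").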
    

    【Comments】:
