【Posted】: 2021-03-15 11:57:56
【Question】:
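I have a large dataset (907 x 1855). I need to count how many follow-ups each patient has received. The follow-up columns contain 1, 2 or NA, and a follow-up can be defined as the corresponding column being !is.na().
There are at most 20 follow-ups. As you can see, each follow-up appends _vX to the previous column name, where X is the follow-up number.
As a result, follow-up no. 20 has the very inconvenient REDCap-autogenerated column name p$fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20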
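To make the naming scheme concrete, here is a small base R sketch. The helper fu_name() is hypothetical (not from the post) and simply rebuilds the REDCap-style names by chaining "_v2", "_v3", ... onto "fu_location". It shows that all 20 names share the "fu_location" prefix, so a prefix or substring match selects every follow-up column, and that the follow-up number can be recovered by counting the "_v" suffixes.

```r
# Hypothetical helper (not from the post): rebuild the REDCap-style column
# name for follow-up k by appending "_v2", "_v3", ..., "_vk" to "fu_location".
fu_name <- function(k) {
  if (k == 1) return("fu_location")
  paste0("fu_location", paste0("_v", 2:k, collapse = ""))
}

fu_names <- vapply(1:20, fu_name, character(1))

# The follow-up number equals the number of "_v" suffixes plus one.
fu_num <- lengths(regmatches(fu_names, gregexpr("_v", fu_names, fixed = TRUE))) + 1L

fu_names[1:3]  # "fu_location" "fu_location_v2" "fu_location_v2_v3"
fu_num[20]     # 20
```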
> head(p)
fu_location fu_location_v2 fu_location_v2_v3 fu_location_v2_v3_v4 ...
1 1 1 1 1 ...
2 2 2 1 2 ...
3 1 1 1 2 ...
4 2 2 2 2 ...
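I need to count the number of non-missing values (!is.na()) across the columns whose names contain "fu_location". I tried mutate(n_fu = sum(!is.na(contains("fu_location")))), but it did not work.
Ideally the solution would be in dplyr. Maybe a function?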
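A minimal dplyr sketch, assuming dplyr >= 1.0 (for across()). The attempt above fails for two reasons: contains() is a tidyselect helper that only works inside selecting functions such as select() or across(), and sum() collapses over all rows instead of counting per row. Wrapping the selection in across() and counting with rowSums() addresses both. The small data frame is a made-up stand-in, not the posted data.

```r
library(dplyr)

# Made-up stand-in data: an id column plus three REDCap-style follow-up columns
p <- data.frame(
  id = 1:3,
  fu_location = c(1L, 2L, 1L),
  fu_location_v2 = c(1L, NA, 2L),
  fu_location_v2_v3 = c(NA, NA, 1L)
)

# across() returns the matching columns as a data frame; !is.na() turns it
# into a logical matrix, and rowSums() counts the TRUE values in each row.
# (With dplyr >= 1.1, pick(contains("fu_location")) can be used instead.)
p <- p %>%
  mutate(n_fu = rowSums(!is.na(across(contains("fu_location")))))

p$n_fu  # 2 1 3
```

A rowwise() + c_across() version stays closer to the original attempt, but row-wise evaluation is usually slower on a table with 907 rows and 1855 columns.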
Expected output:
> head(p)
fu_location fu_location_v2 fu_location_v2_v3 fu_location_v2_v3_v4 n_fu
1 1 1 1 1 8
2 2 2 1 2 20
3 1 1 1 2 4
4 2 2 2 2 4
Data:
p <- structure(list(fu_location = c(1L, 2L, 1L, 2L), fu_location_v2 = c(1L,
2L, 1L, 2L), fu_location_v2_v3 = c(1L, 1L, 1L, 2L), fu_location_v2_v3_v4 = c(1L,
2L, 2L, 2L), fu_location_v2_v3_v4_v5 = c(2L, 2L, NA, NA), fu_location_v2_v3_v4_v5_v6 = c(1L,
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7 = c(2L, 1L, NA, NA
), fu_location_v2_v3_v4_v5_v6_v7_v8 = c(1L, 2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9 = c(NA,
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10 = c(NA,
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11 = c(NA,
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12 = c(NA,
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13 = c(NA,
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14 = c(NA,
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15 = c(NA,
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16 = c(NA,
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17 = c(NA,
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18 = c(NA,
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19 = c(NA,
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 = c(NA,
2L, NA, NA)), row.names = c(NA, -4L), class = "data.frame")
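Since only missingness matters, a plain base R function also covers the "maybe a function?" part. The sketch below uses a hypothetical count_fu() that counts non-NA values per row over all columns whose names start with a given prefix; it assumes no unrelated column in the full 1855-column dataset starts with "fu_location". To keep the block self-contained, it builds a stand-in for p with the same 20 column names and the same NA pattern as the data above (8, 20, 4 and 4 observed follow-ups per row), but with placeholder values of 1L.

```r
# Hypothetical helper: count non-missing values per row across all columns
# whose names start with `prefix`.
count_fu <- function(df, prefix = "fu_location") {
  cols <- startsWith(names(df), prefix)
  rowSums(!is.na(df[cols]))
}

# Stand-in for `p`: same column names and NA pattern as the posted data,
# but every observed value is a placeholder 1L (only missingness matters).
fu_names <- c("fu_location",
              vapply(2:20,
                     function(k) paste0("fu_location", paste0("_v", 2:k, collapse = "")),
                     character(1)))
n_obs <- c(8, 20, 4, 4)
m <- t(vapply(n_obs, function(n) c(rep(1L, n), rep(NA_integer_, 20 - n)), integer(20)))
p <- setNames(as.data.frame(m), fu_names)

p$n_fu <- count_fu(p)
p$n_fu  # 8 20 4 4
```

Applied to the real p from the dput above, the same call should reproduce the n_fu column in the expected output, because the NA pattern is identical.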
Tags: r function dataframe dplyr