R: function or similar to sum up number of non-NA values for columns that contain specific characters in large data set [duplicate]

【Question title】: R: function or similar to sum up number of non-NA values for columns that contain specific characters in large data set [duplicate]
【Posted】: 2021-03-15 11:57:56
【Question】:

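I have a large data set (907 x 1855). I need to count how many follow-ups each patient has had. The follow-up columns contain 1, 2 or NA, and a follow-up counts as attended when its column is !is.na().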

There are at most 20 follow-ups. As you can see, each follow-up has _vX added as a suffix, where X corresponds to the follow-up number.

Thus, follow-up nr 20 has the very inconvenient REDCap-autogenerated column name p$fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20

> head(p)
  fu_location fu_location_v2 fu_location_v2_v3 fu_location_v2_v3_v4    ...
1           1              1                 1                    1    ...
2           2              2                 1                    2    ...
3           1              1                 1                    2    ...
4           2              2                 2                    2    ...

For each row, I need to count the number of !is.na() values across the columns whose names contain "fu_location". I tried mutate(n_fu = sum(!is.na(contains("fu_location")))), but it did not work.

Preferably, the solution is in dplyr. Perhaps a function?

Expected output:

> head(p)
  fu_location fu_location_v2 fu_location_v2_v3 fu_location_v2_v3_v4    n_fu
1           1              1                 1                    1       8
2           2              2                 1                    2      20
3           1              1                 1                    2       4
4           2              2                 2                    2       4

Data

p <- structure(list(fu_location = c(1L, 2L, 1L, 2L), fu_location_v2 = c(1L, 
2L, 1L, 2L), fu_location_v2_v3 = c(1L, 1L, 1L, 2L), fu_location_v2_v3_v4 = c(1L, 
2L, 2L, 2L), fu_location_v2_v3_v4_v5 = c(2L, 2L, NA, NA), fu_location_v2_v3_v4_v5_v6 = c(1L, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7 = c(2L, 1L, NA, NA
), fu_location_v2_v3_v4_v5_v6_v7_v8 = c(1L, 2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18 = c(NA, 
2L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19 = c(NA, 
1L, NA, NA), fu_location_v2_v3_v4_v5_v6_v7_v8_v9_v10_v11_v12_v13_v14_v15_v16_v17_v18_v19_v20 = c(NA, 
2L, NA, NA)), row.names = c(NA, -4L), class = "data.frame")

【Comments on the question】:

Tags: r function dataframe dplyr

【Solution 1】:

Use rowSums on the logical matrix of non-NA values:

library(dplyr)
p %>% mutate(n_fu =  rowSums(!is.na(select(., contains('fu_location')))))
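The attempt in the question fails for two reasons: contains() is a selection helper that only works inside a selecting function, and sum() would collapse everything to a single number rather than one count per row. As a minimal sketch of the same idea in newer dplyr, assuming dplyr >= 1.1.0 is installed, pick() can replace select(., ...) inside mutate(). The toy data frame p_small and its id column are made up for illustration and only mimic the first five follow-up columns.

```r
library(dplyr)

# Toy data shaped like the question's `p` (hypothetical, 5 of the 20 columns).
p_small <- data.frame(
  id                      = 1:4,  # hypothetical column that must not be counted
  fu_location             = c(1L, 2L, 1L, 2L),
  fu_location_v2          = c(1L, 2L, 1L, 2L),
  fu_location_v2_v3       = c(1L, 1L, 1L, 2L),
  fu_location_v2_v3_v4    = c(1L, 2L, NA, NA),
  fu_location_v2_v3_v4_v5 = c(2L, NA, NA, NA)
)

# pick() returns the selected columns as a data frame; is.na() turns that
# into a logical matrix and rowSums() counts the TRUE cells in each row.
res <- p_small %>%
  mutate(n_fu = rowSums(!is.na(pick(contains("fu_location")))))

print(res$n_fu)
```

On dplyr versions between 1.0.0 and 1.1.0, across(contains("fu_location")) without a function argument plays the same role as pick().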

Or in base R:

p$n_fu <- rowSums(!is.na(p[grep('fu_location', names(p))]))
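Since the question asks whether this could be a function, the base R line above can be wrapped in a small helper. This is only a sketch: count_non_na and the toy data frame p_small are hypothetical names, and the pattern is matched literally (fixed = TRUE) rather than as a regular expression.

```r
# Hypothetical helper: per-row count of non-NA cells over the columns
# whose names contain `pattern` (literal match, not a regex).
count_non_na <- function(df, pattern) {
  cols <- grepl(pattern, names(df), fixed = TRUE)
  # drop = FALSE keeps a data frame even if only one column matches
  unname(rowSums(!is.na(df[, cols, drop = FALSE])))
}

# Toy data shaped like the question's `p` (hypothetical, 3 of the 20 columns).
p_small <- data.frame(
  id                = 1:4,  # hypothetical column that must not be counted
  fu_location       = c(1L, 2L, 1L, 2L),
  fu_location_v2    = c(1L, 2L, NA, NA),
  fu_location_v2_v3 = c(1L, NA, NA, NA)
)

p_small$n_fu <- count_non_na(p_small, "fu_location")
print(p_small$n_fu)
```

The same helper can be reused for other REDCap-style column families by changing the pattern argument.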

【Comments】: