Average of different subsets of variables with a condition in R: answers

【Question title】: Average of different subsets of variables with a condition in R
【Posted】: 2021-04-19 23:59:21
【Question】:

Sample data:

ID  month1 month2 month3 month4 month5 month6 month7 month8 month9 month10  b1  b2
-----------------------------------------------------------------------------------
1       12     14     15     45     12     12     11     12     78      28   3   9
2       14     15     45     14     15     45     14     19     22      27   4   8
3       14     13     25     74     25     45     14     19     22      27   5  10
.        .      .      .      .
70     …                                                                  1   8

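For each row, I want the average of the month variables from month b1 (the month of the first interview) through month b2 (the month of the second interview), inclusive. So the average is computed row-wise.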
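A minimal sketch of the per-row calculation, assuming the ten month values of one row are already in a numeric vector ordered month1 to month10. window_mean is a hypothetical helper name, not from any package. Because both interview months are included, the window covers b2 - b1 + 1 months.

```r
# Hypothetical helper: mean of months b1..b2 (inclusive) for a single row.
# `months` is assumed to be numeric and ordered month1, month2, ...
window_mean <- function(months, b1, b2) {
  stopifnot(b1 >= 1, b2 <= length(months), b1 <= b2)
  mean(months[b1:b2])
}

# Row for ID = 3 from the sample data (interviews in months 5 and 10)
id3 <- c(14, 13, 25, 74, 25, 45, 14, 19, 22, 27)
window_mean(id3, 5, 10)  # 152 / 6 = 25.33333
```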

For example, ID = 1 was first interviewed in month 3 and again in month 9, so the average would be (month3 + month4 + month5 + month6 + month7 + month8 + month9)/7, i.e. (15 + 45 + 12 + 12 + 11 + 12 + 78)/7 = 185/7 ≈ 26.43.

For ID = 2 (interviews in months 4 and 8), the average is (month4 + month5 + month6 + month7 + month8)/5,

and so on.
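The worked arithmetic above can be checked directly in R. The values are copied from the sample rows for ID 1 and ID 2.

```r
# ID = 1: months 3 to 9 (7 values)
id1_window <- c(15, 45, 12, 12, 11, 12, 78)
sum(id1_window)   # 185
mean(id1_window)  # 26.42857, i.e. about 26.43

# ID = 2: months 4 to 8 (5 values)
id2_window <- c(14, 15, 45, 14, 19)
mean(id2_window)  # 21.4
```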

I'm working in RStudio, so I'd prefer code written in R. Thanks in advance!!

【Question comments】:

Sample data: df <- data.frame(ID = c("1","2","3"), month1 = c("12","14","14"), month2 = c("14","15","13"), month3 = c("15","45","25"), month4 = c("45","14","74"), month5 = c("12","15","25"), month6 = c("12","45","45"), month7 = c("11","14","14"), month8 = c("12","19","19"), month9 = c("78","22","22"), month10 = c("28","27","27"), b1 = c("3","4","5"), b2 = c("9","8","10"))
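Every value in the data.frame above is quoted, so all columns are character, and mean() on a character vector returns NA with a warning. A minimal base-R sketch of converting the columns to numeric before averaging; the as.character() step is a precaution in case an older R (before 4.0) turns the strings into factors.

```r
# The sample data as posted in the comment: every column is character
df <- data.frame(
  ID = c("1", "2", "3"),
  month1 = c("12", "14", "14"), month2 = c("14", "15", "13"),
  month3 = c("15", "45", "25"), month4 = c("45", "14", "74"),
  month5 = c("12", "15", "25"), month6 = c("12", "45", "45"),
  month7 = c("11", "14", "14"), month8 = c("12", "19", "19"),
  month9 = c("78", "22", "22"), month10 = c("28", "27", "27"),
  b1 = c("3", "4", "5"), b2 = c("9", "8", "10")
)

# Convert every column in place; df[] <- keeps the data.frame structure.
# as.character() first, so factor columns (older R) are not turned into level codes.
df[] <- lapply(df, function(col) as.numeric(as.character(col)))
sapply(df, class)  # every column is now "numeric"
```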

Tags: r subset average

【Solution 1】:

This solution works as long as the order of the variables does not change.

library(dplyr)

df %>%
  rowwise() %>%
  # take this row's month1..month10 values, keep positions b1..b2, and average them
  mutate(avg = mean(c_across(month1:month10)[b1:b2], na.rm = TRUE)) %>%
  select(-ID)

# Rowwise: 
  month1 month2 month3 month4 month5 month6 month7 month8 month9 month10    b1    b2   avg
   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl>
1     12     14     15     45     12     12     11     12     78      28     3     9  26.4
2     14     15     45     14     15     45     14     19     22      27     4     8  21.4
3     14     13     25     74     25     45     14     19     22      27     5    10  25.3

Sample data:

df <- tribble(
  ~ID, ~month1, ~month2, ~month3, ~month4, ~month5, ~month6, ~month7, ~month8, ~month9, ~month10, ~b1, ~b2,
    1,      12,      14,      15,      45,      12,      12,      11,      12,      78,       28,   3,   9,
    2,      14,      15,      45,      14,      15,      45,      14,      19,      22,       27,   4,   8,
    3,      14,      13,      25,      74,      25,      45,      14,      19,      22,       27,   5,  10
)
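Solution 1 depends on the column order. A base-R sketch that instead builds the column names month<b1> to month<b2> for each row and selects them by name, so it does not matter where ID or any other column sits. It assumes the month columns are named month1 to month10, as in the sample data.

```r
# Numeric sample data with the same values as in the question
df <- data.frame(
  ID = 1:3,
  month1 = c(12, 14, 14), month2 = c(14, 15, 13), month3 = c(15, 45, 25),
  month4 = c(45, 14, 74), month5 = c(12, 15, 25), month6 = c(12, 45, 45),
  month7 = c(11, 14, 14), month8 = c(12, 19, 19), month9 = c(78, 22, 22),
  month10 = c(28, 27, 27), b1 = c(3, 4, 5), b2 = c(9, 8, 10)
)

# For row i, select the columns month<b1>..month<b2> by name and average them
df$avg <- sapply(seq_len(nrow(df)), function(i) {
  cols <- paste0("month", df$b1[i]:df$b2[i])
  mean(unlist(df[i, cols]), na.rm = TRUE)
})
df$avg
# [1] 26.42857 21.40000 25.33333
```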

【Discussion】:

You can omit select(-ID); it only hides the ID column from the printed result.

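【Solution 2】:

A base R option using mapply:

cols <- grep('month', names(df), value = TRUE)  # month1 ... month10, in order
# for row x, average the month columns from b1 (y) to b2 (z); the columns must be numeric
df$result <- mapply(function(x, y, z) mean(unlist(df[x, cols[y:z]]), na.rm = TRUE),
                     seq(nrow(df)), df$b1, df$b2)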
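Both mapply and apply loop over the rows one at a time. A vectorized alternative (a swapped-in technique, not part of the answers here) is to mask every cell outside its row's b1..b2 window and then call rowMeans(). It assumes numeric columns named month1 to month10 and numeric b1/b2.

```r
# Numeric sample data with the same values as in the question
df <- data.frame(
  ID = 1:3,
  month1 = c(12, 14, 14), month2 = c(14, 15, 13), month3 = c(15, 45, 25),
  month4 = c(45, 14, 74), month5 = c(12, 15, 25), month6 = c(12, 45, 45),
  month7 = c(11, 14, 14), month8 = c(12, 19, 19), month9 = c(78, 22, 22),
  month10 = c(28, 27, 27), b1 = c(3, 4, 5), b2 = c(9, 8, 10)
)

# Matrix of the month columns: rows = IDs, column j = month j
m <- as.matrix(df[paste0("month", 1:10)])

# col(m) holds each cell's month number. b1 and b2 have one value per row,
# and R recycles them down each column, so every row uses its own window.
inside <- col(m) >= df$b1 & col(m) <= df$b2

m[!inside] <- NA                         # drop months outside the window
df$avg <- rowMeans(m, na.rm = TRUE)      # mean of what is left in each row
df$avg
# [1] 26.42857 21.40000 25.33333
```

With na.rm = TRUE, missing values inside a window are skipped as well, matching the na.rm = TRUE used in Solutions 1 and 2.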

【Discussion】:

【Solution 3】:

You can use apply row-wise, subset each row vector, and compute the mean:

apply(df[-1], 1, function(x) mean(as.numeric(x[x[11]:x[12]])))  # x[11] is b1, x[12] is b2
#[1] 26.42857 21.40000 25.33333
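In Solution 3, x[11] and x[12] are b1 and b2 only because df[-1] drops ID and leaves the ten month columns in front of them. A small sketch that checks this and looks the positions up with match() instead of hard-coding 11 and 12. It still assumes month k is at position k of df[-1].

```r
# Numeric sample data with the same values as in the question
df <- data.frame(
  ID = 1:3,
  month1 = c(12, 14, 14), month2 = c(14, 15, 13), month3 = c(15, 45, 25),
  month4 = c(45, 14, 74), month5 = c(12, 15, 25), month6 = c(12, 45, 45),
  month7 = c(11, 14, 14), month8 = c(12, 19, 19), month9 = c(78, 22, 22),
  month10 = c(28, 27, 27), b1 = c(3, 4, 5), b2 = c(9, 8, 10)
)

# apply() turns df[-1] into a matrix; each row vector follows these names
x_cols <- names(df[-1])
x_cols[c(11, 12)]  # "b1" "b2"

# Find the b1/b2 positions by name rather than hard-coding them
i_b1 <- match("b1", x_cols)
i_b2 <- match("b2", x_cols)
res <- apply(df[-1], 1, function(x) mean(as.numeric(x[x[i_b1]:x[i_b2]])))
res
# [1] 26.42857 21.40000 25.33333
```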

【Discussion】: