Ignoring values when using apply with mean: answers

【Question title】: Ignoring values when using apply with mean
【Posted】: 2021-05-01 18:14:48
【Question description】:

I want to exclude certain values when computing the mean with apply(x, 1, mean):

#Example data:
df <- data.frame(A1 = c(0,1,2,3,4,2,NA,5,6), 
                 A2 = c(5,0,0,4,NA,5,3,2,1), 
                 A3 = c(0,0,1,2,4,5,3,4,3), 
                 B1 = c(9,9,9,9,9,9,9,9,9))

#I am using grep, because I need to use specific parts of the column names and I can not use the index
df$MEANA <- apply(df[,grep("A", colnames(df))],1,mean, na.rm = TRUE)

This gives me the row means, ignoring NA values:

df$MEANA
[1] 1.6666667 0.3333333 1.0000000 3.0000000 4.0000000 4.0000000 3.0000000 3.6666667 3.3333333

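I would also like to ignore 0s. I could do that by changing the 0s to NA, or by setting 0s to NA when reading in the data.

My question: can I ignore 0s inside the apply call, the same way na.rm = TRUE ignores NAs (e.g. ignore.value = 0)? I am new to the apply concept and don't know whether this is possible.

Update: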

aind <- grep("A", names(df))
#ignore NAs
df$M1 <- apply(df[,grep("A", colnames(df))],1,mean, na.rm = TRUE)
#Ignore 0 and 6 and NA
df$M2 <-rowMeans(sapply(df[aind], function(x) replace(x, x %in% c(0, 6), NA)), na.rm = TRUE)
#ignore 0 and NA
df$M3 <- rowMeans(replace(df[aind], df[aind] == 0, NA), na.rm = TRUE)

df

  A1 A2 A3 B1        M1       M2       M3
1  0  5  0  9 1.6666667 5.000000 5.000000
2  1  0  0  9 0.3333333 1.000000 1.000000
3  2  0  1  9 1.0000000 1.500000 1.500000
4  3  4  2  9 3.0000000 3.000000 3.000000
5  4 NA  4  9 4.0000000 4.000000 4.000000
6  2  5  5  9 4.0000000 4.000000 4.000000
7 NA  3  5  9 4.0000000 4.000000 4.000000
8  5  2  4  9 3.6666667 3.666667 3.666667
9  6  0  4  9 3.3333333 4.000000 5.000000

【Question comments】:

Tags: r apply mean

【Solution 1】:

It is easier with a lambda (anonymous) function:

aind <- grep("A", names(df))
apply(df[aind], 1, function(x) mean(x[x !=0], na.rm = TRUE))

Or convert the 0s to NA with replace and then use the vectorized rowMeans, which is faster:

rowMeans(replace(df[aind], df[aind] == 0, NA), na.rm = TRUE)
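As a quick sanity check, the two approaches above can be run side by side on the example data from the question. This is only a sketch: the names `via_apply`, `via_rowmeans` and `big` are made up for illustration, and the timings at the end depend on the machine, so no numbers are claimed for them.

```r
# Example data from the question
df <- data.frame(A1 = c(0,1,2,3,4,2,NA,5,6),
                 A2 = c(5,0,0,4,NA,5,3,2,1),
                 A3 = c(0,0,1,2,4,5,3,4,3),
                 B1 = c(9,9,9,9,9,9,9,9,9))
aind <- grep("A", names(df))

# Row-by-row version: drop zeros inside an anonymous function
via_apply <- apply(df[aind], 1, function(x) mean(x[x != 0], na.rm = TRUE))

# Vectorized version: turn zeros into NA once, then let rowMeans skip them
via_rowmeans <- rowMeans(replace(df[aind], df[aind] == 0, NA), na.rm = TRUE)

# Both should agree (names are dropped before comparing)
all.equal(as.numeric(via_apply), as.numeric(via_rowmeans))

# Rough timing on a larger, made-up data frame (results vary by machine)
set.seed(1)
big <- as.data.frame(matrix(sample(0:6, 3e4, replace = TRUE), ncol = 3))
system.time(apply(big, 1, function(x) mean(x[x != 0], na.rm = TRUE)))
system.time(rowMeans(replace(big, big == 0, NA), na.rm = TRUE))
```
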

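If we need to exclude several values, use %in% on a vector instead of ==. Because == compares element by element, it can recycle the values being compared against and give wrong results: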

rowMeans(sapply(df[aind], function(x) 
       replace(x, x %in% c(0, 6), NA)), na.rm = TRUE)
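The question asked for something like `ignore.value = 0`. As far as I know `mean` has no such argument, but `apply` forwards extra arguments to the function it calls, so a small wrapper can get close. The helper `mean_ignoring` and its `ignore` parameter below are hypothetical names invented for this sketch, not part of base R.

```r
# Hypothetical helper (not part of base R): mean of x after dropping
# NA and every value listed in `ignore`
mean_ignoring <- function(x, ignore = 0) {
  mean(x[!is.na(x) & !(x %in% ignore)])
}

# Example data from the question
df <- data.frame(A1 = c(0,1,2,3,4,2,NA,5,6),
                 A2 = c(5,0,0,4,NA,5,3,2,1),
                 A3 = c(0,0,1,2,4,5,3,4,3),
                 B1 = c(9,9,9,9,9,9,9,9,9))
aind <- grep("A", names(df))

# apply() passes the extra named argument on to mean_ignoring via `...`,
# just as it passes na.rm = TRUE on to mean
m_zero     <- apply(df[aind], 1, mean_ignoring, ignore = 0)
m_zero_six <- apply(df[aind], 1, mean_ignoring, ignore = c(0, 6))
```

Inside the helper, `%in%` is safe because `apply` hands each row over as a plain vector.
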

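【Comments】:

Thanks, I will use the rowMeans solution, since it doesn't change the original data frame and it looks nicer and easier to understand than the apply call.
@BanffBoss122 replace does not change the original object
@BanffBoss122 Can you try it now?
Perfect! Thanks, I'll look into the differences in detail to understand how it works!
@BanffBoss122 The issue was that %in% expects a vector but we were giving it a data.frame, whereas == works on a data.frame because it is elementwise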
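To make the difference between == and %in% concrete, here is a small sketch on toy data. The vector `x` and the data frame `d` are made up for this illustration and are not from the question.

```r
x <- c(0, 6, 6, 0)

# `==` recycles c(0, 6) along x and compares position by position:
# 0 == 0, 6 == 6, 6 == 0, 0 == 6
eq_vec <- x == c(0, 6)
eq_vec            # TRUE TRUE FALSE FALSE

# `%in%` asks, for each element of x, whether it occurs anywhere in the set
in_vec <- x %in% c(0, 6)
in_vec            # TRUE TRUE TRUE TRUE

# On a data frame, `==` returns a logical matrix with one cell per value,
# while `%in%` treats the data frame as a list and returns one value per column
d <- data.frame(A1 = c(0, 1), A2 = c(6, 0))
eq_res <- d == 0
in_res <- d %in% c(0, 6)
dim(eq_res)       # 2 2
length(in_res)    # 2
```

This is why the multi-value version in the answer applies %in% column by column through sapply instead of calling it on the data frame directly.
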

【Solution 2】:

Try this:

df[df == 0] <- NA

Then run your code:

df$MEANA <- apply(df[,grep("A", colnames(df))],1,mean, na.rm = TRUE)

【Comments】:

I specifically wanted to avoid changing the values, so I will use the solution posted by akrun ;)

【Solution 3】:
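Note that `df[df == 0] <- NA` overwrites zeros in every column of `df`, not only the A columns. If the idea of this solution is appealing but the original values have to stay intact, one possible variation is to do the replacement on a copy. This is a sketch rather than the answerer's code, and `tmp` is just a scratch name chosen here.

```r
# Example data from the question
df <- data.frame(A1 = c(0,1,2,3,4,2,NA,5,6),
                 A2 = c(5,0,0,4,NA,5,3,2,1),
                 A3 = c(0,0,1,2,4,5,3,4,3),
                 B1 = c(9,9,9,9,9,9,9,9,9))
aind <- grep("A", names(df))

# Copy only the A columns and blank out the zeros in the copy
tmp <- df[aind]
tmp[tmp == 0] <- NA

# Row means over the copy; df itself only gains the new column
df$MEANA <- rowMeans(tmp, na.rm = TRUE)

# The original columns still contain their zeros
df$A1
```
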

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(MeanA = c_across(starts_with('A')) %>% na_if(0) %>% mean(na.rm = TRUE))

# # A tibble: 9 x 5
# # Rowwise: 
#      A1    A2    A3    B1 MeanA
#   <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     0     5     0     9  5   
# 2     1     0     0     9  1   
# 3     2     0     1     9  1.5 
# 4     3     4     2     9  3   
# 5     4    NA     4     9  4   
# 6     2     5     5     9  4   
# 7    NA     3     3     9  3   
# 8     5     2     4     9  3.67
# 9     6     1     3     9  3.33

【Comments】: