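【Posted】: 2021-07-15 14:22:59
【Question】:
I want to calculate the median absolute deviation score (mscore) column by column, ignoring the first column, for every data frame in a list of data frames, and then add the result to each data frame as a new row with the row name mscore.
Previously I would do the calculation on each data frame one at a time, but I would now like to streamline the process.
Below is a small subset of my list of data frames. The full list of dfs has more than 30 data frames.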
list(Al2O3 = structure(list(Determination_No = 1:6, `2` = c(2.01,
2.02, 2.03, 2.01, 2.02, 2), `3` = c(2.01, 2.01, 2, 2.02, 2.02,
2.03), `4` = c(2, 2.03, 1.99, 2.01, 2.01, 2.01), `5` = c(2.02,
2.02, 2.05, 2.03, 2.02, 2.03), `7` = c(1.88, 1.9, 1.89, 1.88,
1.88, 1.87), `8` = c(2.053, 2.044, 2.041, 2.038, 2.008, 2.02),
`10` = c(2.002830415, 2.021725042, 2.021725042, 1.983935789,
2.002830415, 2.021725042), `12` = c(2.09, 2.05, 1.96, 2.09,
2.06, 2.02)), class = "data.frame", row.names = c(NA, -6L
)), As = structure(list(Determination_No = 1:6, `2` = c(0.052,
0.027, 0.011, 0.011, 0.012, 0.012), `3` = c(0.012, 0.012, 0.013,
0.012, 0.013, 0.013), `4` = c(0.012, 0.012, 0.013, 0.012, 0.012,
0.012), `5` = c(0.013, 0.013, 0.013, 0.013, 0.013, 0.013), `7` = c(0.011,
0.011, 0.011, 0.012, 0.011, 0.011), `8` = c(0.011, 0.01, 0.011,
0.011, 0.011, 0.011), `10` = c(0.01, 0.01, 0.01, 0.01, 0.01,
0.01), `12` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_)), class = "data.frame", row.names = c(NA, -6L)), Fe = structure(list(
Determination_No = 1:6, `2` = c(55.94, 55.7, 56.59, 56.5,
55.98, 55.93), `3` = c(56.83, 56.54, 56.18, 56.5, 56.51,
56.34), `4` = c(56.39, 56.43, 56.53, 56.31, 56.47, 56.35),
`5` = c(56.32, 56.29, 56.31, 56.32, 56.39, 56.32), `7` = c(56.48,
56.4, 56.54, 56.43, 56.73, 56.62), `8` = c(56.382, 56.258,
56.442, 56.258, 56.532, 56.264), `10` = c(56.3, 56.5, 56.2,
56.5, 56.7, 56.5), `12` = c(56.11, 56.46, 56.1, 56.35, 56.36,
56.37)), class = "data.frame", row.names = c(NA, -6L)))
Previously I would do the following:
# create a modified scores function that accepts NAs
scores_na <- function(x, ...) {
  not_na <- !is.na(x)
  scores <- rep(NA, length(x))
  scores[not_na] <- outliers::scores(na.omit(x), ...)
  scores
}
MscoreMax <- 3.0 # the threshold above which a value is deemed to be an outlier
colmedians <- mapply(median, df[-1], na.rm = TRUE) # median of every column except the first
MScore <- as.vector(round(abs(scores_na(colmedians, "mad")), digits = 2)) # Mscore to 2 decimal places
MscoreIndex <- which(MScore > MscoreMax) # get the index of each value exceeding the threshold
df[-1][MscoreIndex] <- NA # change outliers to NA so they are excluded from further calculations
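For readers who want to see the per-data-frame steps above in one place, here is a minimal sketch. It rests on assumptions: `mad_score` and `flag_outlier_columns` are hypothetical helper names, the toy data is made up, and the score is computed in base R as `(x - median(x)) / mad(x)`, which is my understanding of what the "mad" score type does, rather than by calling the outliers package.

```r
# Hypothetical helper: MAD-based score of a numeric vector, tolerating NAs.
# Assumption: score = (x - median(x)) / mad(x), using the stats::mad defaults.
mad_score <- function(x) {
  (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE)
}

# Hypothetical helper: set a column to NA when its median is an outlier
# relative to the other column medians. The first column is left untouched.
flag_outlier_columns <- function(df, threshold = 3.0) {
  col_medians <- vapply(df[-1], median, numeric(1), na.rm = TRUE)
  m <- round(abs(mad_score(col_medians)), 2)
  out <- which(m > threshold) + 1L  # shift by one to skip the first column
  if (length(out) > 0) df[out] <- NA
  df
}

# Made-up example: column `c` has a median far away from the others.
toy <- data.frame(id = 1:3, a = c(2.0, 2.1, 2.2), b = c(2.1, 2.2, 2.3),
                  c = c(9.0, 9.1, 9.2), d = c(2.2, 2.3, 2.4))
res <- flag_outlier_columns(toy)
```

With a list of data frames, the same helper could be applied to every element with `lapply(dfs, flag_outlier_columns)`, assuming `dfs` is the name of the list.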
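I have tried the lines below to calculate the median.
The colMedians function is meant for matrices, so I used mapply to apply median across the columns: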
df <- lapply(df, function(x) rbind(x[,-1],
mapply(median(x[,-1],na.rm = TRUE))))
But I get the following error:
Error in median.default(x[, -1], na.rm = TRUE) : need numeric data
When I inspect the data frames, my values are stored as double, so I'm a bit stuck.
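One likely reading of the error, offered as a sketch rather than a definitive diagnosis: `x[, -1]` is still a data frame, and `median()` expects a numeric vector, so it reports "need numeric data" even though each column is a double. The example below uses a made-up two-element list (`dfs` is a hypothetical name) and applies `median()` to each column separately.

```r
# Made-up two-element list standing in for the real list of data frames.
dfs <- list(
  A = data.frame(Determination_No = 1:3, `2` = c(2.01, 2.02, 2.03),
                 `3` = c(2.00, NA, 2.02), check.names = FALSE),
  B = data.frame(Determination_No = 1:3, `2` = c(0.05, 0.03, 0.01),
                 `3` = c(0.01, 0.01, 0.02), check.names = FALSE)
)

# median() is applied to one column (a numeric vector) at a time,
# giving one named vector of medians per data frame.
col_medians <- lapply(dfs, function(x) sapply(x[-1], median, na.rm = TRUE))
```

Each element of `col_medians` is named after the columns of the corresponding data frame, so the values can be matched back to their source columns.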
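【Comments】:
-
FYI, code blocks are delimited by code fences, which are three backticks (```), not the single quotes used here; see stackoverflow.com/editing-help.
-
@r2evans Apologies.
-
No need to apologize! I was only offering advice on question formatting. If it came across as too pushy, I'm sorry, and I can roll back the edit.
-
@r2evans No, it didn't. The feedback and edits are useful. I'm the one asking for help, so the easier it is for everyone, the better.
-
The use of mapply is wrong here, but that is easy to fix (the first argument needs to be a function, not the result of a function call; that said, I don't believe you need mapply at all, lapply should work). But... binding a row and giving it a row name is one thing, and row names are easily lost. I generally don't like adding summary statistics as a row to the actual data; is the row being bound only for presentation/rendering in a report?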
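The points in the last comment can be illustrated roughly as follows. This is only a sketch under assumptions: the data is made up, `with_summary_row` is a hypothetical helper name, and the column median stands in for whichever statistic is wanted. The summary is kept apart from the data and is bound as a labelled row only at the end, for presentation.

```r
# Made-up data frame standing in for one element of the list.
dfs <- list(
  A = data.frame(Determination_No = 1:3, `2` = c(2.01, 2.02, 2.03),
                 `3` = c(2.00, 2.01, 2.02), check.names = FALSE)
)

# Keep the summary separate from the data: one named vector per data frame.
medians <- lapply(dfs, function(x) vapply(x[-1], median, numeric(1), na.rm = TRUE))

# Hypothetical helper: bind a summary vector as a labelled final row.
with_summary_row <- function(x, stat, label) {
  out <- rbind(x[-1], as.list(stat))
  rownames(out) <- c(seq_len(nrow(x)), label)
  out
}

# mapply's first argument is the function itself, not the result of a call;
# SIMPLIFY = FALSE keeps the result as a list of data frames.
report <- mapply(with_summary_row, dfs, medians,
                 MoreArgs = list(label = "median"), SIMPLIFY = FALSE)
```

Keeping `medians` as its own object means the statistic survives later operations that reset row names, while `report` is used only for display.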