【Question title】: Median of a subset of columns where the value of other columns is 1
【Posted】: 2014-09-16 14:36:17
【Question】:

This question is very similar to this one, but with some fundamental differences.

I have a dataset with a timestamp, 4 measurement columns and 4 status columns:

foo <- structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor")), .Names = c("Timestamp", "A", "B", 
"C", "D", "SA", "SB", "SC", "SD"), row.names = c(NA, 6L), class = "data.frame")

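I want to calculate, for each row, the median of the measurement columns that are switched on, as indicated by a 1 in the corresponding S* column.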
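As a worked check of what "median of the switched-on columns" means: in every row of the sample data only SB and SC are 1, so only B and C contribute. With two values the median is their mean. The snippet below does this by hand for row 1, using the values from the dput above.

```r
# Row 1 of the sample data: SB and SC are 1, so only B and C are used.
on_values <- c(B = 20.77579, C = 16.25537)
row1_median <- median(on_values)  # with two values this equals their mean
row1_median
```

The result matches the first Median value in the expected output shown in the edit below.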

So far, I can find which measurement columns to use, row by row, with:

foo[i, c(which(x = foo[i, 6:9] == 1, arr.ind = FALSE) + 1)]

where i is the row number.

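That's as far as I could get without my code becoming too complicated. I was thinking I could build a new data frame by binding the columns obtained with the line above (after a row-by-row for loop) to the timestamp, filling the empty spots with NA, computing the row medians of that data frame, and finally binding the medians back to the original data frame. But there must be a better way!

Any ideas?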
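For comparison, here is a minimal sketch of the row-by-row idea. It wraps the question's indexing line in a for loop and takes each row's median directly, so it skips the intermediate NA-padded data frame. The data frame is rebuilt by hand from the dput values and is a hypothetical reconstruction. It keeps the same column order, so the status columns are still 6:9 and the "+ 1" offset still maps them onto A..D.

```r
# Hypothetical hand-built copy of the sample data (same values and column order
# as the dput in the question).
lv <- c("1", "0")  # factor levels exactly as in the dput: "1" first, then "0"
foo <- data.frame(
  Timestamp = as.POSIXct(1409544002 + 4 * (0:5), origin = "1970-01-01"),
  A = 0,
  B = c(20.77579, 21.05727, 21.81632, 21.36299, 21.18629, 21.34721),
  C = c(16.25537, 16.45496, 16.70933, 16.1526, 16.60963, 16.76558),
  D = 0,
  SA = factor("0", levels = lv), SB = factor("1", levels = lv),
  SC = factor("1", levels = lv), SD = factor("0", levels = lv)
)

# Row by row: pick the "on" measurement columns with the question's indexing
# line, then take their median directly.
foo$Median <- NA_real_
for (i in seq_len(nrow(foo))) {
  on_cols <- which(foo[i, 6:9] == 1) + 1  # SA..SD (cols 6:9) -> A..D (cols 2:5)
  vals <- unlist(foo[i, on_cols])         # 1-row data frame -> plain vector
  foo$Median[i] <- if (length(vals)) median(vals) else NA_real_  # no "on" column -> NA
}
foo$Median
```

The unlist() call is needed because median() does not accept a data frame. A loop like this is fine for small data, but the vectorised answer below avoids per-row subsetting.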

Edit:

The output should look like this:

structure(list(Timestamp = structure(c(1409544002, 1409544006, 
1409544010, 1409544014, 1409544018, 1409544022), class = c("POSIXct", 
"POSIXt"), tzone = ""), A = c(0, 0, 0, 0, 0, 0), B = c(20.77579, 
21.05727, 21.81632, 21.36299, 21.18629, 21.34721), C = c(16.25537, 
16.45496, 16.70933, 16.1526, 16.60963, 16.76558), D = c(0, 0, 
0, 0, 0, 0), SA = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), SB = structure(c(1L, 1L, 1L, 1L, 1L, 
1L), .Label = c("1", "0"), class = "factor"), SC = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("1", "0"), class = "factor"), 
SD = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1", 
"0"), class = "factor"), Median = c(18.51558, 18.756115, 
19.262825, 18.757795, 18.89796, 19.056395)), .Names = c("Timestamp", 
"A", "B", "C", "D", "SA", "SB", "SC", "SD", "Median"), row.names = c(NA, 
6L), class = "data.frame")

【Discussion】:

  • What is your expected result for the example data?
  • @Thomas I've added the resulting data frame.
  • Sounds like a job for the plyr package.

Tags: r


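【Solution 1】:

This is a bit messy because your S* columns are factors. If you convert them to numeric or logical first, you can skip most of the second line of code below: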

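w <- grepl("^S", names(foo))  # flag the status columns SA..SD
# factor -> character ("0"/"1") -> numeric 0/1 -> logical, as a matrix
m <- matrix(as.logical(as.numeric(as.matrix(foo[, w]))), ncol = sum(w))
# set switched-off measurements to NA, then take row-wise medians ignoring NA
foo$Median <- apply(`[<-`(as.matrix(foo[,LETTERS[1:4]]), !m, NA), 1, median, na.rm=TRUE)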
foo
#             Timestamp A        B        C D SA SB SC SD   Median
# 1 2014-09-01 06:00:02 0 20.77579 16.25537 0  0  1  1  0 18.51558
# 2 2014-09-01 06:00:06 0 21.05727 16.45496 0  0  1  1  0 18.75612
# 3 2014-09-01 06:00:10 0 21.81632 16.70933 0  0  1  1  0 19.26282
# 4 2014-09-01 06:00:14 0 21.36299 16.15260 0  0  1  1  0 18.75780
# 5 2014-09-01 06:00:18 0 21.18629 16.60963 0  0  1  1  0 18.89796
# 6 2014-09-01 06:00:22 0 21.34721 16.76558 0  0  1  1  0 19.05640
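Here is a sketch of the simplification mentioned above, assuming the status columns are converted to logical first. It builds a hypothetical copy of the sample data without Timestamp, which plays no part here. It also assumes that SA..SD pair up, in order, with A..D. One caveat applies to the conversion: it has to go through the factor labels. Calling as.numeric() directly on these factors returns their internal codes. With levels c("1", "0"), those codes are 1 for "1" and 2 for "0". The answer's code avoids this because as.matrix() turns the factors into character first.

```r
# Hypothetical hand-built copy of the sample data (Timestamp omitted).
lv <- c("1", "0")
foo <- data.frame(
  A = 0,
  B = c(20.77579, 21.05727, 21.81632, 21.36299, 21.18629, 21.34721),
  C = c(16.25537, 16.45496, 16.70933, 16.1526, 16.60963, 16.76558),
  D = 0,
  SA = factor("0", levels = lv), SB = factor("1", levels = lv),
  SC = factor("1", levels = lv), SD = factor("0", levels = lv)
)

status <- c("SA", "SB", "SC", "SD")  # assumed to pair up, in order, with meas
meas   <- c("A", "B", "C", "D")

# Convert each factor to logical via its labels, not its internal codes.
foo[status] <- lapply(foo[status], function(f) as.character(f) == "1")

vals <- as.matrix(foo[meas])
vals[!as.matrix(foo[status])] <- NA  # blank out measurements that are off
foo$Median <- apply(vals, 1, median, na.rm = TRUE)
foo$Median
```

A row in which every status is off ends up with NA, because median() of an empty vector returns NA.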

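【Discussion】:

  • Awesome! Could you explain `[<-`, or point me to a resource that explains it? I've never seen it before.
  • @amzu `[<-` is the replacement function behind subset assignment on the left-hand side. It does the same as a[b] <- c, but is written as `[<-`(a, b, c) and does not change the original a object. I used this construct to reduce two lines of code to one.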
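To illustrate the comment above: R evaluates the replacement form x[i] <- value through a call to the function `[<-`. Calling `[<-`(x, i, value) directly returns the modified vector as a value. This small sketch shows that the one-line call gives the same result as the usual two-line form. It uses a literal vector rather than the answer's matrix.

```r
# One-line form: returns c(1, 2, 3, 4) with positions 1 and 3 replaced by NA.
res <- `[<-`(c(1, 2, 3, 4), c(TRUE, FALSE, TRUE, FALSE), NA)

# Usual two-line form with a temporary variable.
tmp <- c(1, 2, 3, 4)
tmp[c(TRUE, FALSE, TRUE, FALSE)] <- NA

identical(res, tmp)
```

In the answer, the same idea is applied to a numeric matrix with a logical matrix as the index. That is why `[<-`(as.matrix(...), !m, NA) can be passed straight to apply().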