Average over row pairs and paste the value based on a condition: answers

【Question title】: Average over rows pairs and paste the value based on condition
【Posted】: 2020-03-31 13:42:12
【Question description】:

In R, I have a df such as:

     a      b   c 
 1   124    70  aa     
 2   129    67  aa     
 3   139    71  aa     
 4   125    77  aa     
 5   125    82  aa     
 6   121    69  aa     
 7   135    68  bb
 8   137    72  bb
 9   137    78  bb
10   140    86  bb

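I want to iterate over the rows in columns (a, b) and, for every pair of consecutive rows whose difference is >= 12, replace both values with the pair's mean. Otherwise the old value should simply be copied. This behaviour should be restricted to the groups marked by another column (c), i.e. it should not happen if the two rows come from different groups.

In this example it happens at row 3 (in column a, because the difference to the next (4th) row is 14) and at row 5 (in column b, because the difference to the next row is 13). However, it should not happen at row 6, because row 7 is in another c group.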
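The rule can be checked mechanically. The base R sketch below lists the rows that start a qualifying pair; `pair_starts` and its `threshold` argument are hypothetical names introduced here for illustration, not part of the question.

```r
df <- data.frame(
  a = c(124, 129, 139, 125, 125, 121, 135, 137, 137, 140),
  b = c(70, 67, 71, 77, 82, 69, 68, 72, 78, 86),
  c = rep(c("aa", "bb"), times = c(6, 4))
)

# Hypothetical helper: row numbers i such that rows i and i + 1
# differ by at least `threshold` and belong to the same group
pair_starts <- function(x, g, threshold = 12) {
  n <- length(x)
  big_jump   <- abs(diff(x)) >= threshold  # compares row i with row i + 1
  same_group <- g[-n] == g[-1]             # TRUE when both rows share a group
  which(big_jump & same_group)
}

pair_starts(df$a, df$c)       # 3
pair_starts(df$b, df$c)       # 5
which(abs(diff(df$a)) >= 12)  # 3 6: without the group check, row 6 also qualifies
```

The last line shows why the group restriction matters: rows 6 and 7 of column a differ by 14, but they sit in different c groups.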

So the resulting df would look like:

     a      b   c     a_new  b_new
 1   124    70  aa    124    70
 2   129    67  aa    129    67
 3   139    71  aa    132    71   
 4   125    77  aa    132    68
 5   125    82  aa    125    75.5
 6   121    69  aa    121    75.5
 7   135    68  bb    135    68
 8   137    72  bb    137    72
 9   137    78  bb    137    78
10   140    86  bb    140    86

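I have been struggling to do this and found that the lag function might be usable, but without success. Any help is much appreciated (whether base R, dplyr or something else).

Input: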

structure(list(a = c(124, 129, 139, 125, 125, 121, 135, 137, 
137, 140), b = c(70, 67, 71, 77, 82, 69, 68, 72, 78, 86), c = c("aa", 
"aa", "aa", "aa", "aa", "aa", "bb", "bb", "bb", "bb")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

【Question comments】:

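I don't understand why the fourth row in a is replaced, and I don't get your exact values either, but do you mean something like this? df %>% group_by(c) %>% mutate_at(vars(a, b), list(new = ~replace(., which(abs(diff(.)) >= 12) + 1, mean(.)))) ?
This seems to detect the rows, but the pasted values don't look like averages, and they are only pasted into the second row of each pair?
The logic here isn't clear to me. Does the replacement have to be done by group, i.e. c? Which rows did you replace, and with what values?
I want to replace two rows whose difference is >= 12 with their mean. So, for example, I should paste the mean of row 3 (value: 139) and row 4 (value: 125) into rows 3 and 4 of column a_new (the mean is 132). This should not happen between rows that belong to different groups (column c), so nothing happens to rows 6 and 7 in column a_new (the difference is >= 12, but row 6 is in group 'aa' and row 7 is in group 'bb').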
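To make the difference between the attempt in the first comment and the requested behaviour concrete, here is a small base R sketch on column a of group 'aa' only. The variable names `a_aa`, `idx` and `pair_mean` are made up for this illustration.

```r
a_aa <- c(124, 129, 139, 125, 125, 121)  # column a, group "aa"
idx  <- which(abs(diff(a_aa)) >= 12)     # 3: the pair is rows 3 and 4

# Attempt from the comment: only row idx + 1 changes, and it receives
# the mean of the whole group rather than the mean of the pair
replace(a_aa, idx + 1, mean(a_aa))

# Requested behaviour: both rows of the pair receive the pair's own mean
pair_mean <- mean(a_aa[c(idx, idx + 1)])  # (139 + 125) / 2 = 132
replace(a_aa, c(idx, idx + 1), pair_mean)
```

This works as written only because a single pair qualifies; with several pairs each one needs its own mean.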

Tags: r dplyr

【Solution 1】:

We can write a function that works on one chunk, i.e. the values of a single group.

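apply_fun <- function(x) {
    # positions i where rows i and i + 1 differ by at least 12
    inds <- which(abs(diff(x)) >= 12)
    if(length(inds))
        # overwrite both rows of each pair with the pair's mean
        x[sort(c(inds, inds + 1))] <-  c(sapply(inds, function(i) 
                                          rep(mean(x[c(i, i + 1)]), 2)))
    return(x)
}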

Then apply it to multiple columns by group.

library(dplyr)
df %>% group_by(c) %>% mutate_at(vars(a, b), list(new = apply_fun))

#      a     b c     a_new b_new
#   <dbl> <dbl> <chr> <dbl> <dbl>
# 1   124    70 aa      124  70  
# 2   129    67 aa      129  67  
# 3   139    71 aa      132  71  
# 4   125    77 aa      132  77  
# 5   125    82 aa      125  75.5
# 6   121    69 aa      121  75.5
# 7   135    68 bb      135  68  
# 8   137    72 bb      137  72  
# 9   137    78 bb      137  78  
#10   140    86 bb      140  86
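
Since the question also allows base R, the same per-group function can be applied without dplyr. The sketch below is one possible way, using `ave()` from base R; it repeats `apply_fun` and the input data so that it runs on its own, and the loop over column names is an assumption about how one might add the `_new` columns.

```r
# apply_fun as defined in the answer above
apply_fun <- function(x) {
    inds <- which(abs(diff(x)) >= 12)
    if (length(inds))
        x[sort(c(inds, inds + 1))] <- c(sapply(inds, function(i)
                                         rep(mean(x[c(i, i + 1)]), 2)))
    x
}

df <- data.frame(
    a = c(124, 129, 139, 125, 125, 121, 135, 137, 137, 140),
    b = c(70, 67, 71, 77, 82, 69, 68, 72, 78, 86),
    c = rep(c("aa", "bb"), times = c(6, 4))
)

# ave() applies FUN within each level of df$c and keeps the row order
for (col in c("a", "b")) {
    df[[paste0(col, "_new")]] <- ave(df[[col]], df$c, FUN = apply_fun)
}
df
```

The resulting `a_new` and `b_new` columns should match the dplyr output shown above.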

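【Comments】:

【Solution 2】:

My understanding is that the procedure described in the comments of the code below should be applied to each group given by the indicator column "c":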


pairAverage <- function(x) {
  # x should be a numeric vector of length > 1
  if (is.vector(x) && is.numeric(x) && length(x) > 1) {

    # copy data to an aux vector
    aux <- x

    # get differences of lag 1
    dh<-diff(x, 1)

    # get means of consecutive pairs: x[i + 1] - (x[i + 1] - x[i]) / 2
    med <- x[2:length(x)] - dh/2

    # get positions (index) of pairs with abs(difference) >= 12
    idx <- which(abs(dh) >= 12)

    # need 2 reps of each mean to replace consecutive values of x
    valToRepl <- med[sort(rep(idx,2))]

    # ordered indexes pairs of consecutive elements of x to be replaced  
    idxToRepl <- sort(c(idx,idx+1))

    # replace pairs of values 
    aux[idxToRepl] <- valToRepl

    return(aux)

  } else {
    # do nothing
    warning("parameter x should be a numeric vector of length > 1")
    return(NULL)
  }
}

pairAverageByGroups <- function(x, gr) {
  if (is.vector(x) && is.numeric(x) && length(x) == length(gr)) {
    x.ls <- split(x, as.factor(gr))
    output <- unlist(lapply(x.ls, pairAverage))
    names(output) <- NULL
    output
  } else {
    # do nothing
    warning("parameter x should be a numeric vector with the same length as gr")
    return(NULL)
  }
}

pairAverageByGroups(dd$a, dd$c)
 [1] 124 129 132 132 125 121 135 137 137 140
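
One thing to keep in mind with the `split()` plus `unlist()` pattern: the pieces come back in the order of the group levels, which matches the row order only when the groups are contiguous and already sorted, as they are in the question's data. The toy vectors below are made up to show the effect, and `unsplit()` from base R is one way to restore the original row order.

```r
x  <- c(10, 30, 11, 50)          # made-up values
gr <- c("bb", "aa", "bb", "aa")  # groups interleaved, not sorted

parts <- split(x, gr)            # list(aa = c(30, 50), bb = c(10, 11))

unlist(parts, use.names = FALSE) # 30 50 10 11: level order, not row order
unsplit(parts, gr)               # 10 30 11 50: original row order
```

Under that assumption, replacing `unlist(lapply(x.ls, pairAverage))` with `unsplit(lapply(x.ls, pairAverage), gr)` would be a possible adjustment for data whose groups are not sorted, provided every group has more than one element so that `pairAverage` never returns NULL.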

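【Comments】:

Thank you, but I keep getting an error: 'Error in FUN(X[[i]], ...) : object 'dd' not found'. dd seems to be used inside the function, but my data frame actually has a different name. I also get a strange 'Browse[1]>' prompt, and I don't know why.
Sorry! In the line med <- c(dd$a[2:length(x)] - dh/2) I replaced dd with x. Try it now; the problem was that I wrote the function by copying and pasting test code. dd is the name of my data.frame.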