【Question title】: Conditional slicing|filtering top and bottom n rows from grouped data
【Posted】: 2018-05-04 07:04:14
【Question】:

I'm having trouble filtering or slicing the top and bottom n rows from grouped data at the same time.

So this is not the same as "Select first and last row from grouped data".

What I need is: if sub_gr == a, filter|slice the top three rows; if sub_gr == b, filter|slice the bottom two rows. That's it!

My data looks like this:

  df <- data.frame(gr=rep(seq(1,2),each=10),sub_gr=rep(rep(c("a","b"),each=5),2),
                   y = rep(c(sort(runif(5,0,0.5),decreasing=TRUE), sort(runif(5,0,0.5),decreasing=TRUE)),2),
                   x = rep(c(seq(0.1,0.5,0.1),rev(seq(-0.5,-0.1,0.1))),2))

   gr sub_gr          y    x
1   1      a 0.37851909  0.1
2   1      a 0.33305165  0.2
3   1      a 0.22478005  0.3
4   1      a 0.09677654  0.4
5   1      a 0.07060651  0.5
6   1      b 0.41999445 -0.1
7   1      b 0.35356301 -0.2
8   1      b 0.33274398 -0.3
9   1      b 0.20451400 -0.4
10  1      b 0.03714828 -0.5
11  2      a 0.37851909  0.1
12  2      a 0.33305165  0.2
13  2      a 0.22478005  0.3
14  2      a 0.09677654  0.4
15  2      a 0.07060651  0.5
16  2      b 0.41999445 -0.1
17  2      b 0.35356301 -0.2
18  2      b 0.33274398 -0.3
19  2      b 0.20451400 -0.4
20  2      b 0.03714828 -0.5

library(dplyr)

Here is what I tried:

 df%>%
    group_by(gr, sub_gr)%>%
    slice(if(any(sub_gr=="a")) {row_number()==1:3} else {row_number()==4:n()})

Warning messages:
1: In 1:5 == 1:3 :
  longer object length is not a multiple of shorter object length
2: In 1:5 == 4:5L :
  longer object length is not a multiple of shorter object length
3: In 1:5 == 1:3 :
  longer object length is not a multiple of shorter object length
4: In 1:5 == 4:5L :
  longer object length is not a multiple of shorter object length
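The warnings come from comparing vectors of different lengths. Inside each group of five rows, `row_number()` is `1:5`, and `==` compares element by element, recycling the shorter right-hand side. A minimal base-R sketch of what happens, with the hypothetical variable `rn` standing in for `row_number()` in one group:

```r
# Stand-in for row_number() inside one group of five rows (hypothetical name)
rn <- 1:5

# `==` recycles 1:3 to c(1, 2, 3, 1, 2); R warns because 5 is not a
# multiple of 3. Here the values happen to be TRUE TRUE TRUE FALSE FALSE.
top_eq <- suppressWarnings(rn == 1:3)

# 4:5 is recycled to c(4, 5, 4, 5, 4), so no position matches at all
bottom_eq <- suppressWarnings(rn == 4:5)

# %in% tests membership, which is what the condition is meant to express
top_in <- rn %in% 1:3
bottom_in <- rn %in% 4:5

print(top_eq)     # values: TRUE TRUE TRUE FALSE FALSE
print(bottom_eq)  # values: FALSE FALSE FALSE FALSE FALSE
print(top_in)     # values: TRUE TRUE TRUE FALSE FALSE
print(bottom_in)  # values: FALSE FALSE FALSE TRUE TRUE
```

Note also that dplyr's `slice()` takes integer row positions, while logical conditions belong in `filter()`. So either return positions from `slice()` or use `%in%` inside `filter()`.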

Thanks in advance for your help!

【Question comments】:

    Tags: r dplyr


    【Solution 1】:

    There may be a more elegant solution, but I think the following works. I set a seed for reproducibility.

    set.seed(123)
    
    df <- data.frame(gr=rep(seq(1,2),each=10),sub_gr=rep(rep(c("a","b"),each=5),2),
                     y = rep(c(sort(runif(5,0,0.5),decreasing=TRUE), sort(runif(5,0,0.5),,decreasing=TRUE)),2),
                     x = rep(c(seq(0.1,0.5,0.1),rev(seq(-0.5,-0.1,0.1))),2))
    
    df %>%
      group_by(gr, sub_gr) %>%
      filter((sub_gr %in% "a" & row_number() %in% 1:3) |
               (sub_gr %in% "b" & row_number() %in% (n() - 1):n())) %>%
      ungroup()
    
    #  # A tibble: 10 x 4
    #       gr sub_gr          y     x
    #    <int> <fctr>      <dbl> <dbl>
    #  1     1      a 0.47023364   0.1
    #  2     1      a 0.44150870   0.2
    #  3     1      a 0.39415257   0.3
    #  4     1      b 0.22830737  -0.4
    #  5     1      b 0.02277825  -0.5
    #  6     2      a 0.47023364   0.1
    #  7     2      a 0.44150870   0.2
    #  8     2      a 0.39415257   0.3
    #  9     2      b 0.22830737  -0.4
    # 10     2      b 0.02277825  -0.5
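If you prefer to keep the `slice()` approach from the question, one sketch is to return integer positions from a single `if`/`else` per group. This assumes `sub_gr` is constant within each `gr`/`sub_gr` group (true here, because it is a grouping variable), so its first value decides which branch runs; `res` is a hypothetical name.

```r
library(dplyr)

set.seed(123)
df <- data.frame(gr = rep(seq(1, 2), each = 10),
                 sub_gr = rep(rep(c("a", "b"), each = 5), 2),
                 y = rep(c(sort(runif(5, 0, 0.5), decreasing = TRUE),
                           sort(runif(5, 0, 0.5), decreasing = TRUE)), 2),
                 x = rep(c(seq(0.1, 0.5, 0.1), rev(seq(-0.5, -0.1, 0.1))), 2))

res <- df %>%
  group_by(gr, sub_gr) %>%
  # sub_gr is a grouping variable, so it is constant within each group;
  # return row positions: the first three for "a", the last two otherwise
  slice(if (first(sub_gr) == "a") 1:3 else (n() - 1):n()) %>%
  ungroup()

res
```

Returning positions rather than a logical vector is what removes the recycling warnings from the original attempt.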
    

    【Comments】:

      【Solution 2】:
      library(tidyverse)
      # create a custom function to take the head or tail based on your rule
      cond_slice <- function(x) {
        if (unique(x$sub_gr) == "a") {
          head(x, 3)
        } else {
          tail(x, 2)
        }
      }
      # create a column to split by and then map across the subsets
      result <- df %>%
        unite(split_by, gr, sub_gr, remove = FALSE) %>%
        split(.$split_by) %>%
        map(cond_slice) %>% 
        bind_rows() %>%
        select(-split_by)
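For comparison, the same split-apply-combine idea can be sketched in base R without the tidyverse. This is a swapped-in alternative, not part of the answer above; `cond_slice_base()`, `pieces` and `res` are hypothetical names, and the rule (first 3 rows for "a", last 2 otherwise) mirrors `cond_slice()`.

```r
set.seed(123)
df <- data.frame(gr = rep(seq(1, 2), each = 10),
                 sub_gr = rep(rep(c("a", "b"), each = 5), 2),
                 y = rep(c(sort(runif(5, 0, 0.5), decreasing = TRUE),
                           sort(runif(5, 0, 0.5), decreasing = TRUE)), 2),
                 x = rep(c(seq(0.1, 0.5, 0.1), rev(seq(-0.5, -0.1, 0.1))), 2))

# Same rule as cond_slice(): first 3 rows for "a", last 2 rows otherwise.
# as.character() makes the test work whether sub_gr is a factor or character.
cond_slice_base <- function(d) {
  if (as.character(d$sub_gr[1]) == "a") head(d, 3) else tail(d, 2)
}

# Split by list(sub_gr, gr): one data frame per combination. The first
# factor varies fastest, so pieces come out as a.1, b.1, a.2, b.2.
pieces <- split(df, list(df$sub_gr, df$gr), drop = TRUE)

# Apply the rule to each piece and stack the results back together
res <- do.call(rbind, lapply(pieces, cond_slice_base))
rownames(res) <- NULL

res
```

`head()` and `tail()` return at most the rows available, so pieces shorter than three or two rows are returned whole rather than raising an error.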
      

        【Comments】:
