【Question title】:Conditional Insertion of rows based on consecutive values in a column in R
【Posted】:2018-08-04 10:14:26
【Question description】:

I have a data frame in which I need to insert a row between two existing rows whenever the value in a column changes from "A" to "B".

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

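If event "B" comes directly after event "A", I want to insert a new row between those two rows. The new row should carry all the values of the row whose Event is "B", except that its Event will be "Z".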

Expected data frame

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
Z       321      Sell   27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
Z       320      Sell   27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

【Question comments】:

    Tags: r dataframe dplyr


    【Solution 1】:

    An alternative tidyverse approach

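    library(tidyverse)
    df %>%
      # G increases by one at every "B" that directly follows an "A"
      group_by(G = cumsum(Event == "B" & dplyr::lag(Event, 1, default=NA) == "A")) %>%
      # prepend a copy of each group's first row with Event set to "Z"
      do(rbind(mutate(head(., 1), Event = "Z"), .)) %>%
      ungroup() %>%
      # drop the unwanted "Z" row added at the top of the first group
      slice(-1) %>%
      select(-G)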
    
    # A tibble: 12 x 5
       # Event Price Type  Date       Time 
       # <chr> <int> <chr> <chr>      <chr>
     # 1 A       100 Sell  27-01-2018 12:00
     # 2 C       200 Buy   27-01-2018 12:15
     # 3 C       300 Buy   27-01-2018 12:30
     # 4 D       350 Sell  27-01-2018 12:31
     # 5 A       320 Buy   27-01-2018 12:32
     # 6 Z       321 Sell  27-01-2018 12:32
     # 7 B       321 Sell  27-01-2018 12:32
     # 8 B       220 Buy   27-01-2018 12:34
     # 9 L       550 Buy   27-01-2018 12:35
    # 10 A       320 Buy   27-01-2018 12:32
    # 11 Z       320 Sell  27-01-2018 12:32
    # 12 B       320 Sell  27-01-2018 12:32
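
To make the grouping key easier to follow, here is a minimal base R sketch of how `G` comes out for the question's Event column. The names `prev` and `starts` are illustrative only, and `c(NA, head(Event, -1))` stands in for `dplyr::lag()` so the snippet runs without any packages.

```r
# Event column from the question's data
Event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")

# Previous value of Event; NA for the first row (stand-in for dplyr::lag)
prev <- c(NA, head(Event, -1))

# TRUE where a "B" directly follows an "A" (illustrative helper name)
starts <- Event == "B" & prev == "A"

# Running count of such positions: this is the grouping key G
G <- cumsum(starts)
G
# [1] 0 0 0 0 0 1 1 1 1 2
```

Every group except group 0 therefore begins with the "B" row that needs a "Z" copy in front of it, which is why the extra row added to group 0 has to be dropped with `slice(-1)`.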
    

    Data

    df <- read.table(text="Event   Price   Type    Date    Time
    A       100      Sell   27-01-2018 12:00
    C       200      Buy    27-01-2018 12:15
    C       300      Buy    27-01-2018 12:30
    D       350      Sell   27-01-2018 12:31
    A       320      Buy    27-01-2018 12:32
    B       321      Sell   27-01-2018 12:32
    B       220      Buy    27-01-2018 12:34
    L       550      Buy    27-01-2018 12:35
    A       320      Buy    27-01-2018 12:32
    B       320      Sell   27-01-2018 12:32", header=TRUE, stringsAsFactors=FALSE)
    

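    【Comments】:

      【Solution 2】:

      Here is an option using base R. We build a logical vector by comparing each 'Event' with the one before it and checking whether the pair is "A" followed by "B". We then subset the dataset with that vector, set 'Event' to "Z" in the subset, rbind it with the original dataset, and finally reorder the rows so that each "Z" row lands at the position given by the index 'i2'.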

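      # df1 is the question's data frame (named df in the Data block of Solution 1)
      i1 <- with(df1, c(FALSE, Event[-1] == "B" & Event[-nrow(df1)] == "A"))
      # final row positions of the inserted "Z" rows
      i2 <- which(i1) + seq_along(which(i1))-1
      n <- sum(i1)+ length(i1)
      # stack the "Z" copies under the original rows, then reorder them into place
      res <- rbind(df1, transform(df1[i1,], Event = "Z"))[order(c(setdiff(seq_len(n), i2), i2)),]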
      row.names(res) <- NULL
      res
      #   Event Price Type       Date  Time
      #1      A   100 Sell 27-01-2018 12:00
      #2      C   200  Buy 27-01-2018 12:15
      #3      C   300  Buy 27-01-2018 12:30
      #4      D   350 Sell 27-01-2018 12:31
      #5      A   320  Buy 27-01-2018 12:32
      #6      Z   321 Sell 27-01-2018 12:32
      #7      B   321 Sell 27-01-2018 12:32
      #8      B   220  Buy 27-01-2018 12:34
      #9      L   550  Buy 27-01-2018 12:35
      #10     A   320  Buy 27-01-2018 12:32
      #11     Z   320 Sell 27-01-2018 12:32
      #12     B   320 Sell 27-01-2018 12:32
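
To see why the `order()` step puts the "Z" rows in place rather than leaving them at the bottom, the same index arithmetic can be traced on the Event column alone. This is only a sketch: `stacked`, `target` and `reordered` are names introduced here for illustration and do not appear in the answer.

```r
# Event column from the question's data
Event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")

i1 <- c(FALSE, Event[-1] == "B" & Event[-length(Event)] == "A")
i2 <- which(i1) + seq_along(which(i1)) - 1   # final positions of the "Z" rows
n  <- sum(i1) + length(i1)                   # row count after insertion

# What rbind() produces: the originals first, the "Z" copies at the end
stacked <- c(Event, rep("Z", sum(i1)))

# Target position of every stacked element (illustrative helper)
target <- c(setdiff(seq_len(n), i2), i2)

# Reordering by target position moves each "Z" in front of its "B"
reordered <- stacked[order(target)]
reordered
# [1] "A" "C" "C" "D" "A" "Z" "B" "B" "L" "A" "Z" "B"
```

Here `i2` is `6 11`, so the two copies that sit at positions 11 and 12 after stacking are moved to rows 6 and 11 of the result.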
      

      【Comments】:

      • If I'm not mistaken, this approach leaves the "Z" observations at the end of the data frame?
      【Solution 3】:

      Here is an approach using the tidyverse:

      library(tidyverse)
      df %>%
        mutate(lagE = lag(Event),  # previous Event value
               splt = ifelse(Event == "B" & lagE == "A", TRUE, FALSE),  # flag a B that follows an A
               cum = cumsum(splt)) %>% # running count of flags, used to split
        {split(., .$cum)} %>% # split the data frame into a list
        # for each piece: if its first Event is "B", duplicate that row and
        # rename the copy to "Z"; otherwise return the piece unchanged
        map(function(x){
          if(x[1,1] == "B"){
            z <- rbind(x[1,], x)
            z[,1] <- as.character(z[,1])
            z[1,1] <- "Z"
          } else {z <- x}
          z
        }) %>%
        bind_rows() %>% # combine back into one data frame
        select(1:5) # remove helper columns
      
      #output
         Event Price Type       Date  Time
      1      A   100 Sell 27-01-2018 12:00
      2      C   200  Buy 27-01-2018 12:15
      3      C   300  Buy 27-01-2018 12:30
      4      D   350 Sell 27-01-2018 12:31
      5      A   320  Buy 27-01-2018 12:32
      6      Z   321 Sell 27-01-2018 12:32
      7      B   321 Sell 27-01-2018 12:32
      8      B   220  Buy 27-01-2018 12:34
      9      L   550  Buy 27-01-2018 12:35
      10     A   320  Buy 27-01-2018 12:32
      11     Z   320 Sell 27-01-2018 12:32
      12     B   320 Sell 27-01-2018 12:32
      

      This problem looks simple, and I'm sure someone will come up with a more concise solution.
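
One possibly more compact variant, offered here as a sketch and not as any poster's answer, is to duplicate the matching rows by repeating their indices and then relabel the first copy of each pair. The names `hit`, `idx` and `res` are hypothetical, and the data frame below keeps only three of the question's columns to stay short.

```r
# Reduced copy of the question's data (Date and Time omitted for brevity)
df <- data.frame(
  Event = c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B"),
  Price = c(100, 200, 300, 350, 320, 321, 220, 550, 320, 320),
  Type  = c("Sell", "Buy", "Buy", "Sell", "Buy", "Sell", "Buy", "Buy", "Buy", "Sell"),
  stringsAsFactors = FALSE
)

# Rows where a "B" directly follows an "A" (hypothetical helper name)
hit <- which(df$Event == "B" & c("", head(df$Event, -1)) == "A")

# Row indices with each matching index listed twice, in order
idx <- sort(c(seq_len(nrow(df)), hit))
res <- df[idx, ]

# The first copy of each duplicated row becomes the "Z" row;
# earlier insertions shift later positions by one each
res$Event[hit + seq_along(hit) - 1] <- "Z"
rownames(res) <- NULL

res$Event
# [1] "A" "C" "C" "D" "A" "Z" "B" "B" "L" "A" "Z" "B"
```

Because the "Z" rows are created by indexing rather than by stacking and reordering, no helper columns need to be removed afterwards.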

      【Comments】:
