Conditionally inserting rows based on consecutive values in a column in R: answers

【Question title】: Conditional Insertion of rows based on consecutive values in a column in R
【Posted】: 2018-08-04 10:14:26
【Question】:

I have a data frame in which I need to insert a row between two rows whenever the value in a column changes from "A" to "B".

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

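If event "B" comes right after event "A", I want to insert a new row. The new row should go between those two rows and copy every value from the "B" row, except that its Event is "Z".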

Expected data frame

Event   Price   Type    Date    Time

A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
Z       321      Sell   27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
Z       320      Sell   27-01-2018 12:32
B       320      Sell   27-01-2018 12:32

【Question comments】:

Tags: r dataframe dplyr

【Solution 1】:

An alternative tidyverse approach

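library(tidyverse)
df %>%
  # G increases by one at every "B" that directly follows an "A"
  group_by(G = cumsum(Event == "B" & dplyr::lag(Event, 1, default=NA) == "A")) %>%
  # prepend a copy of each group's first row with Event set to "Z"
  do(rbind(mutate(head(., 1), Event = "Z"), .)) %>%
  ungroup() %>%
  # group 0 also received a "Z" row at the very top; drop it
  slice(-1) %>%
  select(-G)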

# A tibble: 12 x 5
   # Event Price Type  Date       Time 
   # <chr> <int> <chr> <chr>      <chr>
 # 1 A       100 Sell  27-01-2018 12:00
 # 2 C       200 Buy   27-01-2018 12:15
 # 3 C       300 Buy   27-01-2018 12:30
 # 4 D       350 Sell  27-01-2018 12:31
 # 5 A       320 Buy   27-01-2018 12:32
 # 6 Z       321 Sell  27-01-2018 12:32
 # 7 B       321 Sell  27-01-2018 12:32
 # 8 B       220 Buy   27-01-2018 12:34
 # 9 L       550 Buy   27-01-2018 12:35
# 10 A       320 Buy   27-01-2018 12:32
# 11 Z       320 Sell  27-01-2018 12:32
# 12 B       320 Sell  27-01-2018 12:32

Data

df <- read.table(text="Event   Price   Type    Date    Time
A       100      Sell   27-01-2018 12:00
C       200      Buy    27-01-2018 12:15
C       300      Buy    27-01-2018 12:30
D       350      Sell   27-01-2018 12:31
A       320      Buy    27-01-2018 12:32
B       321      Sell   27-01-2018 12:32
B       220      Buy    27-01-2018 12:34
L       550      Buy    27-01-2018 12:35
A       320      Buy    27-01-2018 12:32
B       320      Sell   27-01-2018 12:32", header=TRUE, stringsAsFactors=FALSE)
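
To see what the grouping key in the pipeline above does, here is a minimal base R sketch. It assumes the Event column from the question; `prev` stands in for `dplyr::lag(Event)`, and the variable names are illustrative only.

```r
# Event column from the question's data
Event <- c("A", "C", "C", "D", "A", "B", "B", "L", "A", "B")

# base R stand-in for dplyr::lag(Event): shift down by one, pad with NA
prev <- c(NA, head(Event, -1))

# TRUE where a "B" directly follows an "A"
starts <- Event == "B" & prev == "A"

# running count of those flags: each flagged "B" opens a new group
G <- cumsum(starts)

data.frame(Event, prev, starts, G)
```

Rows 6 and 10 are flagged, so G is 0 for rows 1 to 5, 1 for rows 6 to 9, and 2 for row 10. One caveat: if the very first Event were "B", the comparison against the NA lag would yield NA and cumsum() would return NA for every row, so the pipeline appears to assume the data does not start with "B".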

【Comments】:

【Solution 2】:

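Here is an option using base R. We create a logical vector by comparing each 'Event' with the one before it and checking whether the pair is "A" followed by "B". We then subset the dataset with that index, set 'Event' to "Z" in the subset, rbind it to the original dataset, and reorder the rows so that each "Z" row lands at the position given by index 'i2'.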

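# TRUE for every "B" row that directly follows an "A" row
i1 <- with(df1, c(FALSE, Event[-1] == "B" & Event[-nrow(df1)] == "A"))
# positions the "Z" rows will occupy in the result
i2 <- which(i1) + seq_along(which(i1))-1
# number of rows in the result
n <- sum(i1)+ length(i1)
# append the "Z" copies, then reorder so each one lands at its position in i2
res <- rbind(df1, transform(df1[i1,], Event = "Z"))[order(c(setdiff(seq_len(n), i2), i2)),]
row.names(res) <- NULL
res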
#   Event Price Type       Date  Time
#1      A   100 Sell 27-01-2018 12:00
#2      C   200  Buy 27-01-2018 12:15
#3      C   300  Buy 27-01-2018 12:30
#4      D   350 Sell 27-01-2018 12:31
#5      A   320  Buy 27-01-2018 12:32
#6      Z   321 Sell 27-01-2018 12:32
#7      B   321 Sell 27-01-2018 12:32
#8      B   220  Buy 27-01-2018 12:34
#9      L   550  Buy 27-01-2018 12:35
#10     A   320  Buy 27-01-2018 12:32
#11     Z   320 Sell 27-01-2018 12:32
#12     B   320 Sell 27-01-2018 12:32

【Comments】:

If I'm not mistaken, this approach leaves the "Z" observations at the end of the data frame?
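
One way to check this is to split Solution 2's one-liner into its two steps. The sketch below assumes the question's data loaded as `df1`; the names `stacked` and `target` are introduced here for illustration and are not part of the original answer.

```r
# Data from the question; stringsAsFactors = FALSE keeps Event as character
df1 <- read.table(text = "Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy 27-01-2018 12:15
C 300 Buy 27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy 27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy 27-01-2018 12:34
L 550 Buy 27-01-2018 12:35
A 320 Buy 27-01-2018 12:32
B 320 Sell 27-01-2018 12:32", header = TRUE, stringsAsFactors = FALSE)

i1 <- with(df1, c(FALSE, Event[-1] == "B" & Event[-nrow(df1)] == "A"))
i2 <- which(i1) + seq_along(which(i1)) - 1
n  <- sum(i1) + length(i1)

# Step 1: rbind alone does append the "Z" copies at the end
stacked <- rbind(df1, transform(df1[i1, ], Event = "Z"))
tail(stacked$Event, sum(i1))

# Step 2: give every stacked row a target position (originals take the
# slots not in i2, the "Z" copies take i2), then sort by that position
target <- c(setdiff(seq_len(n), i2), i2)
res <- stacked[order(target), ]
row.names(res) <- NULL
which(res$Event == "Z")
```

So the "Z" rows do sit at the end after the rbind step, but the order() call moves them to rows 6 and 11, directly before the "B" rows they were copied from.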

【Solution 3】:

Here is an approach using the tidyverse:

library(tidyverse)
df %>%
  mutate(lagE = lag(Event),  # previous row's Event
         splt = ifelse(Event == "B" & lagE == "A", TRUE, FALSE),  # flag a "B" that directly follows an "A"
         cum = cumsum(splt)) %>% # running count of flags, used as the split key
  {split(., .$cum)} %>% # split the data frame into a list of pieces
  # if a piece starts with "B", duplicate its first row and rename the copy to "Z"; otherwise return it unchanged
  map(function(x){
    if(x[1,1] == "B"){
      z <- rbind(x[1,], x)
      z[,1] <- as.character(z[,1])
      z[1,1] <- "Z" 
    } else {z <- x}
    z
  }) %>%
  bind_rows() %>% #put back to a data frame
  select(1:5) #remove helper columns

#output
   Event Price Type       Date  Time
1      A   100 Sell 27-01-2018 12:00
2      C   200  Buy 27-01-2018 12:15
3      C   300  Buy 27-01-2018 12:30
4      D   350 Sell 27-01-2018 12:31
5      A   320  Buy 27-01-2018 12:32
6      Z   321 Sell 27-01-2018 12:32
7      B   321 Sell 27-01-2018 12:32
8      B   220  Buy 27-01-2018 12:34
9      L   550  Buy 27-01-2018 12:35
10     A   320  Buy 27-01-2018 12:32
11     Z   320 Sell 27-01-2018 12:32
12     B   320 Sell 27-01-2018 12:32

This problem looks simple, and I'm sure someone will provide a more concise solution.
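
One possibly shorter base R variant, offered as a sketch rather than as any answerer's method: repeat the index of each "B" row that follows an "A", then relabel the first copy of each repeated row. The function name `insert_z_rows` is hypothetical, and the code assumes Event is a character column (not a factor).

```r
# Hypothetical helper: insert a "Z" copy before every "B" that follows an "A"
insert_z_rows <- function(df) {
  n <- nrow(df)
  # TRUE for every "B" row whose previous row is "A"
  hit <- c(FALSE, df$Event[-1] == "B" & df$Event[-n] == "A")
  # every row index once, flagged rows twice, in original order
  idx <- sort(c(seq_len(n), which(hit)))
  out <- df[idx, ]
  # the first of each pair of repeated indices is the inserted row
  is_new <- duplicated(idx, fromLast = TRUE)
  out$Event[is_new] <- "Z"
  row.names(out) <- NULL
  out
}

df <- read.table(text = "Event Price Type Date Time
A 100 Sell 27-01-2018 12:00
C 200 Buy 27-01-2018 12:15
C 300 Buy 27-01-2018 12:30
D 350 Sell 27-01-2018 12:31
A 320 Buy 27-01-2018 12:32
B 321 Sell 27-01-2018 12:32
B 220 Buy 27-01-2018 12:34
L 550 Buy 27-01-2018 12:35
A 320 Buy 27-01-2018 12:32
B 320 Sell 27-01-2018 12:32", header = TRUE, stringsAsFactors = FALSE)

res <- insert_z_rows(df)
res
```

Because the rows are picked by index rather than split and recombined, no helper columns need to be removed afterwards.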

【Comments】: