【Question title】: Copy values of a column into another column based on a condition using a loop
【Posted】: 2014-11-25 12:21:48
【Question description】:

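I need to create a complex "for" loop, but after reading some examples I still have no idea how to write it in a proper R way, so I am not sure whether it will work. I am still an R beginner :(

I have a dataset in long format with different occasions. However, some occasions are not really new occasions: the start date is the same but the offence is different. I need to copy that offence into a new column called "offence2", and afterwards delete the fake new occasions so that only the rows representing genuinely new occasions remain. My real data have up to 8 different offences on one date, but I made a simpler example.

This is an example of what my data look like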

    id<-c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5)
    dstart<-c("25/11/2006", "13/12/2006","13/12/2006","07/02/2006","07/02/2006",
     "15/01/2006", "22/03/2006","18/09/2006", "04/03/2006","04/03/2006",
     "22/08/2006","22/08/2006","11/04/2006", "11/04/2006", "19/10/2006") 
    dstart1<-as.Date(dstart, "%d/%m/%Y")

    offence<-c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a")
    cod_offence<-c(25, 26,27,26,28,25,25,29,26,25,27,25,25,26,25)

    mydata<-data.frame(id, dstart1, offence, cod_offence)

The data

       id    dstart1   offence  cod_offence
   1   1   2006-11-25       a          25
   2   1   2006-12-13       b          26
   3   1   2006-12-13       c          27
   4   2   2006-02-07       b          26
   5   2   2006-02-07       d          28
   6   3   2006-01-15       a          25
   7   3   2006-03-22       a          25
   8   3   2006-09-18       e          29
   9   4   2006-03-04       b          26
   10  4   2006-03-04       a          25
   11  4   2006-08-22       c          27
   12  4   2006-08-22       a          25
   13  5   2006-04-11       a          25
   14  5   2006-04-11       b          26
   15  5   2006-10-19       a          25

I need something like this:

      id    dstart1   offence  cod_offence   offence2
   1   1   2006-11-25       a          25       NA
   2   1   2006-12-13       b          26       c
   3   1   2006-12-13       c          27       NA
   4   2   2006-02-07       b          26       d
   5   2   2006-02-07       d          28       NA
   6   3   2006-01-15       a          25       NA
   7   3   2006-03-22       a          25       NA
   8   3   2006-09-18       e          29       NA
   9   4   2006-03-04       b          26       a
   10  4   2006-03-04       a          25       NA
   11  4   2006-08-22       c          27       a
   12  4   2006-08-22       a          25       NA
   13  5   2006-04-11       a          25       b
   14  5   2006-04-11       b          26       NA
   15  5   2006-10-19       a          25       NA

I think I need to do something like this, given i = individual and j = observation within the individual:

for each individual I need to check whether mydata$dstart1(j) = mydata$dstart1(j+1)
if this is true, then copy mydata$offence2(j)=mydata$offence(j+1), otherwise keep the same value
This has to stop if id(j) != id(j+1) and re-start with the new id.

My problem is that I don't know how to put this into a loop.
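
For reference, the rule described above maps fairly directly onto a plain for loop. The following is a minimal sketch, assuming the rows are already sorted by id and date as in the example and that offence is stored as character; the data frame is rebuilt here only so the snippet runs on its own.

```r
# Rebuild the example data from the question (Date column, character offences).
mydata <- data.frame(
  id = c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5),
  dstart1 = as.Date(c("25/11/2006","13/12/2006","13/12/2006","07/02/2006",
                      "07/02/2006","15/01/2006","22/03/2006","18/09/2006",
                      "04/03/2006","04/03/2006","22/08/2006","22/08/2006",
                      "11/04/2006","11/04/2006","19/10/2006"), "%d/%m/%Y"),
  offence = c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a"),
  stringsAsFactors = FALSE
)

mydata$offence2 <- NA_character_
# Compare each row j with row j + 1; the last row has no successor.
for (j in seq_len(nrow(mydata) - 1)) {
  same_id   <- mydata$id[j] == mydata$id[j + 1]
  same_date <- mydata$dstart1[j] == mydata$dstart1[j + 1]
  # Copy the next offence only when both the individual and the date match.
  if (same_id && same_date) {
    mydata$offence2[j] <- mydata$offence[j + 1]
  }
}
mydata
```

The id check plays the role of "stop and restart with the new id": a row is never compared across individuals. This sketch covers only one extra offence per date.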

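Thanks!!

Update

Yes, the examples work fine, but not yet with my real data, because they are a bit more complex. What happens if, instead of two repeated dates, I have three or more, each with a different offence? Following @CathG's solution, I need to create more variables according to the number of offences (8 in my case). I think I need a new vector that identifies the position of each observation within the id, and a new "instruction" that tells R to copy the value into a different column depending on the position within mydata$dstart1. But then again, I don't know how to do it.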

     id    dstart1   offence  cod_offence   offence2   offence3  offence4
   1   1   2006-11-25       a          25       NA        NA       NA
   2   1   2006-12-13       b          26       c         NA       NA
   3   1   2006-12-13       c          27       NA        NA       NA
   4   2   2006-02-07       b          26       d         NA       NA
   5   2   2006-02-07       d          28       NA        NA       NA
   6   2   2006-04-12       b          26       d         c        a
   7   2   2006-04-12       d          28       NA        NA       NA
   8   2   2006-04-12       c          27       NA        NA       NA
   9   2   2006-04-12       a          25       NA        NA       NA

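Thanks again!!!

【Discussion】:

  • Hi, yes, they do work! But I think my problem remains because I asked an incomplete question; could you please take another look? Thanks!
  • @bmora, I just edited my answer to adapt it to your update. Let me know if it works for you now.
  • @bmora I also edited the answer using your new dataset. Please let me know whether it works.

Tags: r for-loop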


【Solution 1】:

You can use base R

indx <- with(mydata, ave(as.numeric(dstart1), id,
           FUN=function(x) c(x[-1]==x[-length(x)], FALSE)))

 transform(mydata, offence2=ifelse(!!indx, 
            c(as.character(offence[-1]), NA), NA))
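
To see what the `ave()` call computes, a small illustration may help. This is only a sketch on a hypothetical toy input (the vectors `id` and `d` below are made up, with plain numbers standing in for dates), not part of the original answer.

```r
# Hypothetical toy input: two ids, numbers standing in for dates.
id <- c(1, 1, 1, 2, 2)
d  <- c(10, 20, 20, 30, 30)

# Within each id, compare every element with its successor;
# the last element of a group has no successor, so it gets FALSE.
flag <- ave(d, id, FUN = function(x) c(x[-1] == x[-length(x)], FALSE))

flag    # stored back into a numeric vector, so TRUE/FALSE become 1/0
!!flag  # double negation turns 1/0 back into TRUE/FALSE
```

This is why the answer writes `!!indx` inside `ifelse()`: the flag comes back as numeric and has to be turned into a logical test.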

Or use dplyr

library(dplyr)
mydata %>%
      group_by(id) %>% 
      mutate(offence2= dstart1==lead(dstart1), 
       offence2= ifelse(!is.na(offence2)&offence2,
         as.character(lead(offence)), NA_character_))
#     id    dstart1 offence cod_offence offence2
#1   1 2006-11-25       a          25       NA
#2   1 2006-12-13       b          26        c
#3   1 2006-12-13       c          27       NA
#4   2 2006-02-07       b          26        d
#5   2 2006-02-07       d          28       NA
#6   3 2006-01-15       a          25       NA
#7   3 2006-03-22       a          25       NA
#8   3 2006-09-18       e          29       NA
#9   4 2006-03-04       b          26        a
#10  4 2006-03-04       a          25       NA
#11  4 2006-08-22       c          27        a
#12  4 2006-08-22       a          25       NA
#13  5 2006-04-11       a          25        b
#14  5 2006-04-11       b          26       NA
#15  5 2006-10-19       a          25       NA

Or use data.table

library(data.table)
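setDT(mydata)[, indx:=c(dstart1[-1]==dstart1[-.N], FALSE), by=id][,
      # take the next row's offence within each id; indexing with which(indx)+1
      # gives a shorter vector that ifelse() recycles, which can misalign values
      offence2:=ifelse(indx, c(as.character(offence)[-1], NA_character_),
                                 NA_character_), by=id][,indx:=NULL]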

mydata
 #    id    dstart1 offence cod_offence offence2
 #1:  1 2006-11-25       a          25       NA
 #2:  1 2006-12-13       b          26        c
 #3:  1 2006-12-13       c          27       NA
 #4:  2 2006-02-07       b          26        d
 #5:  2 2006-02-07       d          28       NA
 #6:  3 2006-01-15       a          25       NA
 #7:  3 2006-03-22       a          25       NA
 #8:  3 2006-09-18       e          29       NA
 #9:  4 2006-03-04       b          26        a
#10:  4 2006-03-04       a          25       NA
#11:  4 2006-08-22       c          27        a
#12:  4 2006-08-22       a          25       NA
#13:  5 2006-04-11       a          25        b
#14:  5 2006-04-11       b          26       NA
#15:  5 2006-10-19       a          25       NA

Update

Using the new dataset mydata2 with the first method, we get d1

 indx <- with(mydata2, ave(as.numeric(dstart1), id,
       FUN=function(x) c(x[-1]==x[-length(x)], FALSE)))

 d1 <-  transform(mydata2, offence2=ifelse(!!indx, 
                  c(as.character(offence[-1]), NA), NA))

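From d1, we can create an indx column and then use dcast to convert the offence2 column from long to wide form. Columns that contain only NAs can be removed using colSums(is.na(. Rename the columns, then sort the values within each column with mutate_each from dplyr, and finally cbind the result with mydata2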

 d1$indx <- with(d1, ave(seq_along(id), id, dstart1, FUN=seq_along))
 library(reshape2)

 d2 <- dcast(d1, id + dstart1+indx~indx, value.var='offence2')
 d2New <- d2[,colSums(is.na(d2))!=nrow(d2)]
 nm1 <-  grep("^\\d",colnames(d2New))
 colnames(d2New)[nm1] <- paste0('offence', 2:(length(nm1)+1)) 
 d3 <- d2New[,-3] %>%
                group_by(id, dstart1) %>%
                mutate_each(funs(.[order(.)])) %>%
                ungroup()

 cbind(mydata2,d3[,-c(1:2)])
 #    id    dstart1 offence cod_offence offence2 offence3 offence4
 #1  1 2006-11-25       a          25     <NA>     <NA>     <NA>
 #2  1 2006-12-13       b          26        c     <NA>     <NA>
 #3  1 2006-12-13       c          27     <NA>     <NA>     <NA>
 #4  2 2006-02-07       b          26        d     <NA>     <NA>
 #5  2 2006-02-07       d          28     <NA>     <NA>     <NA>
 #6  2 2006-04-12       b          26        d        c        a
 #7  2 2006-04-12       d          28     <NA>     <NA>     <NA>
 #8  2 2006-04-12       c          27     <NA>     <NA>     <NA>
 #9  2 2006-04-12       a          25     <NA>     <NA>     <NA>
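
As an alternative to the reshape route above, the same table can be filled directly from each row's position within its (id, date) group. This is a minimal base R sketch, not the answer's method; the helper names `grp`, `pos`, `first_row`, `extra` and `result` are introduced here for illustration, and mydata2 is rebuilt so the snippet runs on its own.

```r
mydata2 <- data.frame(
  id = c(1L,1L,1L,2L,2L,2L,2L,2L,2L),
  dstart1 = as.Date(c("2006-11-25","2006-12-13","2006-12-13","2006-02-07",
                      "2006-02-07","2006-04-12","2006-04-12","2006-04-12",
                      "2006-04-12")),
  offence = c("a","b","c","b","d","b","d","c","a"),
  stringsAsFactors = FALSE
)

n   <- nrow(mydata2)
grp <- interaction(mydata2$id, mydata2$dstart1, drop = TRUE)
# Position of each row within its (id, date) group: 1, 2, 3, ...
pos <- ave(seq_len(n), grp, FUN = seq_along)
# Row number of the first record of the group, repeated for each group member.
first_row <- ave(seq_len(n), grp, FUN = function(i) i[1])

# One character column per extra offence (offence2, offence3, ...).
extra <- matrix(NA_character_, n, max(pos) - 1,
                dimnames = list(NULL, paste0("offence", seq_len(max(pos))[-1])))
# Rows at position k > 1 go into column offencek of their group's first row.
later <- pos > 1
extra[cbind(first_row[later], pos[later] - 1)] <- mydata2$offence[later]

result <- cbind(mydata2, extra, stringsAsFactors = FALSE)
result
```

The number of extra columns follows from `max(pos)`, so the same sketch should extend to dates with up to 8 offences without naming the columns by hand.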

Data

mydata <- structure(list(id = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 
5, 5), dstart1 = structure(c(13477, 13495, 13495, 13186, 13186, 
13163, 13229, 13409, 13211, 13211, 13382, 13382, 13249, 13249, 
13440), class = "Date"), offence = structure(c(1L, 2L, 3L, 2L, 
4L, 1L, 1L, 5L, 2L, 1L, 3L, 1L, 1L, 2L, 1L), .Label = c("a", 
"b", "c", "d", "e"), class = "factor"), cod_offence = c(25, 26, 
27, 26, 28, 25, 25, 29, 26, 25, 27, 25, 25, 26, 25)), .Names = c("id", 
"dstart1", "offence", "cod_offence"), row.names = c(NA, -15L), 
class = "data.frame")

mydata2 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
dstart1 = structure(c(13477, 13495, 13495, 13186, 13186, 13250, 13250,
 13250, 13250), class = "Date"), offence = c("a", "b", "c", "b", "d", "b",
"d", "c", "a"), cod_offence = c(25L, 26L, 27L, 26L, 28L, 26L, 28L, 27L, 25L
)), .Names = c("id", "dstart1", "offence", "cod_offence"), row.names =
 c("1","2", "3", "4", "5", "6", "7", "8", "9"), class = "data.frame")

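【Discussion】:

  • Thanks for your answer, it works! It's a pity I can't accept more than one answer. Thank you very much for giving me different options. I just updated my question because I didn't cover all the options in the first one. Thanks again!!
  • Wow!! Thank you very much. This works perfectly with my data!! I still don't understand everything you did, but it works! I really appreciate your help. I have learned a lot thanks to you and the people on this site.
  • Oops! I didn't know I had done that... I don't even know when it happened; anyway, I didn't mean to, and I'll change it back.
【Solution 2】:

With split and a loop: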

# data with repeated dates /offences
id<-c(1,1,1,2,2,3,3,3,4,4,4,4,5,5,5,5,5,5)
dstart<-c("25/11/2006", "13/12/2006","13/12/2006","07/02/2006","07/02/2006",
     "15/01/2006", "22/03/2006","18/09/2006", "04/03/2006","04/03/2006",
     "22/08/2006","22/08/2006","11/04/2006", "11/04/2006", "19/10/2006","19/10/2006","19/10/2006","19/10/2006") 
dstart1<-as.Date(dstart, "%d/%m/%Y")
offence<-c("a","b","c","b","d","a","a","e","b","a","c","a","a","b","a","c","b","a")
cod_offence<-c(25, 26,27,26,28,25,25,29,26,25,27,25,25,26,25,27,25,25)
mydata<-data.frame(id, dstart1, offence, cod_offence)

# see the max offences there are for same id and date
maxoff<-max(table(mydata$id,mydata$dstart1))
mydata[,paste("offence",2:maxoff,sep="")]<-NA

# split your data according to id
splitmydata<-split(mydata,mydata$id) 

# for each "per id dataset", apply a function that looks for repeated offences / dates and fills the "offences" variables in the row with the first occurrence of a specific date.
splitmydata2<-lapply(splitmydata, 
                       function(tab){
                          for(datestart in unique(tab[,"dstart1"])){
                            ind_date<-sort(which(tab[,"dstart1"]==datestart))
                            if(length(ind_date[-1])){
                               tab[ind_date[1],grep("^offence",colnames(tab),value=T)[2:(length(ind_date))]]<-as.character(tab[ind_date[-1],"offence"])
                              }
                           }
                          return(tab)
                       }
                     )

mydata2<-unsplit(splitmydata2,mydata$id) # finally, unsplit your data

> mydata2
   id    dstart1 offence cod_offence offence2 offence3 offence4
1   1 2006-11-25       a          25     <NA>     <NA>     <NA>
2   1 2006-12-13       b          26        c     <NA>     <NA>
3   1 2006-12-13       c          27     <NA>     <NA>     <NA>
4   2 2006-02-07       b          26        d     <NA>     <NA>
5   2 2006-02-07       d          28     <NA>     <NA>     <NA>
6   3 2006-01-15       a          25     <NA>     <NA>     <NA>
7   3 2006-03-22       a          25     <NA>     <NA>     <NA>
8   3 2006-09-18       e          29     <NA>     <NA>     <NA>
9   4 2006-03-04       b          26        a     <NA>     <NA>
10  4 2006-03-04       a          25     <NA>     <NA>     <NA>
11  4 2006-08-22       c          27        a     <NA>     <NA>
12  4 2006-08-22       a          25     <NA>     <NA>     <NA>
13  5 2006-04-11       a          25        b     <NA>     <NA>
14  5 2006-04-11       b          26     <NA>     <NA>     <NA>
15  5 2006-10-19       a          25        c        b        a
16  5 2006-10-19       c          27     <NA>     <NA>     <NA>
17  5 2006-10-19       b          25     <NA>     <NA>     <NA>
18  5 2006-10-19       a          25     <NA>     <NA>     <NA>
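
The question also asks to drop the "fake" occasions once the extra offences have been copied across. One possible way, sketched here on a small hypothetical data frame named `filled` that has the same shape as the output above, is to keep only the first row of every id/date pair with `duplicated()`.

```r
# Hypothetical input in the shape of the filled table (first five rows only).
filled <- data.frame(
  id = c(1, 1, 1, 2, 2),
  dstart1 = as.Date(c("2006-11-25","2006-12-13","2006-12-13",
                      "2006-02-07","2006-02-07")),
  offence  = c("a","b","c","b","d"),
  offence2 = c(NA,"c",NA,"d",NA),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of an (id, dstart1) pair after the first,
# so negating it keeps exactly one row per occasion.
occasions <- filled[!duplicated(filled[c("id", "dstart1")]), ]
occasions
```

This relies on the extra offences having been written into the first row of each date, which is what the loop above does.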

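【Discussion】:

  • @akrun, "a better solution" is a bit subjective... Your solution may be better in terms of efficiency, but mine may be closer to what the OP had in mind...
  • @CathG OK, no problem. I was just curious, that's all.
  • @akrun I don't know, it's one of the terms the OP used.
  • I accepted @CathG's solution because I thought it could work with a more complex structure, but it doesn't, because of my own mistake! All the solutions work fine, but I asked an incomplete question, so my problem remains. I have updated the question, if you could take a look at it. Thanks again
  • @CathG Sorry again!! I meant repeated dates within an individual, with different offences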