【Question title】:Using a for loop in R to create new columns based on columns created within the loop
【Posted】:2017-04-28 14:14:06
【Question】:

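I am trying to simulate the ten-year mortality risk for three populations with different proportions of treated patients. I have already done this year by year for ten years, which turned out to be rather long code. I would now like to convert it to monthly steps over the ten years, and to avoid hundreds of lines of code I want to use a for loop.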

My data look like this:

set.seed(1234)
N <- 750000

id <- c(1:N)

###creates the treatment indicator (0 = untreated, 1 = treated), group by group
treated <- rep.int(0,125000)
treated <- append(treated, rep.int(1,125000))
treated <- append(treated, rep.int(0,100000))
treated <- append(treated, rep.int(1,150000))
treated <- append(treated, rep.int(0,75000))
treated <- append(treated, rep.int(1,175000))

groupname <- rep.int(1,250000)
groupname <- c(groupname, rep.int(2,250000))
groupname <- c(groupname, rep.int(3,250000))  

Create a data frame from the treated, id, and groupname vectors:

data = data.frame(treated, id, groupname)
class(data$treated)
data$treated <- factor(data$treated, levels = c(0,1), labels = c("untreated","treated"))
data$groupname <- factor(data$groupname, levels = c(1,2,3), labels = c("group 1", "group 2", "group 3"))

Then I generate each "wave" for the ten years like this (essentially the same code each time, just assigned to a new column name per wave):

data$year_0 <- 1
data$year_1 <-  ifelse(data$treated=="treated",rbinom(N, 1, 1-0.035/4), rbinom(N, 1, 1-0.05/4))

data$year_2 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_1 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_1 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_3 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_2 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_2 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_4 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_3 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_3 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_5 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_4 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_4 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_6 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_5 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_5 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_7 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_6 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_6 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_8 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_7 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_7 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_9 <- ifelse(data$treated=="treated", 
                      ifelse(data$year_8 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                      ifelse(data$year_8 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
data$year_10 <- ifelse(data$treated=="treated", 
                       ifelse(data$year_9 =="0",  0, rbinom(N, 1, 1-0.035/4)), 
                       ifelse(data$year_9 =="0",  0, rbinom(N, 1, 1-0.05/4))
)
###converts to long format
data_long <- reshape(data, direction="long", varying= c(list(4:14)), sep = "_", 
                     idvar="id", timevar=c("year"))
class(data_long$year)  
data_long$year <- as.numeric(data_long$year)
data_long$year <- data_long$year -1

I want to do this with a for loop so that I can simulate 120 months. I wrote this code:

for (i in 1:10){
  n <- ifelse(data$treated=="treated",
              ifelse(data$year_[(i-1)] =="0",  0, rbinom(N, 1, 1-0.035/4)),
              ifelse(data$year_[(i-1)] =="0",  0, rbinom(N, 1, 1-0.05/4))
  )
  data$year_[i] <- n
}

##1: In data$year_[i] <- n :
##  number of items to replace is not a multiple of replacement length

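As I understand it, this message means that the for loop, as written, returns data of an incompatible length. Normally I can troubleshoot with Google, but the code runs when it is not inside a for loop, so I do not understand where the problem lies. I suspect the mistake is that [i] is not interpreted as part of a string that can be used to name the column. However, using paste produces the following error in addition to the warning already mentioned: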

##Error in `$<-.data.frame`(`*tmp*`, "year_", value = c(NA, NA, NA, NA,  : 
  ##replacement has 750001 rows, data has 750000 

The Google results for these messages do not really explain what is going wrong, and I do not know enough to work out the problem myself.

【Comments】:

  • Why not put the year_i columns in a separate matrix? You can then extend the matrix column by column with cbind().

Tags: r for-loop


【Solution 1】:

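Consider referencing the columns by name with double brackets, [[...]], passing a string built with paste0(). Use one condition for the first year and another for all subsequent years: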

data$year_0 <- 1

for (i in 1:10){ 
  if (i == 1){
     n <- ifelse(data$treated=="treated", rbinom(N, 1, 1-0.035/4), rbinom(N, 1, 1-0.05/4))
  } 
  else {
     n <- ifelse(data$treated=="treated", 
                 ifelse(data[[paste0("year_", i-1)]] == 0,  0, rbinom(N, 1, 1-0.035/4)), 
                 ifelse(data[[paste0("year_", i-1)]] == 0,  0, rbinom(N, 1, 1-0.05/4))
          )
  }
  data[[paste0("year_", i)]] <- n 
}
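
The essential change in the loop above is that the column name is built as a string and passed to [[...]], rather than written after `$`. A minimal sketch of the same pattern on a toy data frame follows; the object `df`, the survival probability of 0.9, and the three iterations are illustrative assumptions and not part of the original data:

```r
set.seed(1)

# Toy data: five people, all alive at time 0 (hypothetical example data)
df <- data.frame(id = 1:5, year_0 = 1)

for (i in 1:3) {
  prev <- paste0("year_", i - 1)   # name of the column from the previous step
  curr <- paste0("year_", i)       # name of the column to create now

  # Only people alive in the previous step can survive this one
  alive <- df[[prev]] == 1
  df[[curr]] <- ifelse(alive, rbinom(nrow(df), 1, 0.9), 0)
}

names(df)  # "id" "year_0" "year_1" "year_2" "year_3"
```

Because year_0 is 1 for everyone, the same expression also covers the first step in this sketch, so no separate branch for i == 1 is needed here.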

【Comments】:

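  • Thank you very much for the reply, it works like a charm. Unfortunately, my computer cannot reshape the data to long format because of its size. Many thanks all the same!
  • Great! Glad to help. Have a look at melt from the reshape2 package to get the long format.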
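
Besides melt from reshape2, the long format can also be assembled directly in base R, which avoids the overhead of reshape(). The following is only a sketch on a small hypothetical data frame (`wide`, `long`, and the column name `alive` are made-up names); whether it fits in memory for 750000 rows depends on the machine:

```r
# Small wide table standing in for the simulated data (hypothetical values)
wide <- data.frame(id = 1:3, year_0 = 1, year_1 = c(1, 0, 1))

# All columns whose names start with "year_"
year_cols <- grep("^year_", names(wide), value = TRUE)

# Stack the year columns on top of each other, one block per year
long <- data.frame(
  id    = rep(wide$id, times = length(year_cols)),
  year  = rep(as.integer(sub("^year_", "", year_cols)), each = nrow(wide)),
  alive = unlist(wide[year_cols], use.names = FALSE)
)

nrow(long)  # 6
```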
【Solution 2】:

You can put the year_i columns in a separate matrix. You can then extend the matrix column by column with cbind():

set.seed(1234)
N <- 750000

data = data.frame(treated=rep(c(0,1,0,1,0,1), c(125000, 125000, 100000, 150000, 75000, 175000)), id=1:N, 
groupname=rep(1:3, each=250000))
data$treated <- factor(data$treated, levels = c(0,1), labels = c("untreated","treated"))
data$groupname <- factor(data$groupname, levels = c(1,2,3), labels = c("group 1", "group 2", "group 3"))

Year <- matrix(1, N, 1) # data$year_0 <- 1
Year <- cbind(Year, ifelse(data$treated=="treated",rbinom(N, 1, 1-0.035/4), rbinom(N, 1, 1-0.05/4))) # data$year_1
for (i in 2:10) {
  lastcol <- Year[,ncol(Year)]
  Year <- cbind(Year,
                ifelse(data$treated=="treated", 
                       ifelse(lastcol==0,  0, rbinom(N, 1, 1-0.035/4)), 
                       ifelse(lastcol==0,  0, rbinom(N, 1, 1-0.05/4)))
                )
}

You can gain a little speed by preallocating the matrix (although most of the time is spent on the sampling):

set.seed(1234)
K <- 10 # year_0 ... year_K
Year <- matrix(NA, N, K+1)
Year[,1] <- 1  # year_0
Year[,2] <- ifelse(data$treated=="treated", rbinom(N, 1, 1-0.035/4), rbinom(N, 1, 1-0.05/4)) # data$year_1
for (i in 3:(K+1)) Year[,i] <- ifelse(data$treated=="treated", 
                                      ifelse(Year[,i-1]==0,  0, rbinom(N, 1, 1-0.035/4)), 
                                      ifelse(Year[,i-1]==0,  0, rbinom(N, 1, 1-0.05/4)))
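
The preallocated version extends naturally to the 120 monthly steps asked about in the question. The sketch below wraps the idea in a hypothetical helper, `simulate_survival()`, and draws new values only for people who are still alive. The per-month survival probabilities 0.999 and 0.998 are placeholders, not values derived from the rates in the question:

```r
# Hypothetical helper: one row per person, one column per period (0..n_periods)
simulate_survival <- function(treated, n_periods, p_surv_treated, p_surv_untreated) {
  n <- length(treated)
  # Per-person survival probability for a single period
  p <- ifelse(treated, p_surv_treated, p_surv_untreated)

  out <- matrix(0L, n, n_periods + 1)  # preallocate; 0 = dead
  out[, 1] <- 1L                       # everyone is alive at period 0

  for (j in seq_len(n_periods)) {
    alive <- out[, j] == 1L
    # Sample only for those still alive; the dead stay at 0
    out[alive, j + 1] <- rbinom(sum(alive), 1, p[alive])
  }
  colnames(out) <- paste0("month_", 0:n_periods)
  out
}

set.seed(1234)
treated <- rep(c(FALSE, TRUE), each = 500)       # toy population of 1000
surv <- simulate_survival(treated, 120, 0.999, 0.998)
dim(surv)  # 1000 121
```

Sampling only the survivors reduces the number of random draws as the simulation progresses, which matters because the sampling dominates the run time.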

If you wish, you can combine the data frame and the matrix Year. In that case it is a good idea to assign column names to the matrix first:

colnames(Year) <- paste0("year_", 0:K)
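
Once the matrix has column names, cbind() can join it to the data frame. A minimal sketch on toy objects (`df`, `Year_small`, and `combined` are illustrative names and values, not the simulated data):

```r
# Toy stand-ins for the data frame and the matrix of yearly outcomes
df <- data.frame(id = 1:3, treated = c("untreated", "treated", "treated"))
Year_small <- matrix(c(1, 1, 1,
                       1, 0, 1), nrow = 3)
colnames(Year_small) <- paste0("year_", 0:1)

# cbind() on a data frame and a matrix returns a data frame,
# with each matrix column becoming its own named column
combined <- cbind(df, Year_small)
names(combined)  # "id" "treated" "year_0" "year_1"
```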
