Composing a data.frame from loop-generated sequences: answers

【Question title】: Composing a data.frame from loop-generated sequences
【Posted】: 2019-12-10 11:08:30
【Question description】:

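I have a dataset consisting of observations of fish weights, the Julian date on which each fish was caught, and the fish's name. I want to estimate the average growth rate of these fish as a function of the day of the year (Julian date). I believe the best approach is to compose a data.frame with two fields: "Julian Date" and "Growth". The idea is this: for a fish observed on January 1 (1) at weight 100 and observed again on April 10 (101) at weight 200, the growth rate would be 100 g / 100 days, or 1 g/day. I would represent this in the data.frame as 100 rows, where the "Julian Date" column holds the sequence of Julian dates (1:100) and the "Growth" column holds the average growth rate (1 g/day) for every one of those days.

I tried to write a for loop that goes through each fish, computes the average growth rate, and then builds a list in which each element holds the sequence of Julian dates and the growth rate (repeated as many times as the date sequence is long). I would then use that list to compose my data.frame.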

growth_list <- list() # initialize empty list
p <- 1 # initialize increment count

#  Looks at every other fish ID beginning at 1 (all even-number observations are the same fish at a later observation)

for (i in seq(1, length(df$FISH_ID), by = 2)){
  rate <- (df$growth[i+1]-df$growth[i])/(as.double(df$date[i+1])-as.double(df$date[i]))
  growth_list[[p]] <- list(c(seq(as.numeric(df$date[i]),as.numeric(df$date[i+1]))), rep(rate, length(seq(from = as.numeric(df$date[i]), to = as.numeric(df$date[i+1])))))
  p <- p+1 # increase to change index of list item in next iteration
}

# Converts list of vectors (the rows which fulfill above criteria) into a data.frame

growth_df <- do.call(rbind, growth_list)

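My expected result is illustrated here: https://imgur.com/YXKLkpK

My actual result is illustrated here: https://imgur.com/Zg4vuVd

As you can see, the actual result appears to be a data.frame whose two columns give the type of each object and the length of the original list item. That is, row 1 of this dataset covers an observation interval of 169 days and therefore contains 169 Julian dates and 169 repetitions of the growth rate.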

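【Question discussion】:

Tags: r

【Solution 1】:

Instead of list(), use data.frame() with named columns, so that you build a list of data frames whose rows are bound together at the end: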

growth_list <- vector(mode="list", length=length(df$FISH_ID)/2)
p <- 1 # index of the next list slot to fill

for (i in seq(1, length(df$FISH_ID), by=2)){
  rate <- with(df, (growth[i+1]-growth[i])/(as.double(date[i+1])-as.double(date[i])))
  date_seq <- seq(as.numeric(df$date[i]), as.numeric(df$date[i+1]))

  # one data frame per fish: one row per day, the same rate on every row
  growth_list[[p]] <- data.frame(Julian_Date = date_seq, 
                                 Growth_Rate = rep(rate, length(date_seq)))
  p <- p + 1 
}

growth_df <- do.call(rbind, growth_list)
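
To see why the switch from list() to data.frame() matters, here is a minimal sketch on made-up data. The toy values and the names `as_lists`, `m` and `pairs` are assumptions for illustration only; the real `df$date` may need the `as.numeric()` conversion used above. When rbind receives plain lists it builds a matrix whose cells each hold a whole vector, which matches the symptom in the question. When it receives data frames it stacks their rows.

```r
# Toy data (hypothetical): two fish, each observed twice, rows paired as in the question
df <- data.frame(
  FISH_ID = c("A", "A", "B", "B"),
  date    = c(1, 4, 10, 12),      # Julian day of each observation
  growth  = c(100, 130, 50, 60)   # weight at each observation
)

# Binding plain lists: every list becomes ONE row, and each cell holds a whole vector
as_lists <- list(list(1:4, rep(10, 4)), list(10:12, rep(5, 3)))
m <- do.call(rbind, as_lists)
dim(m)      # 2 2
is.list(m)  # TRUE: a matrix of list cells, not a table of numbers

# Binding data frames: the rows of every piece are stacked
pairs <- seq(1, nrow(df), by = 2)
growth_list <- vector("list", length(pairs))
for (p in seq_along(pairs)) {
  i <- pairs[p]
  rate <- (df$growth[i + 1] - df$growth[i]) / (df$date[i + 1] - df$date[i])
  date_seq <- seq(df$date[i], df$date[i + 1])
  # a length-1 rate is recycled to the length of date_seq
  growth_list[[p]] <- data.frame(Julian_Date = date_seq, Growth_Rate = rate)
}
growth_df <- do.call(rbind, growth_list)
nrow(growth_df)  # 7
```

Note that seq() includes both endpoints, so an interval of n days yields n + 1 rows; drop the last element of `date_seq` if you want exactly n.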

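【Discussion】:

Yes, this is the simplest answer.

【Solution 2】:

Here is a solution using dplyr and plyr on some toy data. There are 20 fish with random start and end times and a random weight at each time. It computes the growth rate over the interval, then builds a new df for each fish with one row per day carrying the average daily growth rate, and returns a single new df containing all the fish.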

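library(plyr)   # load plyr first so that dplyr's verbs are not masked
library(dplyr)

# toy data: 20 fish, two observations each, random weights and times
df <- data.frame(fish=rep(1:20,2),weight=sample(50:100,40,TRUE),
                 time=sample(1:100,40,TRUE))

df1 <- df %>% group_by(fish) %>% arrange(time) %>% 
  mutate(diff.weight=weight-lag(weight),
         diff.time=time-lag(time)) %>% 
  mutate(rate=diff.weight/diff.time) %>% 
  # drop each fish's first row (no earlier observation) and zero-length intervals
  filter(!is.na(rate), diff.time > 0) %>% 
  ddply(.,.(fish),function(x){
  # one row per day of the interval, each carrying the average daily rate
  data.frame(time=seq_len(x$diff.time),rate=x$rate)
})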

head(df1)
  fish time       rate
1    1    1 -0.7105263
2    1    2 -0.7105263
3    1    3 -0.7105263
4    1    4 -0.7105263
5    1    5 -0.7105263
6    1    6 -0.7105263

tail(df1)
    fish time       rate
696   20   47 -0.2307692
697   20   48 -0.2307692
698   20   49 -0.2307692
699   20   50 -0.2307692
700   20   51 -0.2307692
701   20   52 -0.2307692

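【Discussion】:

【Solution 3】:

Welcome to Stack Overflow.

A few things about your code:

I would suggest using the apply functions instead of a for loop. You can set the arguments of apply so that it works row by row, and it can make your code run faster. The apply family also creates the list for you, which cuts down on the code you have to write to create and fill one.
It is customary to give readers a sample snippet of the initial data to work with. Sometimes the way we describe our data does not match the actual data, so this convention is needed to avoid miscommunication. If you can, please make a dummy dataset for us to use.
Have you tried as.data.frame(growth_list) or data.frame(growth_list)?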
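
As a rough sketch of the first suggestion, the loop can be written with lapply, which returns the list directly so no counter or pre-allocation is needed. The toy data and the names `starts`, `pieces` and `growth_df_apply` are assumptions for illustration, and the dates are taken to be plain numbers already. Whether this is actually faster than the for loop depends on the data; the clearer gain is less bookkeeping.

```r
# Toy data (hypothetical): rows 1-2 are fish A, rows 3-4 are fish B
df <- data.frame(
  FISH_ID = c("A", "A", "B", "B"),
  date    = c(1, 4, 10, 12),      # Julian day of each observation
  growth  = c(100, 130, 50, 60)   # weight at each observation
)

# Index of the first observation of every fish
starts <- seq(1, nrow(df), by = 2)

# lapply calls the function once per start index and collects the results in a list
pieces <- lapply(starts, function(i) {
  date_seq <- seq(df$date[i], df$date[i + 1])
  rate <- (df$growth[i + 1] - df$growth[i]) / (df$date[i + 1] - df$date[i])
  data.frame(Julian_Date = date_seq, Growth_Rate = rate)
})

# Stack the per-fish data frames into one
growth_df_apply <- do.call(rbind, pieces)
```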

Another option is to use an if/else statement inside a for loop that does the rbind itself. It would look something like this:

# Template only: df stands for your data frame; date, length and rate are
# placeholders for the values you compute for each row.

# make a row-wise for loop
for (x in seq_len(nrow(df))) {

  # Insert your desired calculations here. Pulling the current row out as its
  # own data frame may make the calculations easier:
  dataCurrent <- data.frame(df[x, ])

  # Finish with something like this to turn the calculations for each row into
  # an output data frame of your choice.
  outFish <- cbind(date, length, rate)

  # Create the final data frame on the first pass, then append to it
  if (!exists("finalFishOut")) {
    finalFishOut <- outFish
  } else {
    finalFishOut <- rbind(finalFishOut, outFish)
  }

}

Please update your question with a snippet of your data, and I will update this answer with a solution for your exact case.

【Discussion】: