【Question title】: R Replace Dataframe Records using Vectorization
【Posted】: 2012-01-03 10:04:32
【Question】:

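I'm wondering whether there is an efficient way to solve the following problem. I have a set of X-Y points. For each point I need to generate a certain number of records, and at the end I need to stack all of the generated records together. Initially I did this with a FOR loop, using cbind to stack data.frames on each iteration. I have now changed it a bit by defining the dimensions of the final stack of records up front and trying to replace those 0s with the generated values. My code is posted below (the asterisks mark where I got stuck). If you can give me a hint, or even have a better solution, that would be perfect!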

colonies <- read.table(text =             
'  X        Y      Timecount ID_col Age
582906.4 2883317      2004      1  15
583345.9 2883102      2004      2   4
583119.5 2883621      2004      3  13
583385.0 2882933      2004      4   5
583374.0 2882936      2004      5   2
583271.0 2883076      2004      7   5
582898.9 2883229      2004      8   1
582927.9 2883234      2004      9  20
582956.7 2883272      2004     10  13
582958.8 2883249      2004     11   3', header = TRUE)

year = 2004
survival_prob = 0.01
male_prob = 0.5

Present <- colonies$Timecount == year

app <- sum(colonies$Age[Present] >= 4 & colonies$Age[Present] < 10) * 1000 * survival_prob
app2 <- sum(colonies$Age[Present] >= 10 & colonies$Age[Present] < 15) * 10000 * survival_prob
app3 <- sum(colonies$Age[Present] >= 15 & colonies$Age[Present] <= 20) * 100000 * survival_prob

size <- app + app2 + app3

pop <- data.frame(matrix(0,nrow=size,ncol=2))
colnames(pop) <- c("X","Y")

if (dim(pop)[1] > 0){

 #FOR cycle going through each existing point
 for (i in 1:sum(Present)){     

   if (colonies[Present,]$Age[i] < 4) { next
   } else if (colonies[Present,]$Age[i] >= 4 & colonies[Present,]$Age[i] < 10) { alates <- 1000 
   } else if (colonies[Present,]$Age[i] >= 10 & colonies[Present,]$Age[i] < 15) { alates <- 10000 
    } else if (colonies[Present,]$Age[i] >= 15 & colonies[Present,]$Age[i] <= 20) { alates <- 100000 
    }

    indiv <- alates * survival_prob
    #Initialize two coordinate variables based on the established (or existing) colonies
    X_temp <- round(colonies[Present,]$X[i],2)
    Y_temp <- round(colonies[Present,]$Y[i],2)
    distance <- rexp(indiv,rate=1/200)
    theta <- runif(indiv, 0, 2*pi)
    C <- cos(theta)
    S <- sin(theta)
    #XY coords (meters) using polar coordinate transformations
    X <- X_temp + round(S * distance,2)
    Y <- Y_temp + round(C * distance,2)
    pop[,] <- c(X,Y) #******HERE I GOT STUCK...it should be pop[1:indiv,] 
                     #but then it does not work for the next i since it would over write...

    }
    pop$Sex <- rbinom(size,1,male_prob)
    pop$ID <- 1:dim(pop)[1]
}

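【Comments】:

  • There seem to be problems with the code... do you really want to do nothing with colonies younger than 4? If so, just throw them out up front. It looks to me like all of this could be vectorized. Please comment the code better, and perhaps describe more clearly what you are trying to accomplish.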

Tags: r for-loop replace dataframe vectorization


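【Solution 1】:

I believe this is what you're looking for: expressive, vectorized R code. There are no loops, not even *apply family or plyr commands. There is a lot you could do to make it more flexible, but the core vectorization using rep, together with a single call for the random distances, is the key part. I don't know why there is an if clause on the dimensions of pop. You will need to handle that differently, because it is left unfinished here.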

year = 2004
survival_prob = 0.01
male_prob = 0.5

# you don't do anything in your for loop or save any of the results if the age is 
# less than 4. I'm going to just remove that from colonies on the assumption that it's 
# larger than posted and comes from a file that you won't change.  Where I edit 
# colonies you might want to work with a copy.
colonies <- colonies[colonies$Age >= 4,]

# only Present selection of colonies is ever used in this code so you could also stop 
# repeatedly selecting... this one I'm imagining you might make a copy of, something 
# like coloniesP in your real code.  In general, you want as little going on in a 
# loop and as little repeating yourself as possible.  Note, this might be memory 
# intensive if colonies is actually very large.  Feel free to go back to selecting 
# since it would happen much less frequently in the new code anyway.
Present <- colonies$Timecount == year
colonies <- colonies[Present,]

# no difference up to size, then it all is
app <- sum(colonies$Age >= 4 & colonies$Age < 10) * 1000 * survival_prob
app2 <- sum(colonies$Age >= 10 & colonies$Age < 15) * 10000 * survival_prob
app3 <- sum(colonies$Age >= 15 & colonies$Age <= 20) * 100000 * survival_prob

size <- app + app2 + app3

#note that ifelse can be used to declare alates as vectors
alates <- ifelse(colonies$Age >= 4 & colonies$Age < 10, 1000, 100000)
alates <- ifelse(colonies$Age >= 10 & colonies$Age < 15, 10000, alates)

# as a consequence, more stuff can be vectorized
indiv <- alates * survival_prob

# we can do some cool stuff with rep to continue vectorizing
# (round when done if you must)
X_temp <- rep(colonies$X, indiv)
Y_temp <- rep(colonies$Y, indiv)

#Initialize two coordinate variables based on the established (or existing) colonies... now as vectors of the entire data frame size
distance <- rexp(size,rate=1/200)
theta <- runif(size, 0, 2*pi)
C <- cos(theta)
S <- sin(theta)
#XY coords (meters) using polar coordinate transformations
X <- X_temp + S * distance
Y <- Y_temp + C * distance
pop <- data.frame(X,Y)  
pop$Sex <- rbinom(size,1,male_prob)
pop$ID <- 1:dim(pop)[1]
# now round... once
pop$X <- round(pop$X,2)
pop$Y <- round(pop$Y,2)
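The two building blocks used above can be tried in isolation. The following is a minimal sketch with made-up toy values (`x`, `times`, and `ages` are illustrative, not taken from the question's data). It shows how `rep` with a vector of counts expands each element a different number of times, and how `findInterval` could stand in for the nested `ifelse` calls, assuming every age is already between 4 and 20.

```r
# Toy values, for illustration only
x     <- c(10, 20, 30)
times <- c(2, 0, 3)

# Each element of x is repeated by the matching entry of times
expanded <- rep(x, times)
expanded
# 10 10 30 30 30

# Possible alternative to nested ifelse: look up the age class with findInterval.
# Assumes all ages are >= 4 (ages below 4 would give index 0 and be dropped).
ages    <- c(4, 9, 10, 14, 15, 20)
classes <- findInterval(ages, c(4, 10, 15))   # 1, 1, 2, 2, 3, 3
alates  <- c(1000, 10000, 100000)[classes]
```

This is why the parent coordinates and the random draws line up in the answer: `rep(colonies$X, indiv)` yields one parent X per offspring, in the same order in which the `size` random distances are generated.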

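Also, note that even if this could not be vectorized, there is a very simple solution to your problem of assigning values into pop: don't. Just use lapply on a function that returns a data.frame, and bind the resulting list of data.frame objects together afterwards.

【Comments】:

  • Thanks John!! The lapply version is something I had already tested, but once I stack all the list elements together it takes longer than my current loop... The solution you wrote here works well, though, and I really appreciate it. Is there any chance I could contact you by email? Francesco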
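A minimal sketch of that lapply approach might look like the following. The helper name `simulate_colony`, its `mean_dist` argument, and the three-row `col` table are hypothetical and only serve the illustration; the dispersal logic mirrors the code in the question.

```r
# Hypothetical helper: simulate the offspring of ONE colony and return a data.frame
simulate_colony <- function(x, y, n, mean_dist = 200) {
  distance <- rexp(n, rate = 1 / mean_dist)   # random dispersal distance (meters)
  theta    <- runif(n, 0, 2 * pi)             # random direction
  data.frame(X = round(x + sin(theta) * distance, 2),
             Y = round(y + cos(theta) * distance, 2))
}

set.seed(1)  # only to make the sketch reproducible

# Illustrative input: parent coordinates and the number of offspring per colony
col <- data.frame(X = c(582906.4, 583345.9, 583119.5),
                  Y = c(2883317, 2883102, 2883621),
                  n = c(1000, 10, 100))

# One data.frame per colony, then a single bind at the end
pieces <- lapply(seq_len(nrow(col)),
                 function(i) simulate_colony(col$X[i], col$Y[i], col$n[i]))
pop <- do.call(rbind, pieces)
pop$ID <- seq_len(nrow(pop))
```

Because the binding happens once, after the loop, no row offsets have to be tracked by hand. With many colonies the final `rbind` can still be slow, so the fully vectorized version is likely to remain the faster option.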