【问题标题】:Simulating the Chinese Restaurant Process in R在 R 中模拟中餐厅流程
【发布时间】:2013-11-21 12:25:56
【问题描述】:

我正在尝试在 R 中模拟 Chinese Restaurant process,并想知道是否可以对这种粗略的实现进行任何效率改进。

iTables = 200  # number of tables
iSampleSize = 1000  # number of diners

# initialize the list of tables
listTableOccupants = vector('list', iTables)

for(currentDiner in seq.int(iSampleSize)) {
  # occupation probabilities for the next diner
  vProbabilities = sapply(listTableOccupants, 
                          function(x) ifelse(!is.null(x),
                                             length(x)/currentDiner,
                                             1/currentDiner))
  # pick the index of the lucky table
  iTable = sample.int(iTables, size = 1, prob = vProbabilities)

  # add to the list element corresponding to the table
  listTableOccupants[[iTable]] = 
    c(listTableOccupants[[iTable]], currentDiner) 
}

我特别关心这一行:

  # add to the list element corresponding to the table
  listTableOccupants[[iTable]] = 
    c(listTableOccupants[[iTable]], currentDiner) 

这样有效率吗?

【问题讨论】:

  • 问题是什么?为什么你认为你有效率问题?对于大型数据集,我建议 listTableOccupants <- matrix(nr=iSampleSize, nc=iTables) 并填充指定的插槽 listTableOccupants[currentDiner,iTable]<-currentDiner ,从而避免重新分配空间的需要。
  • @CarlWitthoft 我不知道如何有效地编写随机过程模拟,这更像是一个代码审查问题。另外,我认为你的方式需要我分配一个iTables*iSampleSize 矩阵,这取决于iSampleSize 可能非常大。另外,我使用的数据结构与分区的数学概念完全对应。

标签: r stochastic-process


【解决方案1】:

为避免空间重新分配和稀疏数据结构,您可以改为为每个用餐者应用一个表标签。例如,

nDnr <- 100  # number of diners; must be at least 2
vDnrTbl <- rep(0, nDnr)  # table label for each diner
alpha <- 2  # CRP parameter

vDnrTbl[1] <- 1
for (dnr in 2:length(vDnrTbl)) {

    # compute occupation probabilities for current diner
    vOcc <- table(vDnrTbl[1:(dnr-1)])
    vProb <- c(vOcc, alpha) / (dnr - 1 + alpha)

    # add table label to diner
    nTbl <- as.numeric(names(vOcc)[length(vOcc)])  # avoid overhead of finding max of possibly large vector
    vDnrTbl[dnr] <- sample.int(nTbl+1, size=1, prob=vProb)
}

vDnrTbl获取listTableOccupants

nTbl <- max(c(nTbl, vDnrTbl[dnr]))
listTableOccupants <- lapply(1:nTbl, function(t) which(vDnrTbl == t))

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2013-11-24
    • 2022-01-15
    • 1970-01-01
    • 2019-10-19
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多