R中的Collatz猜想答案

【问题标题】：Collatz conjecture in RR中的Collatz猜想
【发布时间】：2019-03-09 04:25:50
【问题描述】：

我仍然主要为我自己（和我的学生）教授一些 R。

这是 R 中 Collatz 序列的实现：

f <- function(n)
{
    # construct the entire Collatz path starting from n
    if (n==1) return(1)
    if (n %% 2 == 0) return(c(n, f(n/2)))
    return(c(n, f(3*n + 1)))
}

调用 f(13) 我得到 13、40、20、10、5、16、8、4、2、1

但是请注意，这里的向量大小是动态增长的。这样的举动往往是低效代码的秘诀。有没有更高效的版本？

在 Python 中我会使用

def collatz(n):
    assert isinstance(n, int)
    assert n >= 1

    def __colla(n):

        while n > 1:
            yield n

            if n % 2 == 0:
                n = int(n / 2)
            else:
                n = int(3 * n + 1)

        yield 1

    return list([x for x in __colla(n)])

我找到了一种无需先验指定维度即可写入向量的方法。因此，一个解决方案可能是

collatz <-function(n)
{
  stopifnot(n >= 1)  
  # define a vector without specifying the length
  x = c()

  i = 1
  while (n > 1)
  {
    x[i] = n
    i = i + 1
    n = ifelse(n %% 2, 3*n + 1, n/2)
  }
  x[i] = 1
  # now "cut" the vector
  dim(x) = c(i)
  return(x)
}

【问题讨论】：

a-look-a-r-vectorization-through-the-collatz-conjecture
r-and-the-collatz-conjecture-part-2.html
google R collatz efficient - 前 2 次点击 - 应该有帮助 :) 不知道 R
不幸的是，并非如此。我以前看过这些链接。他们甚至没有尝试明确地构造猜想中的数字序列。他们只是更新一个计数器来获取序列的长度。他们运行参数向量。尽管如此，非常有趣。谢谢...

标签： python r collatz

【解决方案1】：

我很想知道通过 Rcpp 实现的 C++ 与您的两种基本 R 方法相比如何。这是我的结果。

首先让我们定义一个函数collatz_Rcpp，它返回给定整数n 的冰雹序列。（非递归）实现改编自Rosetta Code。

library(Rcpp)
cppFunction("
    std::vector<int> collatz_Rcpp(int i) {
        std::vector<int> v;
        while(true) {
            v.push_back(i);
            if (i == 1) break;
            i = (i % 2) ? (3 * i + 1) : (i / 2);
        }
        return v;
    }
")

我们现在使用您的基础 R 和 Rcpp 实现运行 microbenchmark 分析。我们计算前 10000 个整数的冰雹序列

# base R implementation
collatz_R <- function(n) {
    # construct the entire Collatz path starting from n
    if (n==1) return(1)
    if (n %% 2 == 0) return(c(n, collatz(n/2)))
    return(c(n, collatz(3*n + 1)))
}

# "updated" base R implementation
collatz_R_updated <-function(n) {
  stopifnot(n >= 1)
  # define a vector without specifying the length
  x = c()
  i = 1
  while (n > 1) {
    x[i] = n
    i = i + 1
    n = ifelse(n %% 2, 3*n + 1, n/2)
  }
  x[i] = 1
  # now "cut" the vector
  dim(x) = c(i)
  return(x)
}

library(microbenchmark)
n <- 10000
res <- microbenchmark(
    baseR = sapply(1:n, collatz_R),
    baseR_updated = sapply(1:n, collatz_R_updated),
    Rcpp = sapply(1:n, collatz_Rcpp))

res
#         expr        min         lq       mean     median         uq       max
#        baseR   65.68623   73.56471   81.42989   77.46592   83.87024  193.2609
#baseR_updated 3861.99336 3997.45091 4240.30315 4122.88577 4348.97153 5463.7787
#         Rcpp   36.52132   46.06178   51.61129   49.27667   53.10080  168.9824

library(ggplot2)
autoplot(res)

（非递归）Rcpp 实现似乎比原始（递归）基本 R 实现快 30% 左右。 “更新”（非递归）基本 R 实现比原始（递归）基本 R 方法慢得多（microbenchmark 在我的 MacBook Air 上需要大约 10 分钟才能完成，因为baseR_updated）。

【讨论】：

哇，令人着迷的结果。非常感谢。现在，我尝试远离 C++，因为它又增加了一层复杂性。我对 baseR 和 baseR_updated 之间的差异感到非常惊讶。我假设 baseR_updated 比 baseR 快。所以 baseR 在效率和优雅上都胜出（超过 baseR_updated...）
感谢@tschm 提出一个有趣的问题。我猜结论是递归算法 (baseR) 非常快，尽管在 R 中动态增长了一个向量。Rcpp 方法不是递归的，并且比等效的基本 R 方法 baseR_updated 快得多。递归的Rcpp 版本可能是赢家；-)
我猜下一站将是缓存，特别是如果你像你一样计算 10000 个序列......
我想在 R 中缓存一个函数并不是一件容易的事。我可以做 m_f