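I was curious about the trade-off between simply adding numbers and allocating memory, so I wrote some test code using both Clojure vectors and primitive (Java) arrays. Results: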
; verify we added numbers in (range 1e7) once or twice
(sum-vec) => 49999995000000
(into-sum-vec) => 99999990000000
ARRAY power = 7
"Elapsed time: 21.840198 msecs" ; sum once
"Elapsed time: 45.036781 msecs" ; 2 sub-sums, then add sub-totals
(timing (sum-sum-arr)) => 99999990000000
"Elapsed time: 397.254961 msecs" ; copy into 2x array, then sum
(timing (sum-arr2)) => 99999990000000
VECTOR power = 7
"Elapsed time: 112.522111 msecs" ; sum once from vector
"Elapsed time: 387.757729 msecs" ; make 2x vector, then sum
So we see that with a primitive long array (on my machine), it takes 21 ms to sum 1e7 integers. If we compute that sum twice and add the two subtotals, the elapsed time is 45 ms.
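If we allocate a new array of length 2e7, copy the first array into it twice, and then sum the values, it takes about 400 ms, which is more than 8x slower than the summation alone (45 ms). So memory allocation and copying are by far the largest cost.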
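For the native Clojure vector case, summing a preallocated vector of 1e7 integers takes 112 ms. Combining the original vector with itself into a 2e7-element vector and then summing takes about 400 ms (388 ms in this run), similar to the low-level array case. So for large collections of data, the memory I/O cost swamps the differences between native Java arrays and Clojure vectors.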
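To see where the time goes in the array case, one option is to time the allocation, the copy, and the sum as separate phases. The following is only a rough sketch, not part of the original test: `elapsed-ms`, `alloc-2x`, `copy-twice!`, `sum-longs`, and the size `n` are names I made up for illustration, and single-shot `System/nanoTime` measurements are noisy (JIT warm-up, GC), so treat the printed numbers as indicative at best.

```clojure
; Sketch: time allocation, copying, and summing as separate phases.
; All names below are hypothetical helpers, not part of the original code.

(defn elapsed-ms
  "Calls the zero-argument function f and returns [result elapsed-milliseconds]."
  [f]
  (let [start  (System/nanoTime)
        result (f)
        stop   (System/nanoTime)]
    [result (/ (- stop start) 1e6)]))

(def n 1000000)                      ; assumed size, smaller than the 1e7 used above
(def src (long-array (range n)))     ; source array holding 0 .. n-1

(defn alloc-2x []
  (long-array (* 2 n)))              ; allocation only, no copying

(defn copy-twice!
  "Copies `from` into the first and second halves of `to`, element by element."
  [^longs from ^longs to]
  (let [len (alength from)]
    (dotimes [i len] (aset to i (aget from i)))
    (dotimes [i len] (aset to (+ len i) (aget from i)))
    to))

(defn sum-longs [^longs arr]
  (areduce arr i accum 0
    (+ accum (aget arr i))))

(let [[dst   t-alloc] (elapsed-ms alloc-2x)
      [_     t-copy]  (elapsed-ms #(copy-twice! src dst))
      [total t-sum]   (elapsed-ms #(sum-longs dst))]
  (println "alloc ms:" t-alloc)
  (println "copy  ms:" t-copy)
  (println "sum   ms:" t-sum)
  (println "total   :" total))
```

Unlike the original, `n` here is a long rather than the double returned by `Math/pow`, so the per-phase times are not directly comparable with the figures above.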
The code for the above (requires [tupelo "0.9.69"]):
(ns tst.demo.core
  (:use tupelo.core tupelo.test)
  (:require [criterium.core :as crit]))

(defmacro timing [& forms]
  ; `(crit/quick-bench ~@forms)
  `(time ~@forms))

(def power 7)
(def reps (Math/pow 10 power))
(def data-vals (range reps))
(def data-vec (vec data-vals))
(def data-arr (long-array data-vals))

; *** BEWARE of small errors causing reflection => 1000x slowdown ***
(defn sum-arr-1 []
  (areduce data-arr i accum 0
    (+ accum (aget data-arr i)))) ; => 6300 ms (power 6)

(defn sum-arr []
  (let [data ^longs data-arr]
    (areduce data i accum 0
      (+ accum (aget data i))))) ; => 8 ms (power 6)

(defn sum-sum-arr []
  (let [data ^longs data-arr
        sum1 (areduce data i accum 0
               (+ accum (aget data i)))
        sum2 (areduce data i accum 0
               (+ accum (aget data i)))
        result (+ sum1 sum2)]
    result))

(defn sum-arr2 []
  (let [data ^longs data-arr
        data2 (long-array (* 2 reps))
        >> (dotimes [i reps] (aset data2 i (aget data i)))
        >> (dotimes [i reps] (aset data2 (+ reps i) (aget data i)))
        result (areduce data2 i accum 0
                 (+ accum (aget data2 i)))]
    result))

(defn sum-vec [] (reduce + data-vec))
(defn into-sum-vec [] (reduce + (into data-vec data-vec)))

(dotest
  (is= (spyx (sum-vec))
       (sum-arr))
  (is= (spyx (into-sum-vec))
       (sum-arr2)
       (sum-sum-arr))
  (newline) (println "-----------------------------------------------------------------------------")
  (println "ARRAY power = " power)
  (timing (sum-arr))
  (spyx (timing (sum-sum-arr)))
  (spyx (timing (sum-arr2)))
  (newline) (println "-----------------------------------------------------------------------------")
  (println "VECTOR power = " power)
  (timing (sum-vec))
  (timing (into-sum-vec)))
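The reflection caveat in the BEWARE comment above is easy to miss, since the slow version still returns the right answer. One way to catch it is to turn on `*warn-on-reflection*` before the functions are compiled. This is a small standalone sketch with a deliberately tiny array; the function names `sum-arr-reflective` and `sum-arr-hinted` are made up for the illustration and mirror `sum-arr-1` and `sum-arr`.

```clojure
; Ask the compiler to print a warning wherever it has to fall back to reflection.
(set! *warn-on-reflection* true)

(def data-arr (long-array (range 1000)))

; No type hint: the compiler does not know data-arr is a long[], so the
; alength/aget calls are resolved reflectively and warnings are printed
; when this definition is compiled.
(defn sum-arr-reflective []
  (areduce data-arr i accum 0
    (+ accum (aget data-arr i))))

; Hinted with ^longs: the same loop compiles to direct primitive array access,
; and no reflection warning is printed for it.
(defn sum-arr-hinted []
  (let [data ^longs data-arr]
    (areduce data i accum 0
      (+ accum (aget data i)))))

; Both versions compute the same sum; only the speed differs.
(println (= (sum-arr-reflective) (sum-arr-hinted)))
```

The warnings appear at compile time (when the `defn` is evaluated), not when the function is called, so they show up even if the slow function is never benchmarked.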
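You can switch from `time` to Criterium by swapping which line is commented out in the `timing` macro. However, Criterium is designed for short tasks, so you should probably keep `power` at 5 or 6.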