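You can also use the parDist function of the parallelDist package, which is built specifically for parallel distance matrix computation. Its advantages are that the package is available on Mac OS, Windows and Linux, and that it already supports 39 different distance measures (see parDist).
Performance comparison for the manhattan distance (system spec: Mac OS; Intel Core i7 with 4 cores @ 2.5 GHz and hyperthreading enabled):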
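As a rough sketch of what such a call computes, the snippet below builds a small toy matrix (the data and the names m, d_base, d_12 and d_par are made up for illustration), checks one manhattan distance by hand against base R's dist, and then, only if parallelDist happens to be installed, compares the parDist result with it. The threads = 2 setting is an arbitrary choice for the example, not a recommendation.

```r
# Toy data: 5 observations (rows) with 3 features (columns).
set.seed(1)
m <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3)

# Reference result from base R (single-threaded).
d_base <- dist(m, method = "manhattan")

# Manual check for one pair: sum of absolute coordinate differences.
d_12 <- sum(abs(m[1, ] - m[2, ]))
stopifnot(isTRUE(all.equal(as.matrix(d_base)[1, 2], d_12)))

# Run parDist only if the package is installed; it also returns a "dist" object.
if (requireNamespace("parallelDist", quietly = TRUE)) {
  d_par <- parallelDist::parDist(m, method = "manhattan", threads = 2)
  print(all.equal(as.vector(d_base), as.vector(d_par)))
}
```

Because both functions return a "dist" object, parDist should be usable wherever the result of dist was used before, for example as input to hclust.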
library(parallelDist)   # parDist
library(amap)           # Dist
library(wordspace)      # dist.matrix
library(microbenchmark)

set.seed(123)
# 2000 observations (rows) with 100 features (columns)
x <- matrix(rnorm(2000 * 100), nrow = 2000, ncol = 100)

microbenchmark(parDist(x, method = "manhattan"),
               Dist(x, method = "manhattan", nbproc = 8),
               dist.matrix(x, method = "manhattan"),
               times = 10)
Unit: milliseconds
                                      expr      min       lq     mean   median       uq      max neval
          parDist(x, method = "manhattan") 210.9478 214.3557 225.5894 221.3705 237.9829 247.0844    10
 Dist(x, method = "manhattan", nbproc = 8) 749.9397 755.7351 797.6349 812.6109 824.4075 844.1090    10
      dist.matrix(x, method = "manhattan") 256.0831 263.3273 279.0864 275.1882 296.3256 311.3821    10
With a larger matrix:
x <- matrix(rnorm(10000 * 100), nrow = 10000, ncol = 100)
microbenchmark(parDist(x, method = "manhattan"),
               Dist(x, method = "manhattan", nbproc = 8),
               dist.matrix(x, method = "manhattan"),
               times = 10)
Unit: seconds
                                      expr       min        lq      mean    median        uq       max neval
          parDist(x, method = "manhattan")  6.298234  6.388501  6.737168  6.894203  6.947981  7.221661    10
 Dist(x, method = "manhattan", nbproc = 8) 22.722947 24.113681 24.326157 24.477034 24.658145 25.301353    10
      dist.matrix(x, method = "manhattan")  7.156861  7.505229  7.544352  7.567980  7.655624  7.800530    10
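To make the two result tables easier to compare, the median timings can be expressed relative to parDist. The snippet below is only a small arithmetic sketch: the numbers are the medians copied from the tables above, and the variable names are made up for illustration.

```r
# Median run times copied from the two benchmark tables above.
med_2000  <- c(parDist = 221.3705, Dist = 812.6109, dist.matrix = 275.1882)   # milliseconds
med_10000 <- c(parDist = 6.894203, Dist = 24.477034, dist.matrix = 7.567980)  # seconds

# Run time relative to parDist (the units cancel within each table).
rel_2000  <- med_2000 / med_2000[["parDist"]]
rel_10000 <- med_10000 / med_10000[["parDist"]]

print(round(rbind(n2000 = rel_2000, n10000 = rel_10000), 2))
```

On this machine, Dist with nbproc = 8 takes roughly 3.6 to 3.7 times as long as parDist at both matrix sizes, while dist.matrix takes roughly 1.1 to 1.2 times as long. These ratios describe this one system and this one distance measure only.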
Further performance comparisons can be found in the parallelDist vignette.