【Question title】: Accelerating one-to-many correlation calculations in Python
【Posted】: 2016-10-01 10:24:51
【Question】:

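I want to compute the Pearson correlation coefficient between a vector and each row of an array in Python (assume numpy and/or scipy). Because of the size of the real data array and memory constraints, the standard correlation-matrix functions cannot be used. Here is my naive implementation: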

import numpy as np
import scipy.stats as sps

np.random.seed(0)

def correlateOneWithMany(one, many):
    """Return Pearson's correlation coef of 'one' with each row of 'many'."""
    pr_arr = np.zeros((many.shape[0], 2), dtype=np.float64)
    pr_arr[:] = np.nan
    for row_num in np.arange(many.shape[0]):
        pr_arr[row_num, :] = sps.pearsonr(one, many[row_num, :])
    return pr_arr

obs, varz = 10 ** 3, 500
X = np.random.uniform(size=(obs, varz))

pr = correlateOneWithMany(X[0, :], X)

%timeit correlateOneWithMany(X[0, :], X)
# 10 loops, best of 3: 38.9 ms per loop

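Any ideas on speeding this up would be greatly appreciated!

【Comments】:

  • "...because of the size of the real data array and memory constraints." Please give the typical size of the array and the actual memory limit.

Tags: python python-2.7 numpy scipy statistics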


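【Answer 1】:

The scipy.spatial.distance module implements the "correlation distance", which is simply one minus the correlation coefficient. You can use the function cdist to compute the one-to-many distances, then subtract the result from 1 to recover the correlation coefficients. Note that this yields only the coefficients, not the p-values that pearsonr also returns.

Here is a modified version of your script that also computes the correlation coefficients with cdist: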

import numpy as np
import scipy.stats as sps
from scipy.spatial.distance import cdist

np.random.seed(0)

def correlateOneWithMany(one, many):
    """Return Pearson's correlation coef of 'one' with each row of 'many'."""
    pr_arr = np.zeros((many.shape[0], 2), dtype=np.float64)
    pr_arr[:] = np.nan
    for row_num in np.arange(many.shape[0]):
        pr_arr[row_num, :] = sps.pearsonr(one, many[row_num, :])
    return pr_arr

obs, varz = 10 ** 3, 500
X = np.random.uniform(size=(obs, varz))

pr = correlateOneWithMany(X[0, :], X)

c = 1 - cdist(X[0:1, :], X, metric='correlation')[0]

print(np.allclose(c, pr[:, 0]))

Timing:

In [133]: %timeit correlateOneWithMany(X[0, :], X)
10 loops, best of 3: 37.7 ms per loop

In [134]: %timeit 1 - cdist(X[0:1, :], X, metric='correlation')[0]
1000 loops, best of 3: 1.11 ms per loop
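
Since the question mentions memory limits, the same coefficients can also be computed directly in NumPy, one block of rows at a time. The following is only a minimal sketch, not part of the original answer: the function name correlate_one_with_many and the chunk_rows parameter are hypothetical, and it assumes no row is constant (a constant row has zero variance and would produce nan). It centers each row, divides the dot product by the product of the norms, and checks the result against the cdist expression above.

```python
import numpy as np
from scipy.spatial.distance import cdist


def correlate_one_with_many(one, many, chunk_rows=256):
    """Pearson r of `one` with each row of `many`, processed in row blocks.

    `chunk_rows` is a hypothetical tuning knob (not from the original post);
    it bounds the size of the temporary centered copy held in memory.
    """
    one = np.asarray(one, dtype=np.float64)
    one_c = one - one.mean()                      # center the single vector
    one_norm = np.sqrt(np.dot(one_c, one_c))
    n_rows = many.shape[0]
    out = np.empty(n_rows, dtype=np.float64)
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        block = np.asarray(many[start:stop], dtype=np.float64)
        # Center each row of the block around its own mean.
        block_c = block - block.mean(axis=1, keepdims=True)
        # Row-wise Euclidean norms of the centered block.
        block_norm = np.sqrt(np.einsum('ij,ij->i', block_c, block_c))
        # r = <x_c, y_c> / (|x_c| * |y_c|) for every row in the block.
        out[start:stop] = np.dot(block_c, one_c) / (block_norm * one_norm)
    return out


np.random.seed(0)
X = np.random.uniform(size=(1000, 500))

r = correlate_one_with_many(X[0, :], X)
expected = 1 - cdist(X[0:1, :], X, metric='correlation')[0]
print(np.allclose(r, expected))
```

Only one block of chunk_rows rows is copied at a time, so peak extra memory is governed by chunk_rows rather than by the full array. Whether this is faster than cdist would need to be measured on the real data.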

【Comments】:
