【Posted】: 2014-04-30 04:16:39
【Question】:
I have this function to calculate the squared Mahalanobis distance of a vector x to the mean:
import numpy as np

def mahalanobis_sqdist(x, mean, Sigma):
    '''
    Calculates the squared Mahalanobis distance of vector x
    to the distribution's mean
    '''
    Sigma_inv = np.linalg.inv(Sigma)
    xdiff = x - mean
    sqmdist = np.dot(np.dot(xdiff, Sigma_inv), xdiff)
    return sqmdist
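I have a numpy array of shape (25, 4), and I want to apply this function to all 25 rows of the array without using a for loop. So, basically, how can I write the vectorized form of this loop:
for r in d1:
    mahalanobis_sqdist(r[0:4], mean1, Sig1)
mean1 and Sig1 are: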
>>> mean1
array([ 5.028, 3.48 , 1.46 , 0.248])
>>> Sig1 = np.cov(d1[0:25, 0:4].T)
>>> Sig1
array([[ 0.16043333, 0.11808333, 0.02408333, 0.01943333],
[ 0.11808333, 0.13583333, 0.00625 , 0.02225 ],
[ 0.02408333, 0.00625 , 0.03916667, 0.00658333],
[ 0.01943333, 0.02225 , 0.00658333, 0.01093333]])
I tried the following, but it did not work:
>>> vecdist = np.vectorize(mahalanobis_sqdist)
>>> vecdist(d1, mean1, Sig1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1862, in __call__
    theout = self.thefunc(*newargs)
  File "<stdin>", line 6, in mahalanobis_sqdist
  File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
IndexError: tuple index out of range
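For context, np.vectorize is built to call the wrapped function on scalar elements, so Sigma appears to reach np.linalg.inv as a single number rather than a 4x4 matrix, which is consistent with the IndexError above. One possible vectorized form is sketched below with np.einsum. The function name mahalanobis_sqdist_rows and the random demo data (d1_demo, mean_demo, Sig_demo) are assumptions for illustration and stand in for the real d1, mean1 and Sig1.

```python
import numpy as np

def mahalanobis_sqdist_rows(X, mean, Sigma):
    """Squared Mahalanobis distance of every row of X to mean.

    Hypothetical helper: X has shape (n, d), mean has shape (d,),
    Sigma has shape (d, d). Returns an array of shape (n,).
    """
    Sigma_inv = np.linalg.inv(Sigma)
    diff = X - mean  # mean broadcasts across the n rows
    # Row-wise quadratic form: sum over j, k of diff[i, j] * Sigma_inv[j, k] * diff[i, k]
    return np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)

# Demo data standing in for d1 (assumed, not the data from the question)
rng = np.random.default_rng(0)
d1_demo = rng.normal(size=(25, 4))
mean_demo = d1_demo.mean(axis=0)
Sig_demo = np.cov(d1_demo.T)

vectorized = mahalanobis_sqdist_rows(d1_demo, mean_demo, Sig_demo)

# Reference result from the explicit loop, written out inline for comparison
Sig_demo_inv = np.linalg.inv(Sig_demo)
looped = np.array([np.dot(np.dot(r - mean_demo, Sig_demo_inv), r - mean_demo)
                   for r in d1_demo])
```

The einsum call avoids building the intermediate (n, n) matrix that np.dot(np.dot(diff, Sigma_inv), diff.T) would produce, of which only the diagonal is needed.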
【Comments】:
-
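The scipy.spatial.distance module can also do all of this for you. The code would then be, e.g., cdist(d1, mean1[None], 'mahalanobis')**2. If mean1 is not the actual mean of the points, you should compute the covariance and its inverse separately and call cdist(d1, mean1[None], 'mahalanobis', VI=Sigma_inv)**2.
Tags: python arrays numpy vectorization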
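The cdist suggestion in the comment above could look roughly like the following sketch, which passes an explicit inverse covariance through VI. The demo arrays (d1_demo, mean_demo) are random placeholders assumed for illustration, not the data from the question, and SciPy must be installed.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Demo data standing in for d1 and mean1 (assumed for illustration)
rng = np.random.default_rng(0)
d1_demo = rng.normal(size=(25, 4))
mean_demo = d1_demo.mean(axis=0)

# Compute the inverse covariance explicitly and hand it to cdist via VI
Sigma_inv = np.linalg.inv(np.cov(d1_demo.T))

# cdist expects two 2-D arrays, so mean_demo[None] turns the mean into a (1, 4) array.
# The result has shape (25, 1); take the single column and square it.
sq_dists = cdist(d1_demo, mean_demo[None], 'mahalanobis', VI=Sigma_inv)[:, 0] ** 2

# Cross-check against the direct quadratic form
diff = d1_demo - mean_demo
direct = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)
```

Passing VI explicitly keeps the covariance under your control instead of leaving its estimation to cdist.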