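[Posted]: 2015-03-24 11:27:01
[Question]:
I am trying to compute the mutual information of unigrams in a dataset. As part of this, I am trying to speed up looping over a numpy ndarray. In the code below, I use an already-created matrix "C" with 6018 rows and 27721 columns to compute a PMI matrix. Any ideas on how to speed up the for loops (it currently takes almost 4 hours to run)? I have read in some other posts about using Cython, but are there other options? Thanks in advance for your help.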
# MAKE MUTUAL INFO MATRIX, PMI
import numpy as np
from math import log10

print "Creating mutual information matrix"
N = C.sum()
invN = 1.0/N # replaced divide by N with multiply by invN in formula below; 1.0 avoids Python 2 integer division if C holds ints
PMI = np.zeros((C.shape))
row, col = C.shape
for r in xrange(row): # u
    for c in xrange(r): # w
        if C[r,c]!=0: # if they co-occur
            numerator = C[r,c]*invN # fraction of reviews where u and w co-occur
            denominator = (sum(C[:,c])*invN) * (sum(C[r])*invN) # product of the column and row marginals
            pmi = log10(numerator*(1/denominator))
            PMI[r,c] = pmi # fill both halves so the result is symmetric
            PMI[c,r] = pmi
[Discussion]:
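One option that avoids Cython is to let numpy do the whole computation at once. The sketch below is not a tested drop-in replacement for the code in the question, only an illustration of the idea: compute the row and column sums once instead of calling `sum()` inside the inner loop, then evaluate `log10(C[r,c] * N / (row_sum[r] * col_sum[c]))` for the whole leading square block in one step. The function name `pmi_matrix` is hypothetical, and the sketch assumes that `C` is a dense array of non-negative counts with no more rows than columns, as in the question (6018 x 27721).

```python
import numpy as np

def pmi_matrix(C):
    """Vectorized sketch of the PMI loop (hypothetical helper, not from the question).

    Assumes C is a dense array of non-negative counts with rows <= columns.
    """
    C = np.asarray(C, dtype=np.float64)
    n = C.shape[0]
    N = C.sum()
    row_sums = C.sum(axis=1)        # sum(C[r]) for every r, computed once
    col_sums = C.sum(axis=0)[:n]    # sum(C[:, c]) for the columns the loop visits

    sq = C[:n, :n]                  # the loop only touches c < r < number of rows
    expected = np.outer(row_sums, col_sums)
    # Zero counts give log10(0) or 0/0 here; those entries are masked out below.
    with np.errstate(divide="ignore", invalid="ignore"):
        vals = np.log10(sq * N / expected)

    # Keep entries strictly below the diagonal where the pair co-occurs.
    keep = np.tril(np.ones((n, n), dtype=bool), k=-1) & (sq != 0)
    lower = np.where(keep, vals, 0.0)

    PMI = np.zeros(C.shape)
    PMI[:n, :n] = lower + lower.T   # mirror, like PMI[c, r] = pmi in the loop
    return PMI
```

Most of the time in the original loop is likely spent in the Python built-in `sum` over numpy slices, which recomputes the same row and column totals for every pair. Even if the full vectorization uses too much memory, precomputing `C.sum(axis=0)` and `C.sum(axis=1)` once and indexing into them inside the loop should already help considerably.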
Tags: python performance python-2.7 numpy