【问题标题】:Speeding up guassian EM algorithm加速高斯EM算法
【发布时间】:2016-02-15 22:15:45
【问题描述】:

我的python代码如下......它需要很长时间。一定有一些我可以使用的 numpy 技巧?我正在分析的图片很小,而且是灰度的……

def gaussian_probability(x,mean,standard_dev):
        termA = 1.0 / (standard_dev*np.sqrt(2.0*np.pi))
        termB = np.exp(-((x - mean)**2.0)/(2.0*(standard_dev**2.0)))
        g = (termA*termB)
        return g
def sum_of_gaussians(x):
        return sum([self.mixing_coefficients[i] * 
                    gaussian_probability(x, self.means[i], self.variances[i]**0.5) 
                    for i in range(self.num_components)])
def expectation():
        dim = self.image_matrix.shape
        rows, cols = dim[0], dim[1]
        responsibilities = []
        for i in range(self.num_components):
            gamma_k = np.zeros([rows, cols])
            for j in range(rows):
                for k in range(cols):
                    p = (self.mixing_coefficients[i] *      
                         gaussian_probability(self.image_matrix[j,k],
                                              self.means[i], 
                                              self.variances[i]**0.5))
                    gamma_k[j,k] = p / sum_of_gaussians(self.image_matrix[j,k])
            responsibilities.append(gamma_k)
        return responsibilities

我只包括了期望步骤,因为虽然最大化步骤循环遍历矩阵责任数组的每个元素,但它似乎走得相对较快(所以瓶颈可能是所有 gaussian_probability 计算?)

【问题讨论】:

    标签: python algorithm expectation-maximization


    【解决方案1】:

    你可以通过做两件事来大大加快你的计算速度:

    • 不要在每个循环中计算归一化!如目前所写,对于具有 M 个组件的 NxN 图像,您正在计算每个相关计算 N * N * M 次,从而导致 O[N^4 M^2] 算法!相反,您应该计算所有元素一次,然后除以总和,即O[N^2 M]

    • 使用 numpy 向量化而不是显式循环。这可以按照您设置代码的方式非常简单地完成。

    基本上,您的 expectation 函数应该如下所示:

    def expectation(self):
        responsibilities = (self.mixing_coefficients[:, None, None] *
                            gaussian_probability(self.image_matrix,
                                                 self.means[:, None, None],
                                                 self.variances[:, None, None] ** 0.5))
        return responsibilities / responsibilities.sum(0)
    

    你没有提供一个完整的例子,所以我不得不即兴创作一些来检查和基准测试,但这里是一个快速的例子:

    import numpy as np
    
    def gaussian_probability(x,mean,standard_dev):
        termA = 1.0 / (standard_dev*np.sqrt(2.0*np.pi))
        termB = np.exp(-((x - mean)**2.0)/(2.0*(standard_dev**2.0)))
        return termA * termB
    
    class EM(object):
        def __init__(self, N=5):
            self.image_matrix = np.random.rand(20, 20)
            self.num_components = N
            self.mixing_coefficients = 1 + np.random.rand(N)
            self.means = 10 * np.random.rand(N)
            self.variances = np.ones(N)
    
        def sum_of_gaussians(self, x):
            return sum([self.mixing_coefficients[i] * 
                        gaussian_probability(x, self.means[i], self.variances[i]**0.5) 
                        for i in range(self.num_components)])
    
        def expectation(self):
            dim = self.image_matrix.shape
            rows, cols = dim[0], dim[1]
            responsibilities = []
            for i in range(self.num_components):
                gamma_k = np.zeros([rows, cols])
                for j in range(rows):
                    for k in range(cols):
                        p = (self.mixing_coefficients[i] *      
                             gaussian_probability(self.image_matrix[j,k],
                                                  self.means[i], 
                                                  self.variances[i]**0.5))
                        gamma_k[j,k] = p / self.sum_of_gaussians(self.image_matrix[j,k])
                responsibilities.append(gamma_k)
            return responsibilities
    
        def expectation_fast(self):
            responsibilities = (self.mixing_coefficients[:, None, None] *
                                gaussian_probability(self.image_matrix,
                                                     self.means[:, None, None],
                                                     self.variances[:, None, None] ** 0.5))
            return responsibilities / responsibilities.sum(0)
    

    现在我们可以实例化对象并比较期望步骤的两种实现:

    em = EM(5)
    np.allclose(em.expectation(),
                em.expectation_fast())
    # True
    

    从时间上看,对于具有 5 个组件的 20x20 图像,我们的速度提高了大约 1000 倍:

    %timeit em.expectation()
    10 loops, best of 3: 65.9 ms per loop
    
    %timeit em.expectation_fast()
    10000 loops, best of 3: 74.5 µs per loop
    

    这种改进将随着图像大小和组件数量的增加而增加。 祝你好运!

    【讨论】:

    • 哈哈,又看了一遍……不敢相信我在循环中做了标准化。谢谢大佬!
    • responsibilities.sum(0) 是否对值求和?例如,每个像素都需要根据每个分量进行归一化......所以responsibility[0][x,y] = responsibility[0][x,y] / (responsibility[0][x,y] + [1][x,y] + [2][x,y], etc...)
    • 是的,这正是它所做的:它在跨越组件的第零轴上求和。
    猜你喜欢
    • 2012-10-14
    • 2015-10-24
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2018-02-22
    • 2015-12-26
    • 2012-06-12
    相关资源
    最近更新 更多