【Question title】: Can I use results of MCA for clustering using K-means, DBSCAN or GMM?
【Posted】: 2020-10-07 17:07:51
【Question description】:

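I am working on a problem in which all of my variables are categorical, and I have applied MCA to them. When I visualize the MCA result together with the clusters obtained from K-modes (run independently of MCA), the clusters overlap one another. I am wondering whether I should instead take the MCA components and run K-means or another clustering algorithm on them, rather than using K-modes. Does that make sense?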
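
For concreteness, the pipeline being asked about might look roughly like the sketch below. It is only an illustration under stated assumptions: the toy data, the integer coding of the categories and the helper `mca_row_coordinates` are hypothetical, MCA is computed by hand as a correspondence analysis of the one-hot indicator matrix, and SciPy's `kmeans2` stands in for whichever K-means implementation is actually used.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def mca_row_coordinates(codes, n_components=2):
    """Row principal coordinates from a correspondence analysis of the indicator matrix.

    codes: (n_samples, n_variables) integer array, one category code per cell.
    (Hypothetical helper written for this sketch.)
    """
    # One-hot encode every variable and stack the blocks side by side.
    blocks = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])
        blocks.append((codes[:, j][:, None] == levels[None, :]).astype(float))
    Z = np.hstack(blocks)

    P = Z / Z.sum()        # correspondence matrix
    r = P.sum(axis=1)      # row masses
    c = P.sum(axis=0)      # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, _ = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the rows: D_r^{-1/2} U Sigma.
    return (U[:, :n_components] * sing[:n_components]) / np.sqrt(r)[:, None]


# Hypothetical toy data: 12 samples, 3 binary categorical variables.
# The first 6 rows are mostly 0s, the last 6 mostly 1s.
codes = np.array([
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 1, 1], [1, 1, 1], [1, 1, 1], [0, 1, 1], [1, 0, 1], [1, 1, 0],
])

coords = mca_row_coordinates(codes, n_components=2)

# K-means on the continuous MCA coordinates instead of K-modes on raw categories.
np.random.seed(0)  # seed the global RNG so repeated runs are comparable
centroids, labels = kmeans2(coords, 2, minit="++")

print(coords.round(3))
print(labels)
```

Because the MCA coordinates are continuous, Euclidean-distance methods such as K-means apply to them directly, and the resulting clusters are compact in the same space that is being plotted.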

【Question comments】:

    Tags: machine-learning statistics artificial-intelligence cluster-analysis


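    【Solution 1】:

    I don't think K-Means allows overlap. Each sample is assigned to the nearest cluster and to no other, so the clusters cannot overlap. See the code example below.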

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.spatial import Voronoi
    
    def voronoi_finite_polygons_2d(vor, radius=None):
        """
        Reconstruct infinite Voronoi regions in a 2D diagram to finite
        regions.
    
        Parameters
        ----------
        vor : Voronoi
            Input diagram
        radius : float, optional
            Distance to 'points at infinity'.
    
        Returns
        -------
        regions : list of tuples
            Indices of vertices in each revised Voronoi region.
        vertices : list of tuples
            Coordinates for revised Voronoi vertices. Same as coordinates
            of input vertices, with 'points at infinity' appended to the
            end.
    
        """
    
        if vor.points.shape[1] != 2:
            raise ValueError("Requires 2D input")
    
        new_regions = []
        new_vertices = vor.vertices.tolist()
    
        center = vor.points.mean(axis=0)
        if radius is None:
            # np.ptp is used because the ndarray.ptp method was removed in NumPy 2.0
            radius = np.ptp(vor.points, axis=0).max() * 2
    
        # Construct a map containing all ridges for a given point
        all_ridges = {}
        for (p1, p2), (v1, v2) in zip(vor.ridge_points, vor.ridge_vertices):
            all_ridges.setdefault(p1, []).append((p2, v1, v2))
            all_ridges.setdefault(p2, []).append((p1, v1, v2))
    
        # Reconstruct infinite regions
        for p1, region in enumerate(vor.point_region):
            vertices = vor.regions[region]
    
            if all(v >= 0 for v in vertices):
                # finite region
                new_regions.append(vertices)
                continue
    
            # reconstruct a non-finite region
            ridges = all_ridges[p1]
            new_region = [v for v in vertices if v >= 0]
    
            for p2, v1, v2 in ridges:
                if v2 < 0:
                    v1, v2 = v2, v1
                if v1 >= 0:
                    # finite ridge: already in the region
                    continue
    
                # Compute the missing endpoint of an infinite ridge
    
                t = vor.points[p2] - vor.points[p1] # tangent
                t /= np.linalg.norm(t)
                n = np.array([-t[1], t[0]])  # normal
    
                midpoint = vor.points[[p1, p2]].mean(axis=0)
                direction = np.sign(np.dot(midpoint - center, n)) * n
                far_point = vor.vertices[v2] + direction * radius
    
                new_region.append(len(new_vertices))
                new_vertices.append(far_point.tolist())
    
            # sort region counterclockwise
            vs = np.asarray([new_vertices[v] for v in new_region])
            c = vs.mean(axis=0)
            angles = np.arctan2(vs[:,1] - c[1], vs[:,0] - c[0])
            new_region = np.array(new_region)[np.argsort(angles)]
    
            # finish
            new_regions.append(new_region.tolist())
    
        return new_regions, np.asarray(new_vertices)
    
    # make up data points
    np.random.seed(1234)
    points = np.random.rand(15, 2)
    
    # compute Voronoi tessellation
    vor = Voronoi(points)
    
    # plot
    regions, vertices = voronoi_finite_polygons_2d(vor)
    print("--")
    print(regions)
    print("--")
    print(vertices)
    
    # colorize
    for region in regions:
        polygon = vertices[region]
        plt.fill(*zip(*polygon), alpha=0.4)
    
    plt.plot(points[:,0], points[:,1], 'ko')
    plt.axis('equal')
    plt.xlim(vor.min_bound[0] - 0.1, vor.max_bound[0] + 0.1)
    plt.ylim(vor.min_bound[1] - 0.1, vor.max_bound[1] + 0.1)
    plt.show()
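
    The Voronoi plot above is relevant because the K-means assignment step sends every point to its nearest centroid, which carves the plane into exactly such cells. A minimal sketch of that step, assuming hypothetical random points and fixed, made-up centroids rather than fitted ones:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((200, 2))     # hypothetical sample points
centroids = rng.random((4, 2))    # hypothetical, fixed cluster centres

# Squared Euclidean distance from every point to every centroid: shape (200, 4).
d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

# K-means assignment step: each point takes the index of its nearest centroid.
labels = d2.argmin(axis=1)

# Membership matrix with exactly one 1 per row: no point belongs to two clusters.
membership = np.eye(len(centroids), dtype=int)[labels]
print(membership.sum(axis=1).min(), membership.sum(axis=1).max())  # prints: 1 1
```

    The cluster regions are therefore the Voronoi cells of the centroids, and they share only their boundaries.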
    

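    I believe some clustering algorithms do allow overlap, for example soft or fuzzy methods such as Gaussian mixture models, in which every point receives a membership probability for each cluster. A web search will turn up what you need.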
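
    As a rough illustration of overlapping (soft) clusters, the sketch below fits a two-component Gaussian mixture to hypothetical 1-D data with a hand-written EM loop. The data, the starting values and the 0.9 threshold are assumptions chosen for the example. In practice a library implementation such as scikit-learn's `GaussianMixture` with `predict_proba` would normally be used instead.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D data drawn from two overlapping Gaussians.
x = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(2.5, 1.0, 150)])

# Initial guesses for the weights, means and variances of the two components.
w = np.array([0.5, 0.5])
mu = np.array([x.min(), x.max()])
var = np.array([x.var(), x.var()])

for _ in range(100):
    # E-step: responsibility of each component for each point, shape (300, 2).
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the soft assignments.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk

# Points with no dominant component sit in the region where the clusters overlap.
ambiguous = int((resp.max(axis=1) < 0.9).sum())
print("means:", mu.round(2), "ambiguous points:", ambiguous)
```

    Unlike the K-means labels, each row of `resp` spreads a point's membership across both components, so a point between the two means belongs partly to each cluster.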

    Hope this helps.

    【Comments】:
