【Question title】: DBSCAN returns partial clusters
【Posted】: 2015-08-15 04:42:08
【Question】:

I am trying to implement the DBSCAN pseudocode given here: http://en.wikipedia.org/wiki/DBSCAN

The part I am confused about is

expandCluster(P, NeighborPts, C, eps, MinPts)
   add P to cluster C
   for each point P' in NeighborPts
      if P' is not visited
         mark P' as visited
         NeighborPts' = regionQuery(P', eps)
         if sizeof(NeighborPts') >= MinPts
            NeighborPts = NeighborPts joined with NeighborPts'
      if P' is not yet member of any cluster
         add P' to cluster C

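My code is below. As written, it returns partial clusters. A point should be included in a cluster when it is density-connected, even if it is not in the immediate eps neighborhood, but my code only returns the first few neighbors of each point.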

import numpy 
import time 
from math import radians, cos, sin, asin, sqrt
import re, math


def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees) returned as kilometers 
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    km = 6367 * c
    return km



def ST_DBSCAN(points,max_distance,MinPts):
    global visited
    visited = []
    noise = []
    cluster_id = 0
    clusters = []
    in_cluster = []
    for p in points: 
        if p not in visited:
            # neighbor_points = []
            visited.append(p)
            NeighborPts = regionQuery(p,points,max_distance)
            if len(NeighborPts) < MinPts:
                noise.append(p)
            else:
                cluster_id = cluster_id + 1
                g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster)
                clusters.append(g)
    return clusters

#return len(NeighborPts)

def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster):
    in_cluster.append(p[0])
    cluster = []
    cluster.append(p[0])
    for point in NeighborPts:
        if point not in visited:
            visited.append(point)
            new_neighbors = regionQuery(point,points,max_distance)
            if len(new_neighbors) >= MinPts: 
                new_neighbors.append(NeighborPts)
            if point[0] not in in_cluster:
                 in_cluster.append(point[0])
                 cluster.append(point[0])             
    return  cluster




def regionQuery(p,points,max_distance):
    neighbor_points = []
    for j in points:
        if j != p:
           # print 'P is %s and j is %s' % (p[0],j[0])
            dist = haversine(p[1],p[2],j[1],j[2])
            if dist <= max_distance:
                neighbor_points.append(j)
    neighbor_points.append(p) 
    return neighbor_points   

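I have a subset of the data below. Points 1 and 5 should be 10.76 km apart, so they should not appear together in the initial query, but they should end up in the same cluster because point 5 is density-connected through point 3.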

pointList = [[1,36.4686,2.8289], 
[2,36.4706,2.8589], 
[3,36.4726,2.8889],
[4,36.4746,2.9189],
[5,36.4766,2.9489], 
[6,36.4786,2.9789],
[7,36.4806,3.0089], 
[8,36.4826,3.0389], 
[9,36.4846,3.0689], 
[10,36.4866,3.0989]]

points= pointList

g = ST_DBSCAN(points,10,3)
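
As a sanity check on the 10.76 km figure, the sketch below recomputes the distance between points 1 and 5 with the same haversine formula. It assumes the rows are laid out as [id, latitude, longitude]; that column order is an assumption, not something the question states. Under that assumption, the question's regionQuery, which passes p[1] as the longitude argument, would be feeding the columns in swapped order, so both orders are computed for comparison.

```python
from math import radians, cos, sin, asin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km (Earth radius 6367 km, as in the question)."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


p1 = [1, 36.4686, 2.8289]
p5 = [5, 36.4766, 2.9489]

# Assumed layout: [id, latitude, longitude]
d_lat_first = haversine(p1[2], p1[1], p5[2], p5[1])

# Argument order used by the question's regionQuery: p[1] as lon, p[2] as lat
d_as_called = haversine(p1[1], p1[2], p5[1], p5[2])

print(round(d_lat_first, 2), round(d_as_called, 2))
```

In both cases the two points are more than 10 km apart, so point 5 can only join point 1's cluster through intermediate points.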

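【Comments】:

  • This doesn't answer your question, but if all you want is a working DBSCAN implementation, scikit-learn has a good one.
  • @oxymor0n Thanks for the comment. I am trying to implement my own function to improve my understanding of how it works, and to allow some flexibility in the distance call (eventually, I would like to add more dimensions).
  • I don't think the scikit version is very good if you want to modify the distance function. It is over-optimized for Euclidean distance.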

Tags: python cluster-analysis dbscan


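【Solution 1】:

Your expandCluster function forgets the new neighbors.

Your set update is swapped: new_neighbors.append(NeighborPts) adds the old list to new_neighbors, whereas NeighborPts is the list that should be extended with the points in new_neighbors.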
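
The fix can be sketched as follows. This is one possible rewrite under stated assumptions, not the asker's final code: the function names region_query, expand_cluster and dbscan are illustrative renames, visited and in_cluster are passed as sets of point ids instead of globals, and the question's haversine argument order (p[1], p[2]) is kept unchanged. The key change is that the list being iterated grows with the neighbors of every core point, and the membership check runs for every point in that list, as in the pseudocode.

```python
from math import radians, cos, sin, asin, sqrt


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km, same formula as in the question."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))


def region_query(p, points, max_distance):
    # p is included in its own neighborhood (distance 0)
    return [q for q in points
            if haversine(p[1], p[2], q[1], q[2]) <= max_distance]


def expand_cluster(p, neighbor_pts, points, max_distance, min_pts,
                   visited, in_cluster):
    cluster = [p[0]]
    in_cluster.add(p[0])
    queue = list(neighbor_pts)  # grows while we iterate over it
    i = 0
    while i < len(queue):
        q = queue[i]
        i += 1
        if q[0] not in visited:
            visited.add(q[0])
            new_neighbors = region_query(q, points, max_distance)
            if len(new_neighbors) >= min_pts:
                # The corrected update: extend the work list with the
                # new neighbors, not the other way around.
                queue.extend(new_neighbors)
        if q[0] not in in_cluster:
            in_cluster.add(q[0])
            cluster.append(q[0])
    return cluster


def dbscan(points, max_distance, min_pts):
    visited, in_cluster, clusters = set(), set(), []
    for p in points:
        if p[0] in visited:
            continue
        visited.add(p[0])
        neighbor_pts = region_query(p, points, max_distance)
        if len(neighbor_pts) >= min_pts:
            clusters.append(expand_cluster(p, neighbor_pts, points,
                                           max_distance, min_pts,
                                           visited, in_cluster))
    return clusters


# Sample data from the question: ids 1..10 along a line
points = [[i + 1, 36.4686 + 0.002 * i, 2.8289 + 0.03 * i] for i in range(10)]
clusters = dbscan(points, 10, 3)
print(clusters)
```

On the sample data, each point's neighborhood overlaps the next one's, so the work list keeps growing until the whole chain has been reached and all ten ids land in a single cluster. Iterating by index over a growing list avoids modifying a list inside a for loop over that same list.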

【Comments】:

  • Thanks, I was able to fix the error by switching the set update: I pop() each point from new_neighbors and append it to NeighborPts.