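【Question title】: dbscan not making sense for small amounts of points
【Posted】: 2021-10-16 16:34:32
【Question description】:

I'm playing with a DBSCAN example to see whether it will work for me. In my case, I have clusters of a few points (3-5) that sit close together, with fairly long distances between the clusters. I tried to replicate that situation in the code below. I figured that with a low epsilon and a low min_samples this should work, but it tells me it only sees 1 group (and 20 noise points?). Am I using it incorrectly, or is DBSCAN simply not a good fit for this kind of problem? I went with DBSCAN rather than k-means because I won't know in advance how many clusters there will be (1-5).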

from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
import numpy as np
import matplotlib.pyplot as plt

# Configuration options
num_samples_total = 20
cluster_centers = [(3,3), (7,7),(7,3),(3,7),(5,5)]
num_classes = len(cluster_centers)
#epsilon = 1.0
epsilon = 1e-5
#min_samples = 13
min_samples = 2

# Generate data
X, y = make_blobs(n_samples = num_samples_total, centers = cluster_centers, n_features = num_classes, center_box=(0, 1), cluster_std = 0.05)

np.save('./clusters.npy', X)
X = np.load('./clusters.npy')

# Compute DBSCAN
db = DBSCAN(eps=epsilon, min_samples=min_samples).fit(X)
labels = db.labels_

no_clusters = len(np.unique(labels) )
no_noise = np.sum(np.array(labels) == -1, axis=0)

print('Estimated no. of clusters: %d' % no_clusters)
print('Estimated no. of noise points: %d' % no_noise)

# Generate scatter plot for training data
colors = list(map(lambda x: '#3b4cc0' if x == 1 else '#b40426', labels))                #only set for 2 colors
plt.scatter(X[:,0], X[:,1], c=colors, marker="o", picker=True)
plt.title('Two clusters with data')
plt.xlabel('Axis X[0]')
plt.ylabel('Axis X[1]')
plt.show()

【Question comments】:

    Tags: python cluster-analysis dbscan


    【Solution 1】:

    I ended up going with k-means and a modified elbow method:

    print(__doc__)
    
    # Author: Phil Roth <mr.phil.roth@gmail.com>
    # License: BSD 3 clause
    
    import numpy as np
    import matplotlib.pyplot as plt
    
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    
    # Configuration options
    num_samples_total = 20
    cluster_centers = [(3,3), (7,7),(7,3),(3,7),(5,5)]
    
    # Generate data. With explicit centers, make_blobs ignores n_features and
    # center_box, so they are not passed here.
    X, y = make_blobs(n_samples=num_samples_total, centers=cluster_centers, cluster_std=0.05)
    
    # Maybe I don't have to look for an elbow: just keep adding clusters until
    # the inertia drops below 1. If I go too far, it only means that the same
    # shape will be shown twice.
    inertia_threshold = 1
    n_clusters = 0
    inertia = float('inf')
    while inertia > inertia_threshold:
        n_clusters += 1
        kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
        inertia = kmeans.inertia_
        print(inertia)
    
    # Plot the points coloured by the final k-means assignment
    plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_)
    print(n_clusters)
    plt.show()
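
For comparison, here is a minimal sketch of how the DBSCAN attempt from the question might be adjusted; it is not part of the original answer. It rests on two observations. First, `eps=1e-5` is far smaller than the spread of each blob (`cluster_std = 0.05`), so no point has a neighbour and every point is labelled `-1` (noise). Second, `len(np.unique(labels))` counts that `-1` label as a cluster, which would explain the "1 group, 20 noise points" output. The value `eps=0.5`, the fixed `random_state=0`, and the helper `count_clusters` are assumptions made for illustration, not values from the thread.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

cluster_centers = [(3, 3), (7, 7), (7, 3), (3, 7), (5, 5)]

# Same setup as the question: 20 points in 5 tight blobs (std 0.05).
# random_state is an added assumption so the run is repeatable.
X, y = make_blobs(n_samples=20, centers=cluster_centers,
                  cluster_std=0.05, random_state=0)


def count_clusters(labels):
    """Return (number of clusters, number of noise points); label -1 is noise."""
    labels = np.asarray(labels)
    n_noise = int(np.sum(labels == -1))
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_noise


# eps from the question: far below the distance between points inside a blob.
tiny = DBSCAN(eps=1e-5, min_samples=2).fit(X)

# Assumed eps: well above the blob spread (0.05), well below the
# smallest distance between centres (about 2.8).
wider = DBSCAN(eps=0.5, min_samples=2).fit(X)

tiny_result = count_clusters(tiny.labels_)
wider_result = count_clusters(wider.labels_)
print("eps=1e-5:", tiny_result)
print("eps=0.5: ", wider_result)
```

On this kind of data any `eps` between the within-blob spread and the gap between blobs should behave similarly, so DBSCAN does not look unsuitable here; the radius just has to match the scale of the data.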
    

    【Comments】:
