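【Posted】: 2021-10-16 16:34:32
【Question】:
I'm playing with a DBSCAN example to see whether it fits my use case. In my case, I have clusters of a few points each (3-5) that sit close together, and the distances between clusters are fairly large. I tried to replicate that situation in the code below. I thought that a low epsilon and a low min_samples should work, but it tells me it only sees 1 group (and 20 noise points?). Am I using it incorrectly, or is DBSCAN not suited to this kind of problem? I chose DBSCAN over k-means because I don't know in advance how many clusters there will be (1-5).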
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN
import numpy as np
import matplotlib.pyplot as plt
# Configuration options
num_samples_total = 20
cluster_centers = [(3,3), (7,7),(7,3),(3,7),(5,5)]
num_classes = len(cluster_centers)
#epsilon = 1.0
epsilon = 1e-5
#min_samples = 13
min_samples = 2
# Generate data
# n_features and center_box are ignored when explicit centers are passed, so they are omitted
X, y = make_blobs(n_samples=num_samples_total, centers=cluster_centers, cluster_std=0.05)
np.save('./clusters.npy', X)
X = np.load('./clusters.npy')
# Compute DBSCAN
db = DBSCAN(eps=epsilon, min_samples=min_samples).fit(X)
labels = db.labels_
no_clusters = len(np.unique(labels) )
no_noise = np.sum(np.array(labels) == -1, axis=0)
print('Estimated no. of clusters: %d' % no_clusters)
print('Estimated no. of noise points: %d' % no_noise)
# Generate scatter plot for training data
colors = list(map(lambda x: '#3b4cc0' if x == 1 else '#b40426', labels)) #only set for 2 colors
plt.scatter(X[:,0], X[:,1], c=colors, marker="o", picker=True)
plt.title('Two clusters with data')
plt.xlabel('Axis X[0]')
plt.ylabel('Axis X[1]')
plt.show()
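A possible reading of the output above, offered as a sketch rather than a definitive diagnosis: with `eps=1e-5` and `cluster_std=0.05`, no two points lie within `eps` of each other, so every point is labelled `-1` (noise), and `len(np.unique(labels))` then counts that single noise label as one "cluster". The snippet below regenerates similar data and counts clusters with the noise label excluded. The `random_state=0` seed and the helper name `summarize` are assumptions added for illustration and are not part of the original code.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Same five centers and spread as in the question; the seed is an assumption
# added only to make the run reproducible.
centers = [(3, 3), (7, 7), (7, 3), (3, 7), (5, 5)]
X, _ = make_blobs(n_samples=20, centers=centers, cluster_std=0.05, random_state=0)


def summarize(eps, min_samples=2):
    """Return (number of clusters excluding noise, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
    # The label -1 marks noise, so it must not be counted as a cluster.
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


print("eps=1e-5:", summarize(1e-5))  # eps far below the spread within a cluster
print("eps=1.0: ", summarize(1.0))   # eps between within-cluster spread and cluster gap
```

Under these assumptions, an `eps` larger than the spread within a cluster (about 0.05 here) but smaller than the gap between clusters (roughly 2.8 for the closest pair of centers) should let DBSCAN separate the five groups.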
【Discussion】:
Tags: python cluster-analysis dbscan