Creating Clusters in MATLAB: Answers

【Question Title】: Creating Clusters in matlab
【Posted】: 2021-02-13 17:59:48
【Question】:

Suppose I have generated some data in MATLAB as follows:

n = 100;

x = randi(n,[n,1]);
y = rand(n,1);
data = [x y];

plot(x,y,'rx')
axis([0 100 0 1])

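Now I want to write an algorithm that classifies all of this data into clusters (their number is arbitrary), such that a point is a member of a cluster only if the distance between that point and at least one other member of the cluster is less than 10. How can I write this code?

【Question Comments】:

Tags: matlab cluster-analysis data-analysis

【Solution 1】:

The clustering method you describe is DBSCAN. Note that this algorithm will most likely find only one cluster in the data provided, because it is very unlikely that the dataset contains a point whose distance to all other points is more than 10. If this is really what you want, you can use DBSCAN, or, if you are using a version older than 2019a, the one posted in FE (File Exchange).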

% Generating random points, almost similar to the data provided by OP 
data = bsxfun(@times, rand(100, 2), [100 1]);
% Adding more random points
for i=1:5
    mu = rand(1, 2)*100 -50;
    A = rand(2)*5;
    sigma = A*A'+eye(2)*(1+rand*2);%[1,1.5;1.5,3];
    data = [data;mvnrnd(mu,sigma,20)];
end
% clustering using DBSCAN, with epsilon = 10 and min-points = 1
idx = DBSCAN(data, 10, 1);
% plotting clusters
numCluster = max(idx);
colors = lines(numCluster);
scatter(data(:, 1), data(:, 2), 30, colors(idx, :), 'filled')
title(['No. of Clusters: ' num2str(numCluster)])
axis equal

The numbers in the figure above indicate the distance between the closest pair of points in any two different clusters.
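
For readers on R2019a or later, the same idea can be sketched with the built-in dbscan function from the Statistics and Machine Learning Toolbox, applied directly to the data from the question. This is only an illustration under stated assumptions: the fixed seed and the use of gscatter for plotting are not part of the original answer, and the built-in function treats points at exactly distance epsilon as neighbors, which differs slightly from the strict "less than 10" in the question.

```matlab
% Sketch: built-in dbscan (assumes R2019a+ and the Statistics and
% Machine Learning Toolbox). The seed is an assumption, used only so
% that the example is reproducible.
rng(0);
n = 100;
data = [randi(n, [n, 1]), rand(n, 1)];   % same construction as in the question

epsilon = 10;   % neighborhood radius taken from the question
minpts  = 1;    % every point is a core point, so nothing is labeled noise (-1)
idx = dbscan(data, epsilon, minpts);

numCluster = max(idx);
gscatter(data(:, 1), data(:, 2), idx)    % one color per cluster label
title(['No. of Clusters: ' num2str(numCluster)])
```

With minpts = 1, the clusters are the groups of points that can be reached from one another through steps no longer than epsilon.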

【Comments】:

【Solution 2】:

MATLAB's built-in function clusterdata() fits your requirements well.

Here is how to apply it to your example:

% number of points
n = 100; 

% create the data
x = randi(n,[n,1]);
y = rand(n,1);
data = [x y]; 

% the number of clusters you want to create
num_clusters = 5; 

% cluster the points; the trailing semicolon suppresses printing of T1
T1 = clusterdata(data,'Criterion','distance',...
    'Distance','euclidean',...
    'MaxClust', num_clusters);

scatter(x, y, 100, T1,'filled')

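In this case I used 5 clusters and the Euclidean distance as the metric for grouping the data points, but you can change these at any time (see documentation of clusterdata()).

See below the result for 5 clusters with some random data.

Note that the data is skewed (x-values range from 1 to 100, y-values from 0 to 1), so the result is also skewed, but you can always normalize your data.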
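
Two variations on the call above may be worth sketching. The first replaces 'MaxClust' with a distance cutoff, because fixing the number of clusters does not by itself enforce the "distance less than 10" rule from the question; single linkage with a cutoff of 10 is one plausible way to express that rule. The second standardizes both columns with zscore before clustering. The variable names T2, T3 and dataStd and the fixed seed are assumptions made for this illustration, not part of the original answer.

```matlab
% Sketch only: assumes the Statistics and Machine Learning Toolbox.
rng(0);                                  % assumed seed, for reproducibility
n = 100;
data = [randi(n, [n, 1]), rand(n, 1)];   % same construction as in the question

% Variation 1: single linkage cut at height 10, so two points share a
% cluster when they are linked through a chain of steps shorter than 10.
T2 = clusterdata(data, 'Linkage', 'single', ...
    'Distance', 'euclidean', ...
    'Criterion', 'distance', ...
    'Cutoff', 10);

% Variation 2: standardize each column (zero mean, unit standard
% deviation) so that x and y contribute on a comparable scale, then
% ask for at most 5 clusters as in the answer above.
dataStd = zscore(data);
T3 = clusterdata(dataStd, 'Distance', 'euclidean', 'MaxClust', 5);

subplot(1, 2, 1)
scatter(data(:, 1), data(:, 2), 100, T2, 'filled')
title('Single linkage, cutoff 10')
subplot(1, 2, 2)
scatter(data(:, 1), data(:, 2), 100, T3, 'filled')
title('Standardized data, MaxClust 5')
```

Keep in mind that after standardization a threshold of 10 no longer refers to the original units, so the two variations answer slightly different questions.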

【Comments】:

【Solution 3】:

Here is an approach that uses the connected components of a graph:

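D = pdist2(data, data) < 10;   % true where two points are closer than 10
D(1:size(D,1)+1:end) = 0;      % clear the diagonal to remove self-loops
G = graph(D);                  % undirected graph from the adjacency matrix
C = conncomp(G);               % C(i) is the cluster index of point i

The connected-components output C is a vector that gives the cluster index of each point.

Use pdist2 to compute the pairwise distance matrix of the points (the rows of data, i.e. the x and y coordinates together; calling pdist2(x, y) would instead compare x values against y values).
Use the distance matrix to create a logical adjacency matrix in which two points are neighbors if the distance between them is less than 10.
Set the diagonal elements of the adjacency matrix to 0 to eliminate self-loops.
Create a graph from the adjacency matrix.
Compute the connected components of the graph.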

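Note that pdist2 may not be practical for large datasets, because it builds a full n-by-n distance matrix; in that case you need another method to form a sparse adjacency matrix.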
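
One possible way to avoid the full distance matrix is sketched below, assuming the Statistics and Machine Learning Toolbox is available. It uses rangesearch to list only the neighbors within the radius and builds the graph from that edge list. The variable names and the fixed seed are illustrative assumptions, and rangesearch includes points at exactly distance r, which differs slightly from the strict "less than 10" rule.

```matlab
% Sketch: build the neighbor graph from an edge list instead of a
% dense n-by-n matrix. Not part of the original answer.
rng(0);                                  % assumed seed, for reproducibility
n = 100;
data = [randi(n, [n, 1]), rand(n, 1)];   % same construction as in the question
r = 10;                                  % neighborhood radius from the question

% nbrs{i} is a row vector with the indices of all points within r of
% point i (point i itself is included).
nbrs = rangesearch(data, data, r);

counts = cellfun(@numel, nbrs);          % number of neighbors per point
s = repelem((1:n)', counts);             % source node of each edge
t = [nbrs{:}]';                          % target node of each edge

% Keep each undirected edge once and drop self-loops.
keep = s < t;
G = graph(s(keep), t(keep), [], n);      % n nodes, even if some are isolated

C = conncomp(G);                         % C(i) is the cluster index of point i
```

Memory use here grows with the number of neighbor pairs rather than with n squared, which is the point of the exercise for large datasets.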

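After posting my answer, I noticed that the answer provided by @saastn suggests the DBSCAN algorithm, which follows almost the same approach.

【Comments】: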