在上面的 cmets 中猜测 Chris,我假设您想根据高斯混合模型对点进行聚类 - 一种合理的方法,假设基础分布是高斯分布样本的线性组合。下面我展示了一个使用numpy 创建样本数据集的示例,sklearn 用于它的 GM 建模,pylab 用于显示结果。
import numpy as np
from pylab import *
from sklearn import mixture
# Create some sample data
def G(mu, cov, pts):
return np.random.multivariate_normal(mu,cov,500)
# Three multivariate Gaussians with means and cov listed below
MU = [[5,3], [0,0], [-2,3]]
COV = [[[4,2],[0,1]], [[1,0],[0,1]], [[1,2],[2,1]]]
A = [G(mu,cov,500) for mu,cov in zip(MU,COV)]
PTS = np.concatenate(A) # Join them together
# Use a Gaussian Mixture model to fit
g = mixture.GMM(n_components=len(A))
g.fit(PTS)
# Returns an index list of which cluster they belong to
C = g.predict(PTS)
# Plot the original points
X,Y = map(array, zip(*PTS))
subplot(211)
scatter(X,Y)
# Plot the points and color according to the cluster
subplot(212)
color_mask = ['k','b','g']
for n in xrange(len(A)):
idx = (C==n)
scatter(X[idx],Y[idx],color=color_mask[n])
show()
有关分类方法的更多详细信息,请参阅sklearn.mixture example 页面。