Equal Group Clustering Algorithm: Answers

【Question title】: Equal Group Clustering Algorithm
【Posted】: 2020-12-21 10:49:09
【Question description】:

I have 300 collection points that I need to cluster by geographic coordinate. Every cluster must have an upper limit of 8 points and a lower limit of 5. How can I do this in Python?

【Question discussion】:

Please share the desired output and explain what you want.
I want output like this:

Latitude     Longitude    Route code
18.2521536   76.4982399   Cluster_01
18.2526484   76.4976308   Cluster_01
18.2526006   76.4972857   Cluster_01
18.2533365   76.4975484   Cluster_01
18.2535941   76.4987773   Cluster_01
18.2535462   76.4986933   Cluster_01
18.2503783   76.5116291   Cluster_02
18.2512383   76.5085317   Cluster_02
18.2506268   76.5082113   Cluster_02
18.2516204   76.5064285   Cluster_02

I have 300 such coordinates, and they must be clustered with a maximum cluster size of 8 and a minimum of 6.

Tags: python cluster-computing sklearn-pandas

【Solution 1】:

My question answers your question. You need to replace position with your GEO COORDINATE data, and rename x, y to Latitude, Longitude.

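from pandas import DataFrame
from sklearn.cluster import KMeans

# position: a sequence of (x, y) pairs; replace it with your (Latitude, Longitude) data
dfcluster = DataFrame(position, columns=['x', 'y'])
kmeans = KMeans(n_clusters=4).fit(dfcluster)
centroids = kmeans.cluster_centers_
# for plot (needs: import matplotlib.pyplot as plt)
# plt.scatter(dfcluster['x'], dfcluster['y'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
# plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
# plt.show()
dfcluster['cluster'] = kmeans.labels_
# drop repeated coordinates, then order the rows by cluster label
dfcluster = dfcluster.drop_duplicates(['x', 'y'], keep='last')
dfcluster = dfcluster.sort_values(['cluster', 'x', 'y'], ascending=True)

# head/tail take the first 8 and the last 5 rows of the sorted frame
n = 8
dfcluster1 = dfcluster.head(n)
n = 5
dfcluster2 = dfcluster.tail(n)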
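
Note that KMeans itself places no bound on cluster sizes, and with n_clusters=4 the 300 points from the question would average 75 per cluster. As a rough sketch (the helper names feasible_cluster_counts and sizes_within_bounds are made up for illustration and are not part of scikit-learn), one could first work out which cluster counts can satisfy a 5-to-8 bound at all, and then check a labelling such as kmeans.labels_ against it:

```python
import math
from collections import Counter


def feasible_cluster_counts(n_points, min_size, max_size):
    """Return (lowest, highest) k such that k*min_size <= n_points <= k*max_size.

    If lowest > highest, no cluster count can satisfy the bounds.
    """
    lowest = math.ceil(n_points / max_size)   # fewer clusters would overflow max_size
    highest = n_points // min_size            # more clusters would starve min_size
    return lowest, highest


def sizes_within_bounds(labels, min_size, max_size):
    """True if every cluster label occurs between min_size and max_size times."""
    counts = Counter(labels)
    return all(min_size <= c <= max_size for c in counts.values())


print(feasible_cluster_counts(300, 5, 8))  # (38, 60)
```

For the question's data, n_clusters therefore has to be somewhere between 38 and 60; sizes_within_bounds(kmeans.labels_, 5, 8) then tells you whether a given fit happens to respect the limits.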

Also, for equal-sized groups, use the Size Constrained Clustering solver.

Start with pip install size-constrained-clustering or pip install git+https://github.com/jingw2/size_constrained_clustering.git; you can then use minmax flow or Heuristics.

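import numpy as np
from size_constrained_clustering import equal

# random demo data; replace X with your own (n_samples, 2) coordinate array
n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)

# equal-size k-means solved as a min-cost flow problem
model = equal.SameSizeKMeansMinCostFlow(n_clusters)

# alternative: the heuristic solver
#model = equal.SameSizeKMeansHeuristics(n_clusters)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_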
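
If installing the package is not an option, the 5-to-8 bound from the question can also be approximated with a small greedy heuristic in plain Python. The sketch below is not the package's algorithm and makes no optimality claim: bounded_clusters is a hypothetical helper, it uses plain Euclidean distance on the raw latitude/longitude pairs (an assumption that is only reasonable for points that lie close together), and it requires k * min_size <= n <= k * max_size.

```python
import math
import random


def bounded_clusters(points, k, min_size, max_size, n_iter=10, seed=0):
    """Greedy heuristic: label points so each cluster has min_size..max_size members."""
    n = len(points)
    if min_size < 1 or not k * min_size <= n <= k * max_size:
        raise ValueError("size bounds cannot be met with k clusters")
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k of the input points
    labels = [-1] * n
    for _ in range(n_iter):
        labels = [-1] * n
        sizes = [0] * k
        unassigned = set(range(n))
        # Phase 1: in round-robin order, each center claims its nearest free
        # point until every cluster holds exactly min_size points.
        for _round in range(min_size):
            for c in range(k):
                i = min(unassigned, key=lambda j: math.dist(points[j], centers[c]))
                labels[i] = c
                sizes[c] += 1
                unassigned.remove(i)
        # Phase 2: each leftover point joins the nearest cluster with room left.
        for i in sorted(unassigned):
            open_clusters = [c for c in range(k) if sizes[c] < max_size]
            c = min(open_clusters, key=lambda c: math.dist(points[i], centers[c]))
            labels[i] = c
            sizes[c] += 1
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [points[i] for i in range(n) if labels[i] == c]
            centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels
```

With 300 (latitude, longitude) tuples in a list called points, something like labels = bounded_clusters(points, k=45, min_size=5, max_size=8) returns one cluster index per point. Any k from 38 to 60 is admissible for 300 points; the value 45 here is only an example.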

【Discussion】: