【发布时间】:2021-09-14 20:03:16
【问题描述】:
我正在尝试在这样的数据框中进行 K-means 分析:
URBAN AREA PROVINCE DENSITY
0 1 TRUJILLO 0.30
1 2 TRUJILLO 0.03
2 3 TRUJILLO 0.80
3 1 LIMA 1.20
4 2 LIMA 0.04
5 1 LAMBAYEQUE 0.90
6 2 LAMBAYEQUE 0.10
7 3 LAMBAYEQUE 0.08
(可以从here下载)
如您所见,df 指的是省内不同的城市区域(具有不同的城市密度值)。所以,我想通过一列进行 K-means 分类:DENSITY。为此,我执行以下代码:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
df=pd.read_csv('C:/Path/to/example.csv')
clustering=KMeans(n_clusters=2, max_iter=300)
clustering.fit(df[['DENSITY']])
df['KMeans_Clusters']=clustering.labels_
df
我得到了这个结果,这对于示例的第一部分来说是可以的:
URBAN AREA PROVINCE DENSITY KMeans_Clusters
0 1 TRUJILLO 0.30 0
1 2 TRUJILLO 0.03 0
2 3 TRUJILLO 0.80 1
3 1 LIMA 1.20 1
4 2 LIMA 0.04 0
5 1 LAMBAYEQUE 0.90 1
6 2 LAMBAYEQUE 0.10 0
7 3 LAMBAYEQUE 0.08 0
但现在我想按省份对城市地区进行 k-means 分类。我的意思是,在任何省份重复同样的过程。所以我试过这段代码:
df=pd.read_csv('C:/Users/rojas/Desktop/example.csv')
clustering=KMeans(n_clusters=2, max_iter=300)
clustering.fit(df[['DENSITY']]).groupby('PROVINCE')
df['KMeans_Clusters']=clustering.labels_
df
但我收到此消息:
AttributeError Traceback (most recent call last)
<ipython-input-4-87e7696ff61a> in <module>
3 clustering=KMeans(n_clusters=2, max_iter=300)
4
----> 5 clustering.fit(df[['DENSITY']]).groupby('PROVINCE')
6
7 df['KMeans_Clusters']=clustering.labels_
AttributeError: 'KMeans' object has no attribute 'groupby'
有办法吗?
【问题讨论】:
-
改用
clustering.fit(df.groupby('PROVINCE')['DENSITY'])。 -
@DavidLee 我明白了:
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (3, 2) + inhomogeneous part.
标签: python pandas k-means sklearn-pandas