【Posted】: 2018-11-28 23:54:13
【Question】:
I'm working on a project for my thesis, but I'm stuck: I can't get k-means clustering to work on my dataset from the Spotify API.
artist_name track_popularity explicit artist_genres album_genres acousticness danceability energy instrumentalness key liveness loudness mode speechiness tempo time_signature valence mapped_at
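My dataset has these variables, and I need to cluster on the variables from acousticness to valence (12 variables in total). How can I do that? I can do it with 2 or 3 variables, but not with four or more.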
from copy import deepcopy
import numpy as np
import matplotlib.pyplot as plot
import pandas as pd
from sklearn.cluster import KMeans

# Import the dataset
dataset = pd.read_csv('csvProva2.csv')
X = dataset.iloc[:, [10, 11]].values  # columns of interest

# Find the number of clusters (elbow method, 1 to 15 clusters)
wcss = []
for i in range(1, 16):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # within-cluster sum of squares
plot.plot(range(1, 16), wcss)
plot.title('Elbow Method')
plot.xlabel('Number of clusters')
plot.ylabel('wcss')
plot.show()

# KMeans clustering
kmeans = KMeans(n_clusters=4, init='k-means++', random_state=0)
y = kmeans.fit_predict(X)
plot.scatter(X[y == 0, 0], X[y == 0, 1], s=25, c='red', label='Cluster 1')
plot.scatter(X[y == 1, 0], X[y == 1, 1], s=25, c='blue', label='Cluster 2')
plot.scatter(X[y == 2, 0], X[y == 2, 1], s=25, c='magenta', label='Cluster 3')
plot.scatter(X[y == 3, 0], X[y == 3, 1], s=25, c='cyan', label='Cluster 4')
plot.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=25, c='yellow', label='Centroid')
plot.title('KMeans Clustering')
plot.xlabel('Acousticness')
plot.ylabel('Danceability')
plot.legend()
plot.show()
This is my code for clustering with 2 variables.
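For reference, a minimal sketch of how the same approach might extend to all 12 variables. `KMeans` itself takes any number of columns; what stops working past 2 or 3 is the scatter plot, not the clustering. The sketch makes several assumptions that are not in the question: the column names follow Spotify's audio-feature names, a small random DataFrame stands in for `csvProva2.csv` so the example runs on its own, the features are standardized first (a common choice, since columns such as tempo and loudness have much larger ranges than the 0 to 1 features), and PCA is used only to get 2-D coordinates for plotting.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# The 12 audio-feature columns, acousticness through valence
# (names assumed to match the CSV header).
FEATURES = [
    "acousticness", "danceability", "energy", "instrumentalness",
    "key", "liveness", "loudness", "mode", "speechiness",
    "tempo", "time_signature", "valence",
]

# Synthetic stand-in for the real data (assumption, so the sketch is runnable).
# With the real file: dataset = pd.read_csv('csvProva2.csv')
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.random((200, len(FEATURES))), columns=FEATURES)

# Select all 12 columns by name instead of by position.
X = dataset[FEATURES].values

# Standardize so that no single large-valued column dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Cluster on all 12 features; n_clusters=4 mirrors the 2-variable code.
kmeans = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Project to 2 dimensions for plotting only; the clustering used all 12.
pca = PCA(n_components=2, random_state=0)
coords = pca.fit_transform(X_scaled)
centers_2d = pca.transform(kmeans.cluster_centers_)
```

The clusters can then be drawn with the same `plot.scatter` calls as in the 2-variable code, using `coords[labels == k, 0]` and `coords[labels == k, 1]` for each cluster `k` and `centers_2d` for the centroids. The axes are then principal components rather than individual audio features.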
【Comments】:
- I solved this here: github.com/joaocarvalhoopen/…
Tags: python scikit-learn cluster-analysis k-means