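【Posted】: 2019-07-17 15:41:30
【Question】:
I have some Python code for the k-means algorithm.
I am having a hard time understanding what it does.
Lines like C = X[numpy.random.choice(X.shape[0], k, replace=False), :] really confuse me.
Could someone explain what this code is actually doing? Thanks.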
import numpy
from copy import deepcopy

# dist, dist1 and sil_coefficient are helpers defined elsewhere (not shown in the question)
def k_means(data, k, num_of_features):
    # Make a matrix out of the data (as_matrix() was removed in pandas 1.0; to_numpy() replaces it)
    X = data.to_numpy()
    # Get k random points from the data
    C = X[numpy.random.choice(X.shape[0], k, replace=False), :]
    # Remove the last col
    C = [C[j][:-1] for j in range(len(C))]
    # Turn it into a numpy array
    C = numpy.asarray(C)
    # To store the value of centroids when it updates
    C_old = numpy.zeros(C.shape)
    # Make an array that will assign clusters to each point
    clusters = numpy.zeros(len(X))
    # Error func. - Distance between new centroids and old centroids
    error = dist(C, C_old, None)
    # Loop runs until the error becomes zero or the try limit is reached
    # (the original comment says 5 tries, but the condition below allows only 1)
    tries = 0
    while error != 0 and tries < 1:
        # Assigning each value to its closest cluster
        for i in range(len(X)):
            # Get closest cluster in terms of distance
            clusters[i] = dist1(X[i][:-1], C)
        # Storing the old centroid values
        C_old = deepcopy(C)
        # Finding the new centroids by taking the average value
        for i in range(k):
            # Get all of the points that match the cluster you are on
            points = [X[j][:-1] for j in range(len(X)) if clusters[j] == i]
            # If there were no points assigned to cluster, put at origin
            if not points:
                C[i][:] = numpy.zeros(C[i].shape)
            else:
                # Get the average of all the points and put that centroid there
                C[i] = numpy.mean(points, axis=0)
        # Error is the distance between where the centroids used to be and where they are now
        error = dist(C, C_old, None)
        # Increase tries
        tries += 1
    return sil_coefficient(X, clusters, k)
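For the line singled out above, a minimal sketch of what the indexing seems to do may help. The toy array and its layout (feature columns followed by one final label column) are assumptions made for illustration; the question does not show the real data.

```python
import numpy

# Toy data (assumed layout): 6 rows, 2 feature columns plus a final label column.
X = numpy.array([
    [1.0, 2.0, 0.0],
    [1.5, 1.8, 0.0],
    [5.0, 8.0, 1.0],
    [8.0, 8.0, 1.0],
    [1.0, 0.6, 0.0],
    [9.0, 11.0, 1.0],
])
k = 2

# X.shape[0] is the number of rows. choice() draws k row indices from
# 0..5, and replace=False guarantees that no index is drawn twice.
row_idx = numpy.random.choice(X.shape[0], k, replace=False)

# Index with an array of row numbers: take those rows; ":" keeps every column.
C = X[row_idx, :]

# The question then drops the last column with a list comprehension and
# asarray(); a plain slice gives the same result in one step.
C_loop = numpy.asarray([C[j][:-1] for j in range(len(C))])
C_slice = C[:, :-1]

print(row_idx)        # two distinct row numbers, different on each run
print(C_slice.shape)  # (2, 2)
```

So C ends up holding k randomly chosen data points, minus their last column, which serve as the starting centroids.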
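【Comments】:
-
Are you asking how k-means works in general, or just about this function?
-
This function specifically. I think I have a good grasp of the algorithm itself, but we are supposed to start from this example, and it confuses me. All of the data handling is really unfamiliar to me. I have been trying to read the documentation, but it isn't helping, so to save myself a few hours I am really hoping someone can give a walkthrough of the code.
-
Ah. Unfortunately, I'm a bit rusty with Python. I was hoping you needed a general k-means explanation. Good luck.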
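As a rough walkthrough of the loop in the question, the sketch below shows the two steps it alternates between: assign every point to its nearest centroid, then move each centroid to the mean of its points. The helpers `nearest_centroid` and `centroid_shift` are hypothetical stand-ins for the question's `dist1` and `dist`, whose definitions are not shown. The Euclidean distance, the `max_tries=5` limit (taken from the code comment, not the loop condition), and the fixed seed are likewise assumptions.

```python
import numpy

def nearest_centroid(point, centroids):
    """Hypothetical stand-in for dist1: index of the closest centroid."""
    distances = numpy.linalg.norm(centroids - point, axis=1)
    return int(numpy.argmin(distances))

def centroid_shift(new, old):
    """Hypothetical stand-in for dist: how far the centroids moved in total."""
    return float(numpy.linalg.norm(new - old))

def k_means_sketch(features, k, max_tries=5, seed=0):
    rng = numpy.random.default_rng(seed)
    # Start from k distinct data points, as the question's code does.
    C = features[rng.choice(features.shape[0], k, replace=False)].astype(float)
    clusters = numpy.zeros(len(features), dtype=int)
    for _ in range(max_tries):
        # Assignment step: label each point with its nearest centroid.
        for i, point in enumerate(features):
            clusters[i] = nearest_centroid(point, C)
        C_old = C.copy()
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = features[clusters == j]
            if len(members) > 0:
                C[j] = members.mean(axis=0)
            else:
                C[j] = 0.0  # empty cluster goes to the origin, as in the question
        # Stop early once the centroids no longer move.
        if centroid_shift(C, C_old) == 0:
            break
    return C, clusters

# Feature columns only; the label column is assumed to be stripped already.
features = numpy.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                        [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]])
centroids, labels = k_means_sketch(features, k=2)
print(centroids)
print(labels)
```

The question's function differs in that it returns a silhouette score from `sil_coefficient` rather than the centroids and labels, and it slices off the last column of each row inside the loop instead of up front.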
Tags: python numpy machine-learning k-means