How to properly visualize Lloyd's algorithm (answers)

【Question title】: How to properly visualize Lloyd's algorithm
【Posted】: 2021-03-26 13:18:01
【Question】:

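For a computer science project I have to implement Lloyd's algorithm, which seems to work quite well. I would like to visualize the iterations. That also kind of works: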

import numpy 
import matplotlib.pyplot as plt

# Variables to test
centroids = [[[2, 3],[6, 7]], 
             [[1, 2],[7, 8]]]
nearest_centroid_of_samples = [[0, 0, 1, 1, 0],
                               [0, 1, 1 , 1, 0]]
quant_error = [2.123,1.789]

# Actual code
i=0
for c,ncos, qe in zip(centroids, nearest_centroid_of_samples, quant_error):
    # My x and y values
    xx = [0, 4, 3, 8, 2]
    yy = [1, 3, 9, 5, 2]
    title = "Iteration Nr.%d" % (i)
    plt.title(title)
    # A text I would like to appear
    text = "Quantisierungsfehler: %f" % (qe)
    plt.text(12.5, 3.5, text)
    # Adding my clusters to the plot, ncos encodes the color
    plt.scatter(xx, yy, c=ncos, marker='o', alpha=1)
    # Here I'm adding my centroids (the middle of each cluster) to the plot
    for pos in c:
        plt.scatter(pos[0], pos[1], c="red", marker="+")
    i = i+1
    plt.pause(0.25)
plt.show()

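This basically already gives me what I want. There is just one small problem: each iteration seems to be drawn on top of the previous one. That is fine for my data points, because they line up perfectly and you cannot see it. But the red centroid markers get messed up and stay partly visible. Even worse, the text I want to add contains a longer decimal number, and it becomes unreadable.

How do I need to plot this so that everything is drawn anew, but still in the same figure?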

Best regards

doofesohr

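Edit: added some representative values, as suggested. They are not the result of Lloyd's algorithm, but they should still show where my problem is.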
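The overlay described in the question can be made visible by counting the artists on the axes. The following is only a minimal sketch: the helper name `count_artists` is hypothetical, and the text is placed in axes coordinates as an assumption (the question uses data coordinates). It repeats the question's pattern of plotting in a loop without removing anything.

```python
import matplotlib.pyplot as plt


def count_artists(errors=(2.123, 1.789)):
    """Plot twice on the same axes, as in the question, and count what is left."""
    fig, ax = plt.subplots()
    for qe in errors:
        # Each call adds a new artist; nothing from the previous pass is removed.
        ax.scatter([0, 4, 3, 8, 2], [1, 3, 9, 5, 2])
        ax.text(0.02, 0.95, "Quantisierungsfehler: %f" % qe,
                transform=ax.transAxes)
    # One scatter collection and one text object per pass remain on the axes.
    return len(ax.collections), len(ax.texts)
```

After two passes the axes hold two scatter collections and two text objects, which is why the numbers are printed on top of each other.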

【Comments】:

Tags: python numpy matplotlib k-means

【Solution 1】:

I would use matplotlib's object-oriented interface, where you can act on the objects directly:

import numpy
import matplotlib.pyplot as plt

# samples is assumed to be an (n, 2) array holding the data points
fig, ax = plt.subplots()
xx = samples[:, 0]
yy = samples[:, 1]
# Create the scatter and text artists once, then update them in the loop
dots = ax.scatter(xx, yy, c='k', marker='o', alpha=1)
text = ax.text(12.5, 3.5, "placeholder")
for i, (c, ncos, qe) in enumerate(zip(centroids, nearest_centroid_of_samples, quant_error)):
    ax.set_title("Iteration Nr.%d" % (i))
    text.set_text("Quantisierungsfehler: %f" % (qe))
    dots.set_color(ncos)
    pos = numpy.array([[p[0], p[1]] for p in c])
    ax.scatter(pos[:, 0], pos[:, 1], c="red", marker="+")

    plt.pause(0.25)

plt.show()

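【Comments】:

I think I do like your object-oriented approach, since it seems clearer what is actually happening and why. I do seem to have some trouble with dots.set_color(ncos), though. I guess I didn't explain it properly: ncos is basically just an array of the same length as the samples, holding integer values that describe which cluster each sample belongs to. Since set_color seems to want some kind of RGB value, it isn't satisfied with an integer.
@doofesohr You should hard-code representative values for all variables. More information here: stackoverflow.com/help/minimal-reproducible-example
Thanks Paul, I will definitely do that next time (and will probably edit one in above). I guess that will clear up the misunderstanding about ncos.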
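One possible way around the set_color issue raised in these comments, offered as a minimal sketch and not as the answerer's own code: give the scatter a colormap and update the integer labels with set_array, and move a single centroid scatter with set_offsets so old markers do not accumulate. The function name `animate_iterations`, the "viridis" colormap, the axes-relative text position and the `pause` parameter are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def animate_iterations(samples, centroids, labels, errors, pause=0.25):
    """Show each iteration in one figure by updating existing artists."""
    samples = np.asarray(samples, dtype=float)
    first = np.asarray(centroids[0], dtype=float)
    fig, ax = plt.subplots()
    # Create every artist once; the loop below only changes their data.
    dots = ax.scatter(samples[:, 0], samples[:, 1], c=labels[0],
                      cmap="viridis", vmin=0, vmax=len(first) - 1)
    marks = ax.scatter(first[:, 0], first[:, 1], c="red", marker="+")
    # Assumption: text anchored in axes coordinates so it stays inside the plot.
    text = ax.text(0.02, 0.95, "", transform=ax.transAxes)
    for i, (c, ncos, qe) in enumerate(zip(centroids, labels, errors)):
        ax.set_title("Iteration Nr.%d" % i)
        text.set_text("Quantisierungsfehler: %f" % qe)
        # Integer cluster labels are mapped to colors through the colormap.
        dots.set_array(np.asarray(ncos, dtype=float))
        # Move the existing centroid markers instead of adding new ones.
        marks.set_offsets(np.asarray(c, dtype=float))
        if pause > 0:
            plt.pause(pause)
    return fig, dots, marks, text
```

With the values from the question, this could be called as animate_iterations(list(zip(xx, yy)), centroids, nearest_centroid_of_samples, quant_error), followed by plt.show(). Note that set_offsets does not rescale the axes, so centroids that move outside the initial limits would need ax.set_xlim and ax.set_ylim.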

【Solution 2】:

Make the following changes in your code:

# To control the number of decimal places in the quantisation error,
# change %f to %.2f for 2 decimal places (or however many you want)
text = "Quantisierungsfehler: %.2f" % (qe)

# To clear the contents of the current axes and plot afresh, add plt.cla() after plt.pause(0.25), i.e.

plt.pause(0.25)
plt.cla()
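Put together, the clearing approach could look like the sketch below. It is an illustration under assumptions, not the answerer's exact code: the function name `draw_iterations` and the `pause` parameter are hypothetical, the text is placed in axes coordinates, and the axes are cleared at the start of each iteration instead of after the pause. Clearing after the pause also wipes the last frame before plt.show() is reached, so clearing first keeps the final iteration visible.

```python
import matplotlib.pyplot as plt


def draw_iterations(xx, yy, centroids, labels, errors, pause=0.25):
    """Redraw everything from scratch on the same axes for each iteration."""
    fig, ax = plt.subplots()
    for i, (c, ncos, qe) in enumerate(zip(centroids, labels, errors)):
        ax.cla()  # remove the previous iteration's points, markers and text
        ax.set_title("Iteration Nr.%d" % i)
        ax.scatter(xx, yy, c=ncos, marker="o")
        for pos in c:
            ax.scatter(pos[0], pos[1], c="red", marker="+")
        ax.text(0.02, 0.95, "Quantisierungsfehler: %.2f" % qe,
                transform=ax.transAxes)
        if pause > 0:
            plt.pause(pause)
    return fig, ax
```

Because cla() also resets the title and axis limits, everything that should appear has to be set again inside the loop, which is what the sketch does.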

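【Comments】:

Thanks. That did the trick. I don't even need to format my decimal number, because it was the same problem as with the actual plot: the new number was simply written on top of the old one.
You're welcome. Please do take Paul's advice and try matplotlib's object-oriented interface. It takes a bit more typing, but it is good coding practice and worth getting used to. Cheers!
Right now I'm actually trying to implement his version. It is definitely easier to understand than what I came up with.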