matplotlib major display issue with dense data sets: answers

【Question title】: matplotlib major display issue with dense data sets
【Posted】: 2013-04-03 18:57:07
【Question description】:

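I've run into a fairly serious issue with matplotlib and Python. I have a dense periodogram data set and want to plot it. The issue is that when there are more data points than can be drawn in the available pixels, the package does not pick the minimum and maximum values to display. This means a casual look at the plot can lead you to the wrong conclusions.

Here's an example of such a problem:

The data set was plotted with plot() and scatter() overlaid. You can see that in the dense regions of the data, the blue line connecting the points does not reach the actual peaks, leading a human viewer to conclude that the peak at about 2.4 is the maximum, when it really is not.

If you zoom in or force a wide viewing window, it is displayed correctly. The rasterize and aa keywords have no effect on the issue.

Is there a way to ensure that the min/max points of a plot() call are always rendered? Otherwise, this needs to be addressed in an update to matplotlib. I've never had a plotting package behave like this, and it is a pretty major issue.

Edit: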

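import numpy
from matplotlib.pyplot import plot, show

x = numpy.linspace(0, 1, 2000000)
y = numpy.random.random(x.shape)
y[1000000] = 2  # a single spike that should stand out above the noise

plot(x, y)
show()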

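This should reproduce the problem, though it may depend on your monitor resolution. By dragging and resizing the window, you should see the problem. One data point should stick out at y=2, but it is not always displayed.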

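【Comments】:

What version of MPL are you using? If it is the latest, you should open an issue on the GitHub tracker (that will make sure it gets the core developers' attention). Can you post an example data set plus the code used to generate that plot? It makes testing much easier.
If you use plot(..., marker='.', linestyle='-'), does it reach the min/max correctly?
@tcaswell Added code. Changing the marker and linestyle did not help. Thanks.
I can't reproduce it... what is your backend? matplotlib.get_backend()
If I run the code exactly as posted, I get OverflowErrors from the renderer. By reducing all the numbers by a factor of 10 I can get it to run, but I always see the peak. What does matplotlib.__version__ give?

Tags: python matplotlib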

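【Solution 1】:

This is due to the path-simplification algorithm in matplotlib. While it is certainly not desirable in some cases, it is deliberate behavior to speed up rendering.

The simplification algorithm was changed at some point to avoid skipping "outlier" points, so newer versions of mpl don't exhibit this exact behavior (though the path is still simplified).

If you don't want paths to be simplified, you can disable it in the rc parameters (either in your .matplotlibrc file or at runtime).

E.g.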

import matplotlib as mpl
mpl.rcParams['path.simplify'] = False
import matplotlib.pyplot as plt
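
If you'd rather not change the setting for the whole session, here is a minimal sketch that scopes it to a single figure with matplotlib's rc_context. The array sizes, the spike position, the Agg backend and the in-memory PNG buffer are my own choices for illustration, not part of the original question. As far as I can tell the setting is picked up when the line's path is built, so the sketch keeps both the plot() call and the rendering inside the with block.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Noise with one spike, similar in spirit to the example in the question
# (smaller array, chosen only for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200_000)
y = rng.random(x.shape)
y[100_000] = 2

default_simplify = matplotlib.rcParams["path.simplify"]

buf = io.BytesIO()
# Path simplification is disabled only inside this block.
with matplotlib.rc_context({"path.simplify": False}):
    fig, ax = plt.subplots()
    ax.plot(x, y)
    fig.savefig(buf, format="png")  # render while the setting is active
    plt.close(fig)

# On leaving the block the previous value is restored.
simplify_after = matplotlib.rcParams["path.simplify"]
```

Expect rendering to be slower with simplification off, since every vertex is handed to the renderer.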

However, it may make more sense to use an "envelope" style plot. As a quick example:

import matplotlib.pyplot as plt
import numpy as np

def main():
    num = 10000
    x = np.linspace(0, 10, num)
    y = np.cos(x) + 5 * np.random.random(num)

    fig, (ax1, ax2) = plt.subplots(nrows=2)
    ax1.plot(x, y)
    envelope_plot(x, y, winsize=40, ax=ax2)
    plt.show()

def envelope_plot(x, y, winsize, ax=None, fill='gray', color='blue'):
    if ax is None:
        ax = plt.gca()
    # Coarsely chunk the data, discarding the last window if it's not evenly
    # divisible. (Fast and memory-efficient)
    numwin = x.size // winsize
    ywin = y[:winsize * numwin].reshape(-1, winsize)
    xwin = x[:winsize * numwin].reshape(-1, winsize)
    # Find the min, max, and mean within each window 
    ymin = ywin.min(axis=1)
    ymax = ywin.max(axis=1)
    ymean = ywin.mean(axis=1)
    xmean = xwin.mean(axis=1)

    fill_artist = ax.fill_between(xmean, ymin, ymax, color=fill, 
                                  edgecolor='none', alpha=0.5)
    line, = ax.plot(xmean, ymean, color=color, linestyle='-')
    return fill_artist, line

if __name__ == '__main__':
    main()
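
If you want to keep an ordinary line plot rather than a filled envelope, a related idea is to thin the data yourself so that each window contributes its minimum and maximum sample. The sketch below is only one way to do that and is not something matplotlib does for you; the helper name minmax_decimate and the nbins parameter are hypothetical names I made up for this example.

```python
import numpy as np

def minmax_decimate(x, y, nbins):
    """Keep only the min and max sample of each of ``nbins`` windows.

    Hypothetical helper (not a matplotlib function). Samples are returned
    in their original order, so the result can be passed straight to plot().
    """
    x = np.asarray(x)
    y = np.asarray(y)
    winsize = x.size // nbins
    if winsize < 2:
        # Too few points per window for decimation to help.
        return x, y
    n = winsize * nbins
    ywin = y[:n].reshape(nbins, winsize)
    offsets = np.arange(nbins) * winsize
    parts = [offsets + ywin.argmin(axis=1), offsets + ywin.argmax(axis=1)]
    # Treat any leftover samples as one extra, shorter window.
    tail = y[n:]
    if tail.size:
        parts.append(n + np.array([tail.argmin(), tail.argmax()]))
    # np.unique sorts the indices and drops duplicates (e.g. flat windows).
    idx = np.unique(np.concatenate(parts))
    return x[idx], y[idx]

# Same shape of data as in the question: uniform noise plus one spike.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 2_000_000)
y = rng.random(x.shape)
y[1_000_000] = 2

xd, yd = minmax_decimate(x, y, nbins=2000)
# xd, yd hold at most two points per window and still contain the spike.
```

You would then call plt.plot(xd, yd) instead of plt.plot(x, y). Picking nbins close to the width of the axes in pixels is a reasonable starting point, but that is a rule of thumb on my part, not anything matplotlib requires.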

【Discussion】: