Matplotlib: Why do interpolated points fall outside the plotted line?

【Question title】: Matplotlib: Why do interpolated points fall outside the plotted line?
【Posted】: 2020-12-14 08:04:12
【Question】:

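I have used Matplotlib to recreate a plot that is common in geoscience. It shows the grain size distribution of a soil sample and is used for soil classification.

Basically, a soil sample is placed in a stack of sieves, the stack is shaken for a while, and the weight remaining in each grain-size fraction is then plotted on a chart (see the image below).

An important use of this kind of chart is to determine two parameters known as D60 and D10, which are the grain sizes at 60% and 10% passing, respectively (see the orange points in the chart). I interpolated these values with np.interp, but strangely the points fall off the line drawn by Matplotlib. Can anyone give me a hint as to where I went wrong? They should sit where the line crosses y = 10 and y = 60.

The data looks like this: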

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter

d = {
    'x': [0.063, 0.125, 0.250, 0.500, 1.000, 2.000, 4.000, 8.000],
    'y': [5.9, 26.0, 59.0, 87.0, 95.0, 97.0, 97.0, 100.0]
}

df = pd.DataFrame(d)

df
    x   y
0   0.063   5.9
1   0.125   26.0
2   0.250   59.0
3   0.500   87.0
4   1.000   95.0
5   2.000   97.0
6   4.000   97.0
7   8.000   100.0

The function used for the interpolation looks like this (I tried a similar approach with SciPy and got the same result):

def interpolate(xval, df, xcol, ycol):
    return np.interp([xval], df[ycol], df[xcol])

The code that creates the plot itself looks like this:

fig, ax = plt.subplots(figsize=(10,5))

ax.scatter(df['x'], df['y']) #Show datapoints

# Beginning of table
cell_text = [
    ['Clay','FSi','MSi','CSi', 'FSa', 'MSa', 'CSa', 'FGr', 'MGr', 'CGr', 'Co']
]

table = ax.table(
    cellText=cell_text,
    colWidths=[0.06, 0.1, 0.1, 0.1,0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.04],
    cellLoc = 'center',
    rowLoc = 'center',
    loc='top')

h = table.get_celld()[(0,0)].get_height()
w = table.get_celld()[(0,0)].get_width()

header = [table.add_cell(-1,pos, 0.1, h, loc="center", facecolor="none") for pos in [i for i in range(1,10)]]

table.auto_set_font_size(False)
table.set_fontsize(12)
table.scale(1, 1.5)

for i in [0,3,6]:
    header[i].visible_edges = 'TBL'
for i in [1,4,7]:
    header[i].visible_edges = 'TB'
for i in [2,5,8]:
    header[i].visible_edges = 'TBR'

header[1].get_text().set_text('Silt')
header[4].get_text().set_text('Sand')
header[7].get_text().set_text('Gravel')
# End of table


# Pass the on/off flag positionally: the `b` keyword was renamed in newer Matplotlib releases
plt.grid(True, which='major', color='k', linestyle='--', alpha=0.5)
plt.grid(True, which='minor', color='k', linestyle='--', alpha=0.5)

ax.set_yticks(np.arange(0, 110, 10))
ax.set_xscale('log')
ax.xaxis.set_major_formatter(FormatStrFormatter('%g'))
ax.axis(xmin=0.001,xmax=100, ymin=0, ymax=100)

#Interpolate D10 and D60
x2 = np.concatenate((interpolate(10, df, 'x', 'y'), interpolate(60, df, 'x', 'y')))
y2 = np.array([10,60])

#Plot D10 and D60
ax.scatter(x2, y2)

#Plot the line
ax.plot(df['x'], df['y'])

ax.set_xlabel('Grain size (mm)')
ax.set_ylabel('Percent passing (%)')

Can anyone help me figure out why the orange points fall slightly off the line, and what I am doing wrong? Thanks!

【Comments】:

Tags: python pandas numpy matplotlib

【Answer 1】:

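The problem is that you are using linear interpolation to find the points, while the plot draws straight line segments on a logarithmic x-axis. This can be fixed by interpolating in log space: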

def interpolate(yval, df, xcol, ycol):
    return np.exp(np.interp([yval], df[ycol], np.log(df[xcol])))
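
To see the size of the effect, here is a minimal sketch that compares the two interpolations on the question's data without plotting anything. The helper percent_on_drawn_line is a hypothetical name introduced for this illustration; it assumes that each segment Matplotlib draws on a log x-axis is a straight line in log(x), and evaluates that line at a given grain size.

```python
import numpy as np

# Sieve data from the question
x = np.array([0.063, 0.125, 0.250, 0.500, 1.000, 2.000, 4.000, 8.000])
y = np.array([5.9, 26.0, 59.0, 87.0, 95.0, 97.0, 97.0, 100.0])
targets = np.array([10.0, 60.0])  # percent passing for D10 and D60

# Interpolating x directly (what the question did)
d_linear = np.interp(targets, y, x)

# Interpolating log(x), then transforming back (what the answer suggests)
d_log = np.exp(np.interp(targets, y, np.log(x)))


def percent_on_drawn_line(grain_size):
    """Hypothetical helper: percent passing read off the line as drawn
    on a log x-axis, i.e. piecewise linear in log(x)."""
    return np.interp(np.log(grain_size), np.log(x), y)


print("linear-space D10, D60:", d_linear)
print("log-space    D10, D60:", d_log)
print("line height at linear-space points:", percent_on_drawn_line(d_linear))
print("line height at log-space points:   ", percent_on_drawn_line(d_log))
```

The log-space values land exactly where the drawn line reaches 10 and 60, while the linear-space values sit slightly to the right of those crossings, which matches the offset of the orange points in the question.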

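If you additionally write np.array(yval) instead of [yval], x2 can be computed as a single vector. Passing a zorder of 3 draws the new points on top of the line. Optionally, some text can be added: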

def interpolate(yval, df, xcol, ycol):
    return np.exp(np.interp(np.array(yval), df[ycol], np.log(df[xcol])))

y2 = [10, 60]
x2 = interpolate(y2, df, 'x', 'y')
ax.scatter(x2, y2, zorder=3, color='crimson')
for x, y in zip(x2, y2):
    ax.text(x, y, f' D{y}={x:.4f}', color='crimson', ha='left', va='top', size=12)

【Comments】:

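Thanks! That works great! Regarding the code that shows the text: for x, y in zip(x2, y2): ax.text(x, y, f' D{y}={x:.4f}', color='crimson', ha='left', va='top', size=12) results in the following error: TypeError: unsupported format string passed to numpy.ndarray.__format__. In any case, for anyone who runs into the same error (for whatever reason it did not work for me), calling .flatten() takes care of it: x2 = interpolate(y2, df, 'x', 'y').flatten() Cheers!
flatten is one way to remove the extra level of nesting that appears when np.interp is called as np.interp([yval], ...) while yval is already a list. That is why the revised code suggests np.interp(np.array(yval), ...), which works whether yval is a single number or a list (or 1-D array).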
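
The shape difference behind that error can be checked directly. The following is a small sketch using the question's data as plain arrays rather than the DataFrame; the variable names nested, flat, single and labels are chosen here for illustration only.

```python
import numpy as np

# Sieve data from the question
x = np.array([0.063, 0.125, 0.250, 0.500, 1.000, 2.000, 4.000, 8.000])
y = np.array([5.9, 26.0, 59.0, 87.0, 95.0, 97.0, 97.0, 100.0])
y2 = [10, 60]

# Wrapping a list in another list adds a dimension to the result
nested = np.exp(np.interp([y2], y, np.log(x)))
print(nested.shape)  # (1, 2)

# np.array(y2) keeps the result one-dimensional
flat = np.exp(np.interp(np.array(y2), y, np.log(x)))
print(flat.shape)  # (2,)

# The same call also accepts a single number
single = np.exp(np.interp(np.array(60), y, np.log(x)))
print(np.ndim(single))  # 0

# Iterating over `nested` yields a whole row (an ndarray), which cannot be
# formatted with ':.4f'; iterating over `flat` yields individual floats.
try:
    f"{nested[0]:.4f}"
except TypeError as err:
    print("TypeError:", err)

labels = [f"D{yv}={xv:.4f}" for xv, yv in zip(flat, y2)]
print(labels)
```

Calling .flatten() on the nested result gives the same values as the one-dimensional version, so either fix produces the same labels.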