Python Scipy Curvefit to Linear Quadratic Curve: Answers

【Question Title】: Python Scipy Curvefit to Linear Quadratic Curve
【Posted】: 2020-04-20 00:10:50
【Question Description】:

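I'm trying to curve-fit a linear-quadratic model to experimental data. The Y-axis values decrease from 1 to 10^-5. When I use the code below, the resulting curve often doesn't seem to fit the data at higher X values. I suspect that because the Y values at high X are so small, the differences between the experimental and model values are also small. But I want the model curve to be as close as possible to the higher-X points (even if that means the low values are not fit as well). Apart from using standard deviations (which I don't have), I haven't found anything about weighting in scipy.optimize.curve_fit. How can I improve the model's fit at high X values?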

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def lq(x, a, b):
    #y(x) = exp[-(ax+bx²)]
    y = []
    for i in x:
        x2=i**2
        ax = a*i
        bx2 = b*x2
        y.append(np.exp(-(ax+bx2)))
    return y
#x and y are from experiment
x=[0,1.778,2.921,3.302,6.317,9.524,10.54]
y=[1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041]
(a,b), pcov = curve_fit(lq, x, y, p0=[0.05,0.05])
#make the model curve using a and b
xmodel = list(range(0,20))
ymodel = lq(xmodel, a, b)
fig, ax1 = plt.subplots()
ax1.set_yscale('log')
ax1.plot(x,y, "ro", label="Experiment")  
ax1.plot(xmodel,ymodel, "r--", label="Model")  
plt.show()

【Question Discussion】:

Tags: python-3.x scipy curve-fitting

【Solution 1】:

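I agree with your assessment that for small values of y the fit is not very sensitive to small mismatches. Since you are plotting the data and the fit on a semi-log plot, I think what you really want is to fit in log space too. That is, you can fit log(y) to a quadratic function. As an aside (but an important one if you are going to do numerical work in Python), you should not loop over lists but use numpy arrays instead: that makes everything faster and simpler. With those changes, your script might look like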

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def lq(x, a, b):
    return -(a*x+b*x*x)

x = np.array([0,1.778,2.921,3.302,6.317,9.524,10.54])
y = np.array([1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041])

(a,b), pcov = curve_fit(lq, x, np.log(y), p0=[0.05,0.05])

xmodel = np.arange(20)             # Note: use numpy!
ymodel = np.exp(lq(xmodel, a, b))  # Note: take exp() as inverse log()
fig, ax1 = plt.subplots()
ax1.set_yscale('log')
ax1.plot(x, y, "ro", label="Experiment")
ax1.plot(xmodel,ymodel, "r--", label="Model")
plt.show()

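Note that the model function is now just -(a*x + b*x**2), the exponent you originally intended, and it is fit to np.log(y) rather than to y. This gives a more satisfactory fit at small y values.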
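On the weighting part of the question: curve_fit's `sigma` argument does not have to be a measured standard deviation. With the default `absolute_sigma=False`, only the relative sizes of the `sigma` values matter, so passing `sigma=y` minimizes the sum of squared relative residuals, ((y - f)/y)**2, which is roughly what the log-space fit does when residuals are small. The sketch below is an illustration added here, not part of the original answer; the names `lq_linear` and `lq_log` are hypothetical. It starts from the log-space solution and refines it in linear space with relative weights.

```python
import numpy as np
from scipy.optimize import curve_fit

def lq_linear(x, a, b):
    # model in linear space: y = exp(-(a*x + b*x**2))
    return np.exp(-(a * x + b * x * x))

def lq_log(x, a, b):
    # same model in log space: log(y) = -(a*x + b*x**2)
    return -(a * x + b * x * x)

x = np.array([0, 1.778, 2.921, 3.302, 6.317, 9.524, 10.54])
y = np.array([1, 0.831763771, 0.598411595, 0.656145266,
              0.207014135, 0.016218101, 0.004102041])

# 1) fit in log space, as in the answer above
p_log, _ = curve_fit(lq_log, x, np.log(y), p0=[0.05, 0.05])

# 2) refine in linear space; sigma=y scales each residual by its own data value,
#    so the tiny high-x points count as much as the large low-x ones
p_rel, pcov_rel = curve_fit(lq_linear, x, y, p0=p_log, sigma=y)

rel_resid = (y - lq_linear(x, *p_rel)) / y
print("log-space fit:     a=%.5f b=%.5f" % tuple(p_log))
print("relative-weighted: a=%.5f b=%.5f" % tuple(p_rel))
print("relative residuals:", np.round(rel_resid, 3))
```

Because y is strictly positive here, `sigma=y` never divides by zero; with data that can reach zero you would need a floor on sigma.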

You might also find lmfit (https://lmfit.github.io/lmfit-py/) useful for this problem (disclaimer: I am the lead author). With it, your fit script could become

from lmfit import Model
model = Model(lq)
params = model.make_params(a=0.05, b=0.05)
result = model.fit(np.log(y), params, x=x)

print(result.fit_report())

xmodel = np.arange(20)
ymodel = np.exp(result.eval(x=xmodel))

plt.plot(x, y, "ro", label="Experiment")
plt.plot(xmodel, ymodel, "r--", label="Model")
plt.yscale('log')
plt.legend()
plt.show()

This prints a report that includes fit statistics, interpretable uncertainties, and the correlations between the variables:

[[Model]]
    Model(lq)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 7
    # data points      = 7
    # variables        = 2
    chi-square         = 0.16149397
    reduced chi-square = 0.03229879
    Akaike info crit   = -22.3843833
    Bayesian info crit = -22.4925630
[[Variables]]
    a: -0.05212688 +/- 0.04406602 (84.54%) (init = 0.05)
    b:  0.05274458 +/- 0.00479056 (9.08%) (init = 0.05)
[[Correlations]] (unreported correlations are < 0.100)
    C(a, b) = -0.968
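If you stay with plain curve_fit, the same uncertainties and correlation can be read off the covariance matrix it returns. A minimal sketch, assuming the default `absolute_sigma=False` (which, like lmfit's default, scales the covariance by the reduced chi-square); the variable names are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def lq(x, a, b):
    # log-space model from the answer: log(y) = -(a*x + b*x**2)
    return -(a * x + b * x * x)

x = np.array([0, 1.778, 2.921, 3.302, 6.317, 9.524, 10.54])
y = np.array([1, 0.831763771, 0.598411595, 0.656145266,
              0.207014135, 0.016218101, 0.004102041])

popt, pcov = curve_fit(lq, x, np.log(y), p0=[0.05, 0.05])

perr = np.sqrt(np.diag(pcov))              # 1-sigma uncertainties of a and b
corr = pcov[0, 1] / (perr[0] * perr[1])    # correlation coefficient C(a, b)
chisqr = np.sum((np.log(y) - lq(x, *popt)) ** 2)
redchi = chisqr / (len(x) - len(popt))     # 7 points - 2 variables

print("a = %.5f +/- %.5f" % (popt[0], perr[0]))
print("b = %.5f +/- %.5f" % (popt[1], perr[1]))
print("C(a, b) = %.3f, chi-square = %.5f, reduced = %.5f" % (corr, chisqr, redchi))
```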

and produces a plot.

Note that lmfit parameters can be fixed or bounded, and that lmfit comes with many built-in models.
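For comparison, scipy's curve_fit also accepts bounds through its `bounds` argument (this is scipy's own feature, not lmfit's API, and it does not offer a direct way to fix a single parameter). A sketch, assuming purely for illustration that you want a >= 0 and b >= 0; the unconstrained log-space fit gives a slightly negative a, so the bound becomes active:

```python
import numpy as np
from scipy.optimize import curve_fit

def lq(x, a, b):
    # log-space model: log(y) = -(a*x + b*x**2)
    return -(a * x + b * x * x)

x = np.array([0, 1.778, 2.921, 3.302, 6.317, 9.524, 10.54])
y = np.array([1, 0.831763771, 0.598411595, 0.656145266,
              0.207014135, 0.016218101, 0.004102041])

# hypothetical constraint: both coefficients non-negative
# (passing bounds makes curve_fit switch from 'lm' to the 'trf' method)
(a_b, b_b), pcov_b = curve_fit(lq, x, np.log(y), p0=[0.05, 0.05],
                               bounds=([0, 0], [np.inf, np.inf]))
print("bounded fit: a=%.5f b=%.5f" % (a_b, b_b))
```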

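Finally, if you want to include a constant term in the quadratic model, you don't actually need an iterative method: you can use polynomial regression, for example with numpy.polyfit.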
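To illustrate that last point: in log space the model is linear in its parameters, so neither iteration nor an initial guess is needed. A sketch (not from the original answer) using `np.linalg.lstsq` for the answer's no-constant model and `np.polyfit` for the variant with a constant term:

```python
import numpy as np

x = np.array([0, 1.778, 2.921, 3.302, 6.317, 9.524, 10.54])
y = np.array([1, 0.831763771, 0.598411595, 0.656145266,
              0.207014135, 0.016218101, 0.004102041])
z = np.log(y)

# no constant term: log(y) = -(a*x + b*x**2), i.e. solve [x, x**2] @ [a, b] = -log(y)
A = np.column_stack([x, x ** 2])
(a, b), *_ = np.linalg.lstsq(A, -z, rcond=None)

# with a constant term: log(y) = c2*x**2 + c1*x + c0 (polyfit returns highest power first)
c2, c1, c0 = np.polyfit(x, z, 2)

ss_noconst = np.sum((z + (a * x + b * x ** 2)) ** 2)
ss_const = np.sum((z - np.polyval([c2, c1, c0], x)) ** 2)
print("no constant:   a=%.5f b=%.5f  SS=%.5f" % (a, b, ss_noconst))
print("with constant: c2=%.5f c1=%.5f c0=%.5f  SS=%.5f" % (c2, c1, c0, ss_const))

ymodel = np.exp(-(a * x + b * x ** 2))   # back-transform to linear space for plotting
```

The direct solution reproduces the a and b from the iterative fits above. The constant-term model can never have a larger sum of squares, since it contains the no-constant model as the special case c0 = 0, but it no longer forces the curve through y = 1 at x = 0.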

【Discussion】:

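I like how easily your lmfit solution gives the statistics. Your curve_fit solution (taking the exp of the function) seems like something I should have tried, but I thought curve_fit worked differently from how it actually does, so I didn't even try it :( My mistake...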

【Solution 2】:

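Here is a graphical Python fitter that uses your data with a Gompertz-type sigmoidal equation. This code uses scipy's differential evolution genetic algorithm module to determine initial parameter estimates for scipy's non-linear curve_fit() routine. That scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search. In this example I set all parameter search bounds from -2.0 to 2.0, which seems to work here. Note that it is much easier to give ranges for the initial parameter estimates than specific values, and those ranges can be generous.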

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

#x and y are from experiment
x=[0,1.778,2.921,3.302,6.317,9.524,10.54]
y=[1,0.831763771,0.598411595,0.656145266,0.207014135,0.016218101,0.004102041]

# alias data to match previous example code
xData = numpy.array(x, dtype=float)
yData = numpy.array(y, dtype=float)


def func(x, a, b, c): # Sigmoidal Gompertz C from zunzun.com
    return a * numpy.exp(b * numpy.exp(c*x))


# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    parameterBounds = []
    parameterBounds.append([-2.0, 2.0]) # search bounds for a
    parameterBounds.append([-2.0, 2.0]) # search bounds for b
    parameterBounds.append([-2.0, 2.0]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# by default (polish=True), differential_evolution finishes by polishing its best
# result with scipy.optimize.minimize (L-BFGS-B) inside the parameter bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing the bounds used by the genetic algorithm,
# just in case the best-fit parameters lie outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # plot with log Y axis scaling
    plt.yscale('log')

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

【Discussion】:

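Thank you, but it needs to be the linear-quadratic model that is fit. I can manually pick values of a and b that make the curve fit the data, so I know a solution exists, but I need a programmatic way to do it. I have edited the question to make this clearer.
As a test, if you use those specific values as the initial parameter estimates for curve_fit, does it fit well? If so, that means you only need code to find better initial parameter estimates, and you're done.
No, it gives essentially the same fit as using initial estimates of 0.1 for both values.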