【问题标题】:Python Curve_Fit Exponential / Power / Log Curve - Improve ResultsPython Curve_Fit 指数/幂/对数曲线 - 改善结果
【发布时间】:2020-04-04 13:36:21
【问题描述】:

我正在尝试拟合渐近接近零(但从未达到零)的数据。

我认为最好的曲线是逆逻辑函数,但欢迎提出建议。关键是预期的衰减“S曲线”形状。

这是我目前拥有的代码,以及下面的绘图图像,非常丑陋。

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# DATA

x = pd.Series([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469]).values
y = pd.Series([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06]).values

# Inverse Logistic Function 
# https://en.wikipedia.org/wiki/Logistic_function
def func(x, L ,x0, k, b):
    y = 1/(L / (1 + np.exp(-k*(x-x0)))+b)
    return y

# FIT DATA

p0 = [max(y), np.median(x),1,min(y)] # this is an mandatory initial guess
popt, pcov = curve_fit(func, x, y,p0, method='dogbox',maxfev=10000)

# PERFORMANCE

modelPredictions = func(x, *popt)
absError = modelPredictions - y
SE = np.square(absError) # squared errors
MSE = np.mean(SE) # mean squared errors
RMSE = np.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (np.var(absError) / np.var(y))

print('Parameters:', popt)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

#PLOT

plt.figure()
plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.yscale('log')
#plt.xscale('log')
plt.show()

这是运行此代码时的结果......以及我想要实现的目标!

我怎样才能更好地优化curve_fit,而不是代码生成的红色线,我得到更接近蓝色绘制线的东西?

谢谢!!

【问题讨论】:

  • 这看起来像是您想要对其运行多项式回归(在本例中为三次),而不是逻辑回归。
  • @Mike'Pomax'Kamermans 感谢您的提示!有什么建议可以让我获得示例或示例代码吗?谢谢!

标签: python optimization scipy logistic-regression curve-fitting


【解决方案1】:

我看到您的绘图使用了对数缩放,并且我发现几个不同的 sigmoidal 方程可以很好地拟合 Y 数据的自然对数。这是一个图形化 Python fitter,使用 Y 数据的自然对数和四参数 Logistic 方程:

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings


xData = numpy.array([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469], dtype=float)
yData = numpy.array([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06], dtype=float)

# fit the natural lpg of the data
yData = numpy.log(yData)

warnings.filterwarnings("ignore") # do not print "invalid value" warnings during fit
def func(x, a, b, c, d): # Four-Parameter Logistic from zunzun.com
    return d + (a - d) / (1.0 + numpy.power(x / c, b))


# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0, 1.0])

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

print('Parameters:', fittedParameters)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Natural Log of Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

【讨论】:

  • 这真是太好了,谢谢!不过,从 2000 年开始的拟合可能会更好。我认为在最新数据 (3500->) 中它正朝着错误的方向发展,这是一个问题。
  • 我认为曲线的末端需要更陡峭,而不是太平,这可能吗?捕捉 3500 区域的指数下降很重要。 IE。接近但从未达到零
【解决方案2】:

在您的情况下,我会将双曲正切1拟合到您数据的以 10 为底的对数。

让我们使用

log10 (y)=y₀ - a em> tanh (λ(x-x₀)) em>

作为你的功能

您的 x 大约在 0 到 3500 之间,您的 log10(y) 从 3 到 -1,前提是 tanh(2) = -tanh(2 ) ≈ 1 我们有

            y₀+a = 3, y0-a= -1 ⇒ y₀ = 1, a = 2;

            λ = (2-(-2)) / (3500-0); x₀ = (3500-0)/2。

(这个粗略的估计对于证明curve_fit的初始猜测是必要的,否则程序会丢失)。

省略了我最终得到的样板

X = np.linspace(0, 3500, 701)
plt.scatter(x, np.log10(y), label='data')
plt.plot(X, 1-2*np.tanh(4/3500*(X-1750)), label='hand fit')
(y0, a, l, x0), *_ = curve_fit(
    lambda x, y0, a, l,x 0: y0 - a*np.tanh(l*(x-x0)),
    x, np.log10(y),
    p0=[1, 2, 4/3500, 3500/2])
plt.plot(X, y0-a*np.tanh(l*(X-x0)), label='curve_fit fit')
plt.legend()


注1:the logistic function is the hyperbolic tangent in disguise

【讨论】:

    【解决方案3】:

    根据您的数据图和预期拟合,我猜您并不想将您的数据 y 建模为类似逻辑的阶跃函数,而是将 log(y) 建模为逻辑-像阶跃函数。

    所以,我认为您可能想要使用逻辑阶跃函数,也许添加一个线性组件来对该数据的日志建模。我会用lmfit 来做这件事,因为它带有内置模型,可以提供更好的结果报告,并允许您大大简化拟合代码(免责声明:我是主要作者):

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit
    
    from lmfit.models import StepModel, LinearModel
    
    # DATA
    x = pd.Series([1, 1, 264, 882, 913, 1095, 1156, 1217, 1234, 1261, 1278,
                  1460, 1490, 1490, 1521, 1578, 1612, 1612, 1668, 1702, 1704,
                  1735, 1793, 2024, 2039, 2313, 2313, 2558, 2558, 2617, 2617,
                  2708, 2739, 2770, 2770, 2831, 2861, 2892, 2892, 2892, 2892,
                  2892, 2923, 2923, 2951, 2951, 2982, 2982, 3012, 3012, 3012,
                  3012, 3012, 3012, 3012, 3073, 3073, 3073, 3104, 3104, 3104,
                  3104, 3135, 3135, 3135, 3135, 3165, 3165, 3165, 3165, 3165,
                  3196, 3196, 3196, 3226, 3226, 3257, 3316, 3347, 3347, 3347,
                  3347, 3377, 3377, 3438, 3469, 3469]).values
    
    y = pd.Series([1000, 600, 558.659217877095, 400, 300, 100, 7.75, 6, 8.54,
                  6.66666666666667, 7.14, 1.1001100110011, 1.12, 0.89, 1, 2,
                  0.666666666666667, 0.77, 1.12612612612613, 0.7,
                  0.664010624169987, 0.65, 0.51, 0.445037828215398, 0.27, 0.1,
                  0.26, 0.1, 0.1, 0.13, 0.16, 0.1, 0.13, 0.1, 0.12, 0.1, 0.13,
                  0.14, 0.14, 0.17, 0.11, 0.15, 0.09, 0.1, 0.26, 0.16, 0.09,
                  0.09, 0.05, 0.09, 0.09, 0.1, 0.1, 0.11, 0.11, 0.09, 0.09,
                  0.11, 0.08, 0.09, 0.09, 0.1, 0.06, 0.07, 0.07, 0.09, 0.05,
                  0.05, 0.06, 0.07, 0.08, 0.08, 0.07, 0.1, 0.08, 0.08, 0.05,
                  0.06, 0.04, 0.04, 0.05, 0.05, 0.04, 0.06, 0.05, 0.05, 0.06]).values
    
    model = StepModel(form='logistic') + LinearModel()
    params = model.make_params(amplitude=-5, center=1000, sigma=100, intercept=0, slope=0)
    
    result = model.fit(np.log(y), params, x=x)
    
    print(result.fit_report())
    
    plt.plot(x, y, 'ko', label="Original Noised Data")
    plt.plot(x, np.exp(result.best_fit), 'r-', label="Fitted Curve")
    plt.legend()
    plt.yscale('log')
    plt.show()
    

    这将打印出包含拟合统计数据和最佳拟合值的报告:

    [[Model]]
        (Model(step, form='logistic') + Model(linear))
    [[Fit Statistics]]
        # fitting method   = leastsq
        # function evals   = 73
        # data points      = 87
        # variables        = 5
        chi-square         = 9.38961801
        reduced chi-square = 0.11450754
        Akaike info crit   = -183.688405
        Bayesian info crit = -171.358865
    [[Variables]]
        amplitude: -4.89008796 +/- 0.29600969 (6.05%) (init = -5)
        center:     1180.65823 +/- 15.2836422 (1.29%) (init = 1000)
        sigma:      94.0317580 +/- 18.5328976 (19.71%) (init = 100)
        slope:     -0.00147861 +/- 8.1151e-05 (5.49%) (init = 0)
        intercept:  6.95177838 +/- 0.17170849 (2.47%) (init = 0)
    [[Correlations]] (unreported correlations are < 0.100)
        C(amplitude, slope)     = -0.798
        C(amplitude, sigma)     = -0.649
        C(amplitude, intercept) = -0.605
        C(center, intercept)    = -0.574
        C(sigma, slope)         =  0.542
        C(sigma, intercept)     =  0.348
        C(center, sigma)        = -0.335
        C(amplitude, center)    =  0.282
    

    并产生这样的情节

    如果您愿意,您当然可以使用 scipy.optimize.curve_fit 重现所有这些,但我会将其留作练习。

    【讨论】:

    • 谢谢!不确定这些公式的方程式是什么?我有兴趣预测超过 3500,所以有一个公式很重要。干杯!
    • 使用的公式是标准的逻辑函数(见lmfit.github.io/lmfit-py/builtin_models.html#stepmodel)+一行。 Lmfit 内置了这些带有命名参数的模型,让您的生活更轻松。该模型应该扩展超过 x=3600 没有任何问题。但是您当然可以尝试其他模型,看看哪个模型最适合您的数据。要了解的最重要的一点是,您希望将 Logistic+Line 模型拟合到 log(y), 而不是 y
    猜你喜欢
    • 2020-05-05
    • 1970-01-01
    • 1970-01-01
    • 2019-10-19
    • 2019-01-28
    • 1970-01-01
    • 2021-10-07
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多