【Posted】: 2017-08-31 18:11:21
【Question】:
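I have a hypothetical function y of x and am trying to find/fit a lognormal distribution curve that best follows the shape of the data. I am using the curve_fit function and was able to fit a normal distribution, but the curve does not look optimal.
Below are the y and x data points, where y = f(x).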
y_axis = [0.00032425299473065838, 0.00063714106162861229, 0.00027009331177605913, 0.00096672396877715144, 0.002388766809835889, 0.0042233337680543182, 0.0053072824980722137, 0.0061291327849408699, 0.0064555344006149871, 0.0065601228278316746, 0.0052574034010282218, 0.0057924488798939255, 0.0048154093097913355, 0.0048619350036057446, 0.0048154093097913355, 0.0045114840997070331, 0.0034906838696562147, 0.0040069911024866456, 0.0027766995669134334, 0.0016595801819374015, 0.0012182145074882836, 0.00098231827111984341, 0.00098231827111984363, 0.0012863691645616997, 0.0012395921040321833, 0.00093554121059032721, 0.0012629806342969417, 0.0010057068013846018, 0.0006081017868837127, 0.00032743942370661445, 4.6777060529516312e-05, 7.0165590794274467e-05, 7.0165590794274467e-05, 4.6777060529516745e-05]
The y axis is the probability of the event occurring in the time bin given on the x axis:
x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]
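I was able to get a better fit to my data using Excel and a lognormal approach. When I try to use a lognormal in Python, the fit does not work, and I don't know what I am doing wrong.
Below is the code I use to fit a normal distribution, which seems to be the only distribution I can get to fit in Python (hard to believe):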
# fitting distribution on top of Savitzky-Golay
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import scipy
import scipy.stats
import numpy as np
from scipy.stats import norm, gamma, lognorm, halflogistic, foldcauchy
from scipy.optimize import curve_fit
matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')
# results from savgol
x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]
y_axis = [0.00032425299473065838, 0.00063714106162861229, 0.00027009331177605913, 0.00096672396877715144, 0.002388766809835889, 0.0042233337680543182, 0.0053072824980722137, 0.0061291327849408699, 0.0064555344006149871, 0.0065601228278316746, 0.0052574034010282218, 0.0057924488798939255, 0.0048154093097913355, 0.0048619350036057446, 0.0048154093097913355, 0.0045114840997070331, 0.0034906838696562147, 0.0040069911024866456, 0.0027766995669134334, 0.0016595801819374015, 0.0012182145074882836, 0.00098231827111984341, 0.00098231827111984363, 0.0012863691645616997, 0.0012395921040321833, 0.00093554121059032721, 0.0012629806342969417, 0.0010057068013846018, 0.0006081017868837127, 0.00032743942370661445, 4.6777060529516312e-05, 7.0165590794274467e-05, 7.0165590794274467e-05, 4.6777060529516745e-05]
## y_axis values must be normalised
sum_ys = sum(y_axis)
# normalize to 1
y_axis = [_/sum_ys for _ in y_axis]
# def gamma_f(x, a, loc, scale):
#     return gamma.pdf(x, a, loc, scale)
def norm_f(x, loc, scale):
    # print 'loc: ', loc, 'scale: ', scale, "\n"
    return norm.pdf(x, loc, scale)
fitting = norm_f
# param_bounds = ([-np.inf,0,-np.inf],[np.inf,2,np.inf])
result = curve_fit(fitting, x_axis, y_axis)
result_mod = result
# mod scale
# results_adj = [result_mod[0][0]*.75, result_mod[0][1]*.85]
plt.plot(x_axis, y_axis, 'ro')
plt.bar(x_axis, y_axis, 1, alpha=0.75)
plt.plot(x_axis, [fitting(_, *result[0]) for _ in x_axis], 'b-')
plt.axis([0,35,0,.1])
# convert back into probability
y_norm_fit = [fitting(_, *result[0]) for _ in x_axis]
y_fit = [_*sum_ys for _ in y_norm_fit]
print(list(y_fit))
plt.show()
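I am trying to answer two questions:
- Is this the best fit I can get from a normal distribution curve? How can I improve my fit?
- How do I fit a lognormal distribution to this data, or is there a better distribution to use?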
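One possible way to approach both questions, offered only as a sketch: fit a normal and a lognormal with curve_fit and compare their sums of squared errors. The free amplitude `amp`, the pinned `loc=0`, the starting guesses, and the bounds are all assumptions made for this illustration and are not part of the original code. The amplitude takes the place of the manual normalisation by sum_ys.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import lognorm, norm

# Data copied from the question (x runs from 1 to 34).
x = np.arange(1.0, 35.0)
y = np.array([
    0.00032425299473065838, 0.00063714106162861229, 0.00027009331177605913,
    0.00096672396877715144, 0.002388766809835889, 0.0042233337680543182,
    0.0053072824980722137, 0.0061291327849408699, 0.0064555344006149871,
    0.0065601228278316746, 0.0052574034010282218, 0.0057924488798939255,
    0.0048154093097913355, 0.0048619350036057446, 0.0048154093097913355,
    0.0045114840997070331, 0.0034906838696562147, 0.0040069911024866456,
    0.0027766995669134334, 0.0016595801819374015, 0.0012182145074882836,
    0.00098231827111984341, 0.00098231827111984363, 0.0012863691645616997,
    0.0012395921040321833, 0.00093554121059032721, 0.0012629806342969417,
    0.0010057068013846018, 0.0006081017868837127, 0.00032743942370661445,
    4.6777060529516312e-05, 7.0165590794274467e-05, 7.0165590794274467e-05,
    4.6777060529516745e-05,
])


def lognorm_f(x, amp, s, scale):
    # Assumption: loc is pinned to 0 and `amp` absorbs the overall scaling,
    # because the raw y values do not sum to 1.
    return amp * lognorm.pdf(x, s, loc=0, scale=scale)


def norm_f(x, amp, loc, scale):
    # Same idea for the normal: a scaled density rather than a unit-area one.
    return amp * norm.pdf(x, loc, scale)


peak_x = x[np.argmax(y)]

# Rough starting guesses (assumptions): total mass for amp, peak location
# for the lognormal scale and the normal mean.
popt_ln, _ = curve_fit(
    lognorm_f, x, y,
    p0=[y.sum(), 0.5, peak_x],
    bounds=([0.0, 1e-3, 1e-3], [np.inf, 5.0, 100.0]),
)
popt_n, _ = curve_fit(
    norm_f, x, y,
    p0=[y.sum(), peak_x, 5.0],
    bounds=([0.0, -np.inf, 1e-3], [np.inf, np.inf, np.inf]),
)


def sse(f, popt):
    # Sum of squared residuals between the data and the fitted curve.
    return float(np.sum((y - f(x, *popt)) ** 2))


sse_ln = sse(lognorm_f, popt_ln)
sse_n = sse(norm_f, popt_n)
print("lognormal (amp, s, scale):", popt_ln, "SSE:", sse_ln)
print("normal    (amp, loc, scale):", popt_n, "SSE:", sse_n)
```

The lower SSE indicates the closer fit in the least-squares sense. Leaving `loc` free as well is possible, but it tends to make the lognormal fit less stable, which is why it is fixed in this sketch.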
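I have been playing with a lognormal curve, adjusting mu and sigma by hand, and it looks like a better fit is possible. I don't understand what I am doing wrong that keeps me from getting similar results in Python.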
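One common source of mismatch when moving hand-tuned mu and sigma values into SciPy is the parameterisation. Assuming mu and sigma are the mean and standard deviation of ln(x), which the question does not state explicitly, scipy.stats.lognorm expects `s = sigma` and `scale = exp(mu)`, with `loc` left at 0. The sketch below checks this against the textbook density. The values of mu and sigma are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.stats import lognorm


def lognormal_pdf(x, mu, sigma):
    """Textbook lognormal density; mu and sigma describe ln(x)."""
    x = np.asarray(x, dtype=float)
    return (np.exp(-(np.log(x) - mu) ** 2 / (2.0 * sigma ** 2))
            / (x * sigma * np.sqrt(2.0 * np.pi)))


mu, sigma = 2.4, 0.5          # hypothetical values, not fitted to the data
x = np.arange(1.0, 35.0)

by_hand = lognormal_pdf(x, mu, sigma)

# SciPy's shape parameter s is sigma, and scale is exp(mu).
via_scipy = lognorm.pdf(x, sigma, loc=0, scale=np.exp(mu))

# Passing mu as the second positional argument puts it into loc instead,
# which shifts the curve rather than setting the log-mean.
mu_as_loc = lognorm.pdf(x, sigma, mu)

matches = np.allclose(by_hand, via_scipy)
mismatch = not np.allclose(by_hand, mu_as_loc)
print(matches, mismatch)
```

If values tuned by hand look right in a spreadsheet but wrong in Python, checking which argument mu ends up in is a cheap first step.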
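【Comments】:
-
Can you show what your fit looks like?
-
Warren: I corrected the negatives, hope that helps. Mikey: I will be able to upload a picture of my fit shortly.
-
Are your y values scaled?
Tags: python numpy scipy statistics distribution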