How to make a histogram from 30 CSV files, then fit it with a Gaussian function and find the standard deviation?

【Question title】: How to make a histogram from 30 CSV files, then fit it with a Gaussian function and find the standard deviation?
【Posted】: 2021-12-09 02:59:51
【Question description】:

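I want to make one histogram from 30 CSV files and then fit a Gaussian function to it to see whether my data is optimal. After that, I need to find the mean and standard deviation of the peaks. The data files are large, and I don't know whether I extracted the single column correctly or binned its range of values properly.

I know this is a bit long and asks a lot of questions. Please answer as many as you can. Thank you very much!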

> These are the links to the data

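Here is what I have done so far (not much, really, since I'm a beginner at data visualization). First I import the packages; savgol_filter is there to smooth the bins so the plot looks better.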

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.signal import savgol_filter

Then I convert the figure dimensions and set the slice limits.

def cm2inch(value):
    return value/2.54

width = 9
height = 6.75

sliceMin, sliceMax = 300, 1002

Next, in a Jupyter notebook, I load all the data by looping 30 times, storing the values in two lists, `times` and `voltages`.

times, voltages = [], []
for i in range(30):
    time, ch1 = np.loadtxt(f"{i+1}.txt", delimiter=',', skiprows=5,unpack=True)
    times.append(time)
    voltages.append(ch1)    
t = (np.array(times[0]) * 1e5)[sliceMin:sliceMax]
voltages = (np.array(voltages))[:, sliceMin:sliceMax]

1. I think I need a hist function to draw the plot. I do get a plot, but I'm not sure this is the right way to generate a histogram.

hist, bin_edges = np.histogram(voltages, bins=500, density=True)
hist = savgol_filter(hist, 51, 3)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2
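For what it's worth, the histogram step can be checked on its own. The sketch below uses synthetic data in place of the real files (the array shape and peak positions are assumptions, not the poster's data). `np.histogram` flattens a 2-D array, so all 30 traces end up in one shared set of bins, and with `density=True` the bar areas sum to 1, which means the y-axis is a probability density rather than a count of data points.

```python
import numpy as np

# Synthetic stand-in for the (30, n_samples) voltages array; all values are assumptions.
rng = np.random.default_rng(0)
voltages = np.concatenate([
    rng.normal(0.0, 0.5, size=(30, 300)),
    rng.normal(5.0, 0.5, size=(30, 300)),
], axis=1)

# np.histogram flattens its input, so the 30 traces are binned together.
hist, bin_edges = np.histogram(voltages, bins=100, density=True)
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2
bin_width = bin_edges[1] - bin_edges[0]

# With density=True, sum(height * width) over all bins is 1.
area = np.sum(hist * bin_width)
print(hist.shape, bin_centres.shape, area)
```

Note that smoothing `hist` with `savgol_filter` afterwards changes the bar heights, so the area is no longer guaranteed to be exactly 1.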

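This is as far as I have got. The amplitude of the third peak is too low, which is not what I expected. But please correct me if my expectation is wrong.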

This is my histogram plot

I have updated my plot with the following code

labels = "hist"
if showGraph:
    plt.title("Datapoints Distribution over Voltage [mV]", )      
    plt.xlabel("Voltage [mV]")
    plt.ylabel("Data Points")
    plt.plot(hist, label=labels)
    plt.show()

2. (Edited) I'm not sure why my label isn't displayed. Could you correct me?
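A likely cause is that `label=` only stores the text on the line; matplotlib draws it once `legend()` is called. A minimal sketch with placeholder data (the `Agg` backend and the synthetic samples are assumptions made so the snippet runs without a display):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=5000)  # placeholder for the voltage data
hist, bin_edges = np.histogram(samples, bins=50, density=True)
bin_centres = (bin_edges[:-1] + bin_edges[1:]) / 2

fig, ax = plt.subplots()
# Plot against bin centres so the x-axis shows voltages, not bin indices.
ax.plot(bin_centres, hist, label="hist")
ax.set_title("Datapoints Distribution over Voltage [mV]")
ax.set_xlabel("Voltage [mV]")
ax.set_ylabel("Probability density")
legend = ax.legend()  # without this call the label is stored but never drawn
```

In a notebook, `plt.show()` after `ax.legend()` displays the figure with the legend.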

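3. (Edited) In addition, I want to make a fitted curve by applying a Gaussian function to the histogram. But there are three peaks, so how should I fit the function to them?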

def gauss(x, *p):
    A, mu, sigma = p
    return A*np.exp(-(x-mu)**2/(2.*sigma**2))
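One common way to handle three peaks is to fit the sum of three Gaussians in a single `curve_fit` call. The sketch below is only an illustration under assumptions: the data is a synthetic curve, `three_gauss` is a hypothetical helper name, and the initial guesses in `p0` are invented. With real data the guesses would come from reading approximate peak positions off the histogram.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, A, mu, sigma):
    return A * np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2))

def three_gauss(x, *p):
    # p = (A1, mu1, sigma1, A2, mu2, sigma2, A3, mu3, sigma3)
    return sum(gauss(x, *p[i:i + 3]) for i in range(0, 9, 3))

# Synthetic "histogram" with three peaks plus a little noise (assumed values).
x = np.linspace(-5, 15, 400)
true_params = [1.0, 0.0, 0.8, 0.7, 5.0, 1.0, 0.3, 10.0, 0.9]
rng = np.random.default_rng(2)
y = three_gauss(x, *true_params) + rng.normal(0, 0.01, x.size)

# curve_fit needs p0 here: it sets the number of parameters for the *p signature.
p0 = [1, 0.5, 1, 1, 4.5, 1, 1, 9.5, 1]
popt, pcov = curve_fit(three_gauss, x, y, p0=p0)

means = popt[1::3]            # fitted centre of each peak
sigmas = np.abs(popt[2::3])   # sigma enters squared, so its sign is arbitrary
print(means, sigmas)
```

With the question's variables, `x` and `y` would be `bin_centres` and `hist`. The fitted `mu` values are the peak means and the `sigma` values are the per-peak standard deviations.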

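4. (Edited) I realized I haven't mentioned the mean yet. I think that if I can find the maximum of a peak, I can find the mean of that particular peak. Do I need to fit the Gaussian first to find the peaks, or can I find them directly? Should I look for local maxima? If so, how do I go about it?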
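Local maxima can be located without fitting anything, for example with `scipy.signal.find_peaks`. The sketch below runs on a synthetic three-peak curve; the curve and the `prominence` threshold are assumptions and would need tuning on a real (ideally smoothed) histogram.

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic three-peak curve standing in for the smoothed histogram.
x = np.linspace(-5, 15, 401)
y = (1.0 * np.exp(-(x - 0.0) ** 2 / 2)
     + 0.7 * np.exp(-(x - 5.0) ** 2 / 2)
     + 0.3 * np.exp(-(x - 10.0) ** 2 / 2))

# prominence filters out small noise bumps; 0.05 is an assumed threshold.
peak_idx, props = find_peaks(y, prominence=0.05)
peak_positions = x[peak_idx]   # x location of each local maximum
peak_heights = y[peak_idx]
print(peak_positions, peak_heights)
```

For a symmetric peak the location of the maximum is a reasonable estimate of its mean, and it also works well as an initial guess for a later Gaussian fit.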

5. (Edited) I know how to find the standard deviation of a single list. If I want to apply similar logic here, how would the code look?

sample = [1,2,3,4,5,5,5,5,10]
standard_deviation = np.std(sample, ddof=1)
print(standard_deviation)
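One way to carry the same `np.std` logic over to several peaks is to select the samples that belong to each peak and compute the statistics per group. In the sketch below the data is synthetic and the window boundaries are hypothetical; with real data they might be taken from the minima between peaks in the histogram.

```python
import numpy as np

# Synthetic stand-in for the (30, n_samples) voltages array with two peaks.
rng = np.random.default_rng(3)
voltages = np.concatenate([
    rng.normal(0.0, 0.5, size=(30, 400)),
    rng.normal(5.0, 0.8, size=(30, 400)),
], axis=1)

flat = voltages.ravel()  # treat all 30 traces as one sample

# Hypothetical (low, high) voltage windows, one per peak.
windows = [(-3.0, 2.5), (2.5, 9.0)]

stats = []
for low, high in windows:
    in_peak = flat[(flat >= low) & (flat < high)]
    stats.append((in_peak.mean(), in_peak.std(ddof=1)))

for (low, high), (mean, std) in zip(windows, stats):
    print(f"window [{low}, {high}): mean={mean:.3f}, std={std:.3f}")
```

This only works cleanly when the peaks are well separated; for overlapping peaks, the sigma from a Gaussian fit is usually a better estimate.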

Follow-up on the suggestion: I tried to implement the Gaussian fit. Below are the packages I imported.

from sklearn.mixture import GaussianMixture
import numpy as np
import matplotlib.pyplot as plt

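Here is the Gaussian fit. I passed my 30 voltage datasets as the argument to the Gaussian mixture fit, and it prints out a large number of values for mu and variance.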

gmm = GaussianMixture(n_components=1)
gmm.fit(voltages)
print(gmm.means_, gmm.covariances_)
mu = gmm.means_[0][0]
variance = gmm.covariances_[0][0][0]
print(mu, variance)

I went through the code one line at a time. The second line below raises an error:

fig, ax = plt.subplots(figsize=(6,6))
Xs = np.arange(min(voltages), max(voltages), 0.05)

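ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

From what I found online, this error means a single value is expected; for example, given [T, T, F, F, T], there are 4 possible outcomes.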

I edited the code to:

Xs = np.arange(min(np.all(voltages)), max(np.all(voltages)), 0.05)

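Which gave me this:

TypeError: 'numpy.bool_' object is not iterable

I know it isn't a boolean object. At this stage I don't know how to proceed with the Gaussian curve fit. Can anyone suggest another approach?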
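Both errors appear to come from `voltages` being a 2-D array. The built-in `min()` iterates over rows and compares whole rows, which raises the ambiguous-truth-value error, while `np.all(voltages)` collapses the array to a single boolean, which is why it is "not iterable". A sketch of one way around it, using a random stand-in for the real array (the shape is an assumption based on the slicing in the question):

```python
import numpy as np

rng = np.random.default_rng(4)
voltages = rng.normal(0.0, 1.0, size=(30, 702))  # stand-in for the real data

# Built-in min() compares entire rows with "<", which has no single truth value.
try:
    min(voltages)
    raised = False
except ValueError:
    raised = True

# The array methods reduce over every element and return plain scalars.
v_min, v_max = voltages.min(), voltages.max()
Xs = np.arange(v_min, v_max, 0.05)

# GaussianMixture.fit expects shape (n_samples, n_features); for a single
# voltage feature, that means one column holding every sample.
samples = voltages.reshape(-1, 1)
print(raised, Xs.shape, samples.shape)
```

Fitting on `samples` rather than `voltages` also explains the "large number of values": with shape (30, 702), the mixture model treats each of the 702 columns as a separate feature.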

【Question comments】:

Tags: python matplotlib visualization

【Solution 1】:

To plot a histogram, the plain matplotlib function hist is my go-to. Basically, if I have a list of samples, I can plot their histogram with 100 bins like this:

import matplotlib.pyplot as plt
plt.hist(samples, bins=100)
plt.show()

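If you want to fit normal distributions to your data, the best model is a Gaussian mixture model; you can find more about it on scikit-learn's GMM page. That said, here is the code I use to fit a single Gaussian distribution to a dataset. To fit k normal distributions, I would use n_components=k. I've also included the resulting plot: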

from sklearn.mixture import GaussianMixture
import numpy as np
import matplotlib.pyplot as plt

data = np.random.uniform(-1,1, size=(800,1))
data += np.random.uniform(-1,1, size=(800,1))
gmm = GaussianMixture(n_components=1)
gmm.fit(data)
print(gmm.means_, gmm.covariances_)
mu = gmm.means_[0][0]
variance = gmm.covariances_[0][0][0]
print(mu, variance)
fig, ax = plt.subplots(figsize=(6,6))
Xs = np.arange(data.min(), data.max(), 0.05)  # data is 2-D, so use the array methods to get scalar limits
ys = 1.0/np.sqrt(2*np.pi*variance) * np.exp(-0.5/variance * (Xs - mu)**2)  # normal pdf centred at mu
ax.hist(data, bins=100, label='data')
px = ax.twinx()
px.plot(Xs, ys, c='r', linestyle='dotted', label='fit')
ax.legend()
px.legend(loc='upper left')
plt.show()
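To illustrate the n_components=k remark for a three-peak case, here is a sketch with three components on synthetic data. The peak positions, widths, sample sizes, and `random_state` are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic 1-D data with three well-separated peaks (assumed values).
rng = np.random.default_rng(5)
data = np.concatenate([
    rng.normal(0.0, 0.5, 3000),
    rng.normal(5.0, 0.7, 2000),
    rng.normal(10.0, 0.6, 1000),
]).reshape(-1, 1)  # (n_samples, 1): one feature per sample

gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(data)

# Component order is arbitrary, so sort by mean before reading results.
order = np.argsort(gmm.means_.ravel())
means = gmm.means_.ravel()[order]
sigmas = np.sqrt(gmm.covariances_.ravel()[order])  # std = sqrt(variance)
weights = gmm.weights_[order]                      # fraction of samples per peak
print(means, sigmas, weights)
```

Unlike a curve fit on the histogram, this works on the raw samples, so the result does not depend on the bin count or on any smoothing.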

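As for question 3, I'm not sure which axis you want the standard deviation along. For the standard deviation of each column, use np.std(data, axis=0); for the standard deviation of each row, use axis=1.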
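The axis argument is easy to mix up, so here is a small check on a made-up array. `axis=0` reduces down the rows and returns one value per column; `axis=1` reduces across the columns and returns one value per row.

```python
import numpy as np

# Made-up 3x2 array: 3 rows, 2 columns.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

col_std = np.std(data, axis=0, ddof=1)  # one value per column
row_std = np.std(data, axis=1, ddof=1)  # one value per row
print(col_std.shape, row_std.shape)
```

For the question's array of shape (30, n_samples), `axis=1` would therefore give one standard deviation per file, and `axis=0` one per time sample.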

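【Comments】:

Thank you very much for your help. I have edited my post; could you answer the last part? I'm working through your suggestion and hope to get back to you soon.
Thank you very much for the Gaussian fit above. The "data" above is a single dataset; I substituted my 30 datasets, "voltages". It returned the error: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". I had better update my question so you can take a look.
I tried plotting it with another function. I can plot the curve successfully, but it doesn't fit my plot. stackoverflow.com/questions/69687843/…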