Cumulative average in Python: answers

【Question title】: Cumulative average in Python
【Posted】: 2022-01-04 00:34:52
【Question description】:

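I am working with a CSV file.

I want to create a continuously updated (running) average of a sequence. For example:

For each position in the list, I want to output the average of all values up to and including that position.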

list; [a, b, c, d, e, f]
formula:

(a)/1= ?

(a+b)/2=?

(a+b+c)/3=?

(a+b+c+d)/4=?

(a+b+c+d+e)/5=?

(a+b+c+d+e+f)/6=?

Demonstration:

If I have the list [1, 4, 7, 4, 19]

my output should be [1, 2.5, 4, 4, 7]

Explanation:

(1)/1=1

(1+4)/2=2.5

(1+4+7)/3=4

(1+4+7+4)/4=4

(1+4+7+4+19)/5=7

As for my Python file, it is simple code:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('somecsvfile.csv')

x = []  # has to be a list from 1 to however many rows are in the "numbers" column, i.e. a simple [1, 2, 3, 4, 5] etc.
# x will be used to divide the numbers selected in y to give us z

y = df['numbers']

z = ...  # placeholder: new DataFrame derived from the continuous average of y

plt.plot(x, z)
plt.show()

Using numpy is fine if it is needed.

【Question comments】:

What does your CSV file look like?
The term you are looking for is "cumulative average" (or "cumulative mean").

Tags: python pandas csv matplotlib

【Solution 1】:

pandas.DataFrame.expanding is what you need.

With it, you only have to call df.expanding().mean() to get the result you want:

mean = df.expanding().mean()

print(mean)

Out[10]: 
0   1.0
1   2.5
2   4.0
3   4.0
4   7.0

If you only want to do this for a single column, use pandas.Series.expanding.

Just use the column instead of df:

df['column_name'].expanding().mean()
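
To make the behaviour of the expanding window concrete, here is a minimal sketch using the sample data from the question. The Series name 'numbers' and the variable names are only illustrative; the manual computation is included to show that the expanding mean is the running sum divided by the running count.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the name 'numbers' is illustrative.
s = pd.Series([1, 4, 7, 4, 19], name="numbers")

# Expanding-window mean: the window grows by one element at each step.
expanding_avg = s.expanding().mean()

# The same quantity computed by hand: running sum / running count.
manual_avg = s.cumsum() / np.arange(1, len(s) + 1)

print(expanding_avg.tolist())  # [1.0, 2.5, 4.0, 4.0, 7.0]
print(np.allclose(expanding_avg, manual_avg))  # True
```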

【Comments】:

【Solution 2】:

To answer your question completely, here is your code with the blanks filled in using numpy, including the plot:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

#df = pd.read_csv('somecsvfile.csv')
#instead I just create a df with a column named 'numbers'
df = pd.DataFrame([1, 4, 7, 4, 19], columns = ['numbers',])

x = range(1, len(df)+1)  #x will be used to divide the numbers selected in y to give us z

y = df['numbers']
z = np.cumsum(y) / np.array(x)

plt.plot(x, z, 'o')
plt.xticks(x)
plt.xlabel('Entry')
plt.ylabel('Cumulative average')

But as Augusto pointed out, you can also keep everything in a DataFrame. Building on his approach:

n = [1, 4, 7, 4, 19]
df = pd.DataFrame(n, columns = ['numbers',])
#augment the index so it starts at 1 like you want
df.index = np.arange(1, len(df)+1)

# create a new column for the cumulative average
df = df.assign(cum_avg = df['numbers'].expanding().mean())
#    numbers  cum_avg
# 1        1      1.0
# 2        4      2.5
# 3        7      4.0
# 4        4      4.0
# 5       19      7.0

# plot
df['cum_avg'].plot(linestyle = 'none',
                   marker = 'o',
                   xticks = df.index,
                   xlabel = 'Entry',
                   ylabel = 'Cumulative average')
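
Since the question starts from a CSV file, the following sketch shows the same steps with data read through pd.read_csv. The in-memory text is a stand-in for 'somecsvfile.csv', and the header 'numbers' is an assumption about the file layout, not something the question confirms.

```python
import io

import pandas as pd

# Stand-in for 'somecsvfile.csv'; the 'numbers' header is an assumption.
csv_text = "numbers\n1\n4\n7\n4\n19\n"
df = pd.read_csv(io.StringIO(csv_text))

# Cumulative average stored as a new column.
df["cum_avg"] = df["numbers"].expanding().mean()

# x is the 1-based entry number, z the cumulative average to plot against it.
x = list(range(1, len(df) + 1))
z = df["cum_avg"].tolist()

print(list(zip(x, z)))
# [(1, 1.0), (2, 2.5), (3, 4.0), (4, 4.0), (5, 7.0)]
```

With a real file, replacing io.StringIO(csv_text) with the file path should be the only change needed, provided the column is actually called 'numbers'.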

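【Comments】:

【Solution 3】:

You can use cumsum to get the cumulative sum and then divide by the number of elements seen so far to get the running average.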

import numpy as np

x = np.array([1, 4, 7, 4, 19])
# running sum divided by the 1-based position of each element
z = np.cumsum(x) / np.arange(1, len(x) + 1)
print(z)

Output:

[1.  2.5 4.  4.  7. ]
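
If numpy is not available, the same cumsum-then-divide idea can be sketched with the standard library alone. This is only an illustration of the technique, using itertools.accumulate for the running sum; the variable names are made up for the example.

```python
from itertools import accumulate

values = [1, 4, 7, 4, 19]

# accumulate() yields the running sums 1, 5, 12, 16, 35;
# enumerate(..., start=1) supplies the matching element counts.
running_avg = [
    total / count
    for count, total in enumerate(accumulate(values), start=1)
]

print(running_avg)  # [1.0, 2.5, 4.0, 4.0, 7.0]
```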

【Comments】: