【Question title】: Python - Storing the average from files in a loop, and then finding the global average outside of the loop?
【Posted】: 2020-12-05 18:05:55
【Question】:

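I have a function that currently loops over files starting with "K" and "Z" and plots the "Temp" data; blue for the "K" data and red for the "Z" data. This works well for my purposes.

Where I'm stuck:

  1. I now want to take the average of "Temp" between sample 100 and sample 350 for each file in the loop.
  2. Then I want to store each file's average in a new dataframe, with one column for the "K" averages and one for the "Z" averages.
  3. Finally, outside the loop, I want to take the mean of the "K" column and the mean of the "Z" column, and plot them on the graph.

In the code below, I've placed comments in the areas where I'm stuck.

As a side question, if anyone knows a good way to automatically detect the "flat" region of each dataset (slope ~= 0) and then pick the averaging interval automatically, that would be really cool! Right now I'm surely losing some data points by using a fixed interval.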

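import pandas as pd
from glob import glob
import matplotlib.pyplot as plt

filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")

def plot_data(filename, fig_ax, color):
    # Read the file passed in as `filename`, not the loop variable `f`
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns=['sample','Temp']
    df=df.astype(str)

    # Strip the literal "+ " prefix and any remaining spaces, then convert to float
    df["Temp"] = df["Temp"].str.replace('+ ', '', regex=False).str.replace(' ', '', regex=False).astype(float)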
    
    # Now take the average of df["Temp"] from sample 100 until sample 350.
    
    # Append this average to a K_Z_Averages, containing a column for average 
    # from each K file and the average from each Z file.
    
    fig_ax.plot(df[["Temp"]], color=color)

fig, ax = plt.subplots()

for f in filenamesK:
    plot_data(f, ax, 'blue')

for f in filenamesZ:
    plot_data(f, ax, 'red')

# After the loop is finished, take the average of each column in K_Z_averages 
# with each average from the K files and from the Z files.    
    
plt.show()

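Part 2: If my .csv files have a second temperature column, "Temp2", that I want to extract, could you help me add it to the dict? For example, the dict would have one column each for K_Temp, K_Temp2, Z_Temp and Z_Temp2.

I modified my code in a way I think works, but I imagine there is a more efficient way to do it: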

filenamesK = glob("C:/Users/K*.csv")
filenamesZ = glob("C:/Users/Z*.csv")

# Create dict of lists for storing the averages
K_Z_Averages = {'K':[], 'Z':[]}

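def plot_data(filename, fig_ax, color):
    # Read the file passed in as `filename`, not the loop variable `f`
    df = pd.read_csv(filename, sep=',', skiprows=24)
    df.columns=['sample','Temp','Temp2']
    df=df.astype(str)

    # Strip the literal "+ " prefix and any remaining spaces, then convert to float
    df["Temp"] = df["Temp"].str.replace('+ ', '', regex=False).str.replace(' ', '', regex=False).astype(float)
    df["Temp2"] = df["Temp2"].str.replace('+ ', '', regex=False).str.replace(' ', '', regex=False).astype(float)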
    
    # Now take the average of df["Temp"] from sample 100 until sample 350.
    avg_Temp1 = df.iloc[100-1:350+1]['Temp'].mean()
    avg_Temp2 = df.iloc[100-1:350+1]['Temp2'].mean()
    
    # Append this average to a K_Z_Averages, containing a column for average 
    # from each K file and the average from each Z file.
    K_Z_Averages[filename.split('/')[-1][0]].append(avg_Temp1)
    K_Z_Averages[filename.split('/')[-1][0]].append(avg_Temp2)
    
    fig_ax.plot(df[["Temp"]], color=color)

fig, ax = plt.subplots()

for f in filenamesK:
    plot_data(f, ax, 'blue')

for f in filenamesZ:
    plot_data(f, ax, 'red')

# Take the overall average 
df_avg = pd.DataFrame(K_Z_Averages).mean() 

# Add horizontal lines for each mean (the means are Temp values, i.e. y-coordinates)
ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue','red'], alpha=.5)

plt.show()

【Comments】:

    Tags: python pandas average glob


    【Solution 1】:

    You can create a dictionary to hold the average from each file, then append each new average to it:

    # Before the `plot_data` definition
    K_Z_Averages = {'K':[], 'Z':[]}
    
    # Inside the function
    avg = df.iloc[100-1:350+1]['Temp'].mean()
    K_Z_Averages[filename.split('/')[-1][0]].append(avg)
    

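    Here filename.split('/')[-1][0] strips the directory part of the path and takes the first letter of the file name (similar to os.path.basename(filename)[0], which is the more robust choice because it does not depend on which path separator glob returns).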
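
The two expressions are not always interchangeable. On Windows, glob can return paths that mix forward slashes and backslashes, and splitting on '/' then leaves part of the directory attached. A minimal sketch of the difference, using ntpath (the Windows flavour of os.path, importable on any OS) and a hypothetical file name K_run1.csv that does not come from the question:

```python
import ntpath  # Windows implementation of os.path, usable on any platform

# Hypothetical path in the mixed-separator form glob can produce on Windows
path = "C:/Users\\K_run1.csv"

first_by_split = path.split('/')[-1][0]       # splits on '/' only
first_by_basename = ntpath.basename(path)[0]  # understands both separators

print(first_by_split, first_by_basename)
```

With the split version the key lookup in K_Z_Averages would fail for such a path, which is presumably why the comments further down settle on os.path.basename.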

    Then take the overall average of the per-file averages:

    pd.DataFrame(K_Z_Averages).mean()
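
One caveat: building a DataFrame from a dict of lists requires every list to have the same length, so the line above raises a ValueError if the number of K files differs from the number of Z files. A minimal sketch of an alternative that averages each list separately; the helper name overall_means and the sample values are illustrative, not part of the original answer:

```python
import pandas as pd

def overall_means(averages):
    """Return the mean of each list in a dict of lists (lengths may differ)."""
    return pd.Series({key: pd.Series(vals, dtype=float).mean()
                      for key, vals in averages.items()})

# Hypothetical per-file averages: two K files, three Z files
example = {'K': [20.0, 22.0], 'Z': [30.0, 31.0, 35.0]}
result = overall_means(example)
print(result)
```

The result is a Series indexed by 'K' and 'Z', so it can be passed to the plotting call in the same way as the output of pd.DataFrame(K_Z_Averages).mean().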
    

    The complete code should look like this:

    import pandas as pd
    from glob import glob
    from os.path import basename
    import matplotlib.pyplot as plt

    filenamesK = glob("C:/Users/K*.csv")
    filenamesZ = glob("C:/Users/Z*.csv")
    
    # Create dict of lists for storing the averages
    K_Z_Averages = {'K':[], 'Z':[]}
    
    def plot_data(filename, fig_ax, color):
        # Read the file passed in as `filename`, not the loop variable `f`
        df = pd.read_csv(filename, sep=',', skiprows=24)
        df.columns=['sample','Temp']
        df=df.astype(str)
    
        # Strip the literal "+ " prefix and any remaining spaces, then convert to float
        df["Temp"] = df["Temp"].str.replace('+ ', '', regex=False).str.replace(' ', '', regex=False).astype(float)
        
        # Now take the average of df["Temp"] from sample 100 until sample 350.
        avg = df.iloc[100-1:350+1]['Temp'].mean()
        
        # Append this average to K_Z_Averages under the first letter
        # of the file name ('K' or 'Z').
        K_Z_Averages[basename(filename)[0]].append(avg)
        
        fig_ax.plot(df[["Temp"]], color=color)
    
    fig, ax = plt.subplots()
    
    for f in filenamesK:
        plot_data(f, ax, 'blue')
    
    for f in filenamesZ:
        plot_data(f, ax, 'red')
    
    # Take the overall average 
    df_avg = pd.DataFrame(K_Z_Averages).mean() 
    
    # Add horizontal lines for each mean (the means are Temp values, i.e. y-coordinates)
    ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue','red'], alpha=.5)
    
    plt.show()
    

    After the question was edited (Part 2), the code should look like this:

    import pandas as pd
    from glob import glob
    from os.path import basename
    import matplotlib.pyplot as plt
    
    filenamesK = glob("C:/Users/K*.csv")
    filenamesZ = glob("C:/Users/Z*.csv")
    
    # Create dict of lists for storing the averages
    K_Z_Averages = {'K_Temp':[], 'K_Temp2': [], 'Z_Temp':[], 'Z_Temp2': []}
    
    def plot_data(filename, fig_ax, color, color2):
        # Read the file passed in as `filename`, not the loop variable `f`
        df = pd.read_csv(filename, sep=',', skiprows=24)
        df.columns=['sample','Temp','Temp2']
        df=df.astype(str)
    
        # Strip the literal "+ " prefix and any remaining spaces, then convert to float
        for col in ("Temp", "Temp2"):
            df[col] = df[col].str.replace('+ ', '', regex=False).str.replace(' ', '', regex=False).astype(float)
        
        # Now take the average of each column from sample 100 until sample 350.
        avg_Temp1 = df.iloc[100-1:350+1]['Temp'].mean()
        avg_Temp2 = df.iloc[100-1:350+1]['Temp2'].mean()
        
        # Append each average to K_Z_Averages under a key built from the
        # first letter of the file name, e.g. 'K_Temp' or 'Z_Temp2'.
        K_Z_Averages[basename(filename)[0] + "_Temp"].append(avg_Temp1)
        K_Z_Averages[basename(filename)[0] + "_Temp2"].append(avg_Temp2)
        
        # Plot each column once, in its own shade
        fig_ax.plot(df[["Temp"]], color=color)
        fig_ax.plot(df[["Temp2"]], color=color2)
    
    fig, ax = plt.subplots()
    
    # Call plot_data once per file so each average is appended only once
    for f in filenamesK:
        plot_data(f, ax, 'blue', 'darkblue')
    
    for f in filenamesZ:
        plot_data(f, ax, 'red', 'darkred')
    
    # Take the overall average 
    df_avg = pd.DataFrame(K_Z_Averages).mean() 
    
    # Add horizontal lines for each mean (the means are Temp values, i.e. y-coordinates)
    ax.hlines(df_avg, *ax.get_xlim(), linestyles='--', colors=['blue','darkblue','red','darkred'], alpha=.5)
    
    plt.show()
    

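    【Comments】:

    • @Gary That's true! I forgot that glob returns the whole path. You can replace 0 with 10 (see update), or use filename.split('/')[-1][0], or even os.path.basename(filename)[0]
    • Wow, that works great. Cheers bud! Thanks for the support! I used os.path.basename(filename)[0]
    • Hi @Gary, I've added the code for Part 2, hope it works! Note the lines where the dict is created and where the averages are appended. Also note that I added the "Temp2" lines/averages to the plot, but since I have no data to test with, I can only guess that it works. Best!
    • Also, since you now use four keys in the dictionary, it's worth looking at collections.defaultdict with list as the default factory.
    • Works great! Thanks again, much appreciated!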
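
The collections.defaultdict suggestion from the comments can be sketched as follows. The helper record and the sample values are hypothetical; the point is only that keys such as 'K_Temp' no longer have to be declared up front:

```python
from collections import defaultdict

averages = defaultdict(list)  # a missing key starts out as an empty list

def record(group, column, value):
    """Append value under a key such as 'K_Temp' (names are illustrative)."""
    averages[f"{group}_{column}"].append(value)

# Hypothetical per-file averages
record('K', 'Temp', 21.5)
record('K', 'Temp2', 19.0)
record('Z', 'Temp', 30.2)
record('K', 'Temp', 22.5)

print(dict(averages))
```

Adding a third temperature column would then need no change to the dictionary setup, only another record call per file.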
    【Solution 2】:

    I'm not sure I understand the second part of the question about "K_Z_Average". But here goes:

        # Now take the average of df["Temp"] from sample 100 until sample 350.
        average_temperature = df.iloc[100:350]['Temp'].mean()
       
        # Append this average to a K_Z_Averages, containing a column for average 
        # from each K file and the average from each Z file.
        df['K_Z_Average'] = average_temperature
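
The slice bounds decide exactly which samples enter the average. A minimal sketch comparing the positional slices used in the two answers with a selection by the value of the sample column, on hypothetical data where samples are numbered from 1 (an assumption; the real files may start at 0):

```python
import pandas as pd

# Hypothetical data: 400 samples numbered from 1, Temp equal to the sample number
df = pd.DataFrame({'sample': range(1, 401)})
df['Temp'] = df['sample'].astype(float)

by_position = df.iloc[100:350]['Temp']      # positions 100..349 -> samples 101..350
by_position_wide = df.iloc[99:351]['Temp']  # positions 99..350  -> samples 100..351
by_label = df.loc[df['sample'].between(100, 350), 'Temp']  # samples 100..350 inclusive

print(len(by_position), len(by_position_wide), len(by_label))
```

Selecting by value needs a numeric sample column, so with the code from the question it would have to happen before df.astype(str), or the column would need converting back to a number first.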
    

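    【Comments】:

    • Note that df.iloc[100:350] averages the rows at positions 100 through 349 (indexing starts at 0 and the end of the slice is excluded), i.e. the 101st through 350th rows. Also, we need to store these values somewhere so they can be retrieved later (this df is only temporary).
    • Got it - your answer seems pretty complete.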
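
Neither answer covers the side question about finding the flat region automatically. One possible approach, offered only as a sketch, is to smooth the signal, estimate its slope with finite differences, and keep the longest run where the absolute slope stays below a tolerance. The function name flat_interval and the parameters window and tol are hypothetical and would need tuning for real data:

```python
import numpy as np
import pandas as pd

def flat_interval(temp, window=25, tol=0.01):
    """Return (start, stop) positions of the longest run with near-zero slope.

    window (smoothing width) and tol (slope threshold) are assumed tuning knobs.
    """
    values = pd.Series(temp, dtype=float).reset_index(drop=True)
    # Smooth first so noise does not dominate the finite-difference slope
    smooth = values.rolling(window, center=True, min_periods=1).mean()
    slope = np.gradient(smooth.to_numpy())
    flat = np.abs(slope) < tol

    best_start, best_len, run_start = 0, 0, None
    for i, is_flat in enumerate(flat):
        if is_flat and run_start is None:
            run_start = i                      # a flat run begins
        elif not is_flat and run_start is not None:
            if i - run_start > best_len:       # a flat run ends; keep it if longest
                best_start, best_len = run_start, i - run_start
            run_start = None
    if run_start is not None and len(flat) - run_start > best_len:
        best_start, best_len = run_start, len(flat) - run_start
    return best_start, best_start + best_len

# Synthetic example: ramp up, plateau at 10, ramp down
temp = np.concatenate([np.linspace(0, 10, 100), np.full(200, 10.0), np.linspace(10, 0, 100)])
start, stop = flat_interval(temp)
plateau_mean = temp[start:stop].mean()
```

The returned pair can replace the fixed bounds in the answers, for example df.iloc[start:stop]['Temp'].mean().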