Subtracting dataframe columns with different granularity on a datetime index (answers)

【Question title】: Subtracting dataframe columns with different granularity on a datetime index
【Posted】: 2020-09-16 18:33:00
【Question description】:

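I have some price data (for example from Yahoo Finance) indexed by a daily-granularity datetime variable. Let's call it df, with a stock's closing price as the value variable (the snippet below downloads the CMG ticker). To get the monthly average closing price, I can obviously do either of the following: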

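import yfinance as yf
import pandas as pd

# Daily price data indexed by a DatetimeIndex
df = yf.download("CMG", start="2012-01-01", end="2020-01-01")

# Option 1: resample into month-end bins
dfm = df.resample("M").mean()
# Option 2: group by monthly periods (yyyy-mm)
dfm2 = df.groupby(df.index.to_period("M")).mean()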

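The two results look very similar to me. The only difference is that with resample the new DatetimeIndex holds month-end dates, while with groupby and to_period the index is a monthly period in yyyy-mm form.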
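To make the index difference concrete, here is a minimal sketch on synthetic daily data (the two-month date range and the `Close` values are made up; nothing is downloaded). It passes `pd.offsets.MonthEnd()` instead of the `"M"` string because pandas 2.2 and later deprecate `"M"` for resampling in favor of `"ME"`; the offset object is accepted by older and newer versions alike.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes (hypothetical data): Jan 1 .. Feb 29, 2020
idx = pd.date_range("2020-01-01", "2020-02-29", freq="D")
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# resample labels each monthly bin with its month-end timestamp
dfm = df.resample(pd.offsets.MonthEnd()).mean()
# grouping by to_period("M") labels each group with a monthly Period
dfm2 = df.groupby(df.index.to_period("M")).mean()

print(dfm.index)   # DatetimeIndex with 2020-01-31 and 2020-02-29
print(dfm2.index)  # PeriodIndex with 2020-01 and 2020-02
print(dfm["Close"].tolist(), dfm2["Close"].tolist())  # [15.0, 45.0] [15.0, 45.0]
```

The monthly means are identical; only the index type (and therefore how it aligns with the daily index) differs.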

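I now want to add a column with daily granularity to df that holds the deviation of each daily close from its monthly average: close(1 Jan 2020) - mean(Jan 2020), close(2 Jan 2020) - mean(Jan 2020), ..., close(1 Feb 2020) - mean(Feb 2020), and so on.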

Because dfm and df have different indexes, I can't simply compute df - dfm.
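The reason `df - dfm` fails is that pandas arithmetic first aligns both operands on their index labels. In this small sketch with hypothetical data, the only daily rows that find a matching label in the month-end index are the month-end dates themselves, so every other row becomes NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes for Jan and Feb 2020
idx = pd.date_range("2020-01-01", "2020-02-29", freq="D")
close = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Close")
monthly = close.resample(pd.offsets.MonthEnd()).mean()  # indexed by month-end dates

# Subtraction aligns on labels; only 2020-01-31 and 2020-02-29 exist in both
naive = close - monthly
print(naive.notna().sum())  # 2
print(naive.dropna())
```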

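The only thing I can think of is to loop over df, keep a counter into dfm and put an if statement inside the loop. That is a very C-style approach and not very Pythonic. I imagine it would look something like this:

counter = 0
df["dailyminusmonthly"] = 0.0

# Walk the daily closes in date order; dfm's index holds month-end dates
for date, close in df["Close"].items():
    # Move to the next month once this day falls after the current month end
    if date > dfm.index[counter]:
        counter = counter + 1
    df.loc[date, "dailyminusmonthly"] = close - dfm["Close"].iloc[counter]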

【Question discussion】:

Can you create a minimal, complete, and verifiable example?

Tags: python pandas

【Solution 1】:

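You can convert the DatetimeIndex to monthly periods so that you can subtract the dfm2-style monthly means directly. The result must then be converted to a NumPy array: the difference is indexed by periods rather than by the dates of the original df, so assigning it as-is would produce a column full of NaNs: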

df['dailyminusmonthly1']= (df['Close'].to_period('M')
                                    .sub(df.groupby(df.index.to_period("M"))['Close'].mean())
                                    .to_numpy())
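A self-contained sketch of this first approach on synthetic data (a hypothetical two-month range with `Close` equal to 0, 1, 2, ...). Because `.to_numpy()` assigns the values by position, this assumes `df` is sorted by date, as downloaded price data normally is.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes for Jan and Feb 2020
idx = pd.date_range("2020-01-01", "2020-02-29", freq="D")
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# Monthly means keyed by Period (the "Close" column of dfm2 in the question)
monthly = df.groupby(df.index.to_period("M"))["Close"].mean()

# Re-key each daily close by its month so the subtraction aligns per month,
# then drop the PeriodIndex so the values are assigned back by position
diff = df["Close"].to_period("M").sub(monthly)
df["dailyminusmonthly1"] = diff.to_numpy()

print(df.iloc[[0, 30, 31, 59]])
```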

Another, simpler solution is Resampler.transform, which returns values with the same index as the original df:

df['dailyminusmonthly2']= df['Close'].sub(df.resample("M")['Close'].transform('mean'))
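The same idea as a hedged sketch on synthetic data. The last lines show a variant that is not part of the original answer: `groupby(...).transform("mean")` on the `to_period("M")` key from the question, which likewise broadcasts the monthly mean back onto the daily index. As before, `pd.offsets.MonthEnd()` stands in for `"M"` to avoid the deprecation warning in newer pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes for Jan and Feb 2020
idx = pd.date_range("2020-01-01", "2020-02-29", freq="D")
df = pd.DataFrame({"Close": np.arange(len(idx), dtype=float)}, index=idx)

# transform broadcasts each month's mean back onto every daily row,
# so the result keeps df's DatetimeIndex and aligns without .to_numpy()
monthly_mean = df.resample(pd.offsets.MonthEnd())["Close"].transform("mean")
df["dailyminusmonthly2"] = df["Close"].sub(monthly_mean)

# Variant (not from the answer): the same broadcast via the Period grouping key
period_mean = df.groupby(df.index.to_period("M"))["Close"].transform("mean")
df["dailyminusmonthly_alt"] = df["Close"] - period_mean

print(df.iloc[[0, 30, 31, 59]])
```

Because both transform results already share df's index, no positional conversion is needed, which also removes the sorted-order assumption of the first approach.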

print (df)
                  Open        High         Low       Close   Adj Close  \
Date                                                                     
2012-01-03  343.700012  350.489990  340.000000  341.269989  341.269989   
2012-01-04  346.000000  349.980011  345.010010  348.750000  348.750000   
2012-01-05  346.880005  351.980011  342.570007  350.480011  350.480011   
2012-01-06  348.880005  352.630005  347.350006  348.950012  348.950012   
2012-01-09  349.000000  349.489990  336.290009  339.739990  339.739990   
               ...         ...         ...         ...         ...   
2019-12-24  827.099976  829.409973  823.159973  828.890015  828.890015   
2019-12-26  829.409973  839.280029  828.239990  838.599976  838.599976   
2019-12-27  839.969971  840.000000  835.000000  836.789978  836.789978   
2019-12-30  838.169983  838.750000  829.010010  836.070007  836.070007   
2019-12-31  837.239990  842.270020  833.359985  837.109985  837.109985   

            Volume  dailyminusmonthly1  dailyminusmonthly2  
Date                                                        
2012-01-03  728100          -13.559013          -13.559013  
2012-01-04  743100           -6.079002           -6.079002  
2012-01-05  672300           -4.348991           -4.348991  
2012-01-06  370700           -5.878990           -5.878990  
2012-01-09  748600          -15.089012          -15.089012  
           ...                 ...                 ...  
2019-12-24   91900            3.640494            3.640494  
2019-12-26  255400           13.350455           13.350455  
2019-12-27  201900           11.540458           11.540458  
2019-12-30  211400           10.820487           10.820487  
2019-12-31  282200           11.860465           11.860465  

[2012 rows x 8 columns]

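【Discussion】:

Thank you, and sorry for the messy question; I'm new to Python.
@nik - I think the question is fine, it was only missing an MCVE. You're welcome ;) Happy coding ;)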