【Question Title】: Calculate Time Difference Pandas by columns
【Posted】: 2019-03-05 11:28:06
【Question】:

I have a column df['Status'] that contains these values:

In: df.Status.unique()
Out: array([nan, 'Open', 'Plmt', 'SHRT', 'Check'], dtype=object)

The column:

In: df['Status']
Out:
time                 Status
2016-01-15 08:55:00  Open
2016-01-15 09:00:00  Plmt
2016-01-15 09:05:00  Plmt
2016-01-15 09:10:00  Plmt
2016-01-15 09:15:00  Plmt
2016-01-15 09:20:00  Plmt
2016-01-15 09:25:00  Plmt
2016-01-15 09:30:00  Plmt
2016-01-15 09:35:00  Plmt
2016-01-15 09:40:00  SHRT

where the index is built from time:

df.index = df['time']
df.index = pd.to_datetime(df.index)
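For reference, the two assignments above can be collapsed into one step. This is a minimal sketch with hypothetical sample data (time stored as strings, as it might be after reading a CSV); passing the parsed Series to set_index, rather than a column name, keeps the time column in place, just like the original code does.

```python
import pandas as pd

# Hypothetical sample data: 'time' stored as strings (e.g. as read from a CSV)
df = pd.DataFrame({
    "time": ["2016-01-15 08:55:00", "2016-01-15 09:00:00", "2016-01-15 09:40:00"],
    "Status": ["Open", "Plmt", "SHRT"],
})

# Parse the strings and use them as a DatetimeIndex named 'time'.
# Because a Series (not a column label) is passed, the 'time' column is not dropped.
df = df.set_index(pd.to_datetime(df["time"]))
print(df.index)
```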

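I want to skip the unwanted values ('Plmt', 'Check', 'nan') and create a new column df['Diff'] that holds the difference in minutes between each 'Open' and the following 'SHRT'.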

I am trying this:

df['Status'][df['Status'] == 'SHRT'] - df['Status'][df['Status'] == 'Open']

but the output contains only NaN values:

time
2016-01-15 08:55:00   NaN
2016-01-15 09:40:00   NaN
2016-01-18 08:30:00   NaN
2016-01-19 14:30:00   NaN
2016-01-19 14:35:00   NaN
2016-01-20 11:10:00   NaN
2016-01-20 11:45:00   NaN
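A likely explanation for the NaN result: arithmetic between two Series aligns on index labels, and the 'Open' rows and the 'SHRT' rows never share a timestamp, so every label is missing on one side. On top of that, the expression subtracts the Status strings, not the times. The sketch below reproduces this on a small hypothetical frame and shows that subtracting the timestamps themselves (by position) gives a duration.

```python
import pandas as pd

# Small hypothetical frame shaped like the question's data (DatetimeIndex named 'time')
idx = pd.DatetimeIndex(
    ["2016-01-15 08:55:00", "2016-01-15 09:00:00", "2016-01-15 09:40:00"], name="time"
)
df = pd.DataFrame({"Status": ["Open", "Plmt", "SHRT"]}, index=idx)

shrt = df["Status"][df["Status"] == "SHRT"]
open_ = df["Status"][df["Status"] == "Open"]

# Index alignment: the two selections share no label, so each position has a
# missing operand and the result is all NaN (the values are strings anyway).
aligned = shrt - open_
print(aligned)

# Subtracting the index timestamps, matched by position, yields a Timedelta
duration = shrt.index[0] - open_.index[0]
print(duration)  # 0 days 00:45:00
```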

The expected output should look like this:

time                 Status  Diff
2016-01-15 08:55:00  Open    NaN
2016-01-15 09:40:00  SHRT    00:45:00
2016-02-15 10:00:00  Open    NaN
2016-02-15 14:15:00  SHRT    02:15:00

How can I get the time difference? Can anyone help?
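One possible sketch that keeps the DatetimeIndex layout from the question, under the assumption that each 'SHRT' should be paired with the 'Open' directly before it once the other statuses are filtered out. The data is hypothetical; unlike the accepted answer, this variant keeps unpaired 'Open'/'SHRT' rows and only leaves their Diff empty.

```python
import pandas as pd

# Hypothetical data with the question's layout: statuses indexed by time
idx = pd.DatetimeIndex([
    "2016-01-15 08:55:00", "2016-01-15 09:00:00", "2016-01-15 09:40:00",
    "2016-02-15 12:00:00", "2016-02-15 13:00:00", "2016-02-15 14:15:00",
], name="time")
df = pd.DataFrame(
    {"Status": ["Open", "Plmt", "SHRT", "Open", "Check", "SHRT"]}, index=idx
)

# Keep only the two statuses of interest (NaN, 'Plmt', 'Check' are dropped)
sub = df[df["Status"].isin(["Open", "SHRT"])].copy()

# Time elapsed since the previous kept row, shown only on a SHRT that follows an Open
times = sub.index.to_series()
is_close = (sub["Status"] == "SHRT") & (sub["Status"].shift() == "Open")
sub["Diff"] = times.diff().where(is_close)
print(sub)
```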

【Question Comments】:

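  • What is the expected output? Is there only one Open and one SHRT value?
  • @jezrael The expected output is the time difference from Open to SHRT; the column has more than 500 values.
  • OK, so please check the second link above and create a minimal, complete, and verifiable example: add several Open and SHRT values to the data sample and show the expected output with actual values.
  • @jezrael Didn't I ask clearly?
  • @jezrael OK, I'll try to improve the description.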

Tags: python pandas datetime dataframe


【Solution 1】:

Use:

import pandas as pd

#sample data changed to contain several Open/SHRT pairs
print (df)
                 time Status
0 2016-01-15 08:55:00   Open
1 2016-01-15 09:00:00   Plmt
2 2016-01-15 09:05:00   SHRT
3 2016-01-15 09:10:00   Plmt
4 2016-01-15 09:15:00   Open
5 2016-01-15 09:20:00   Plmt
6 2016-01-15 09:25:00   SHRT
7 2016-01-15 09:30:00   SHRT
8 2016-01-15 09:35:00   Plmt
9 2016-01-15 09:40:00   SHRT

#filter only Open and SHRT
df1 = df[df['Status'].isin(['Open','SHRT'])].copy()
#convert column to datetimes
df1['time'] = pd.to_datetime(df1['time'])
print (df1)
                 time Status
0 2016-01-15 08:55:00   Open
2 2016-01-15 09:05:00   SHRT
4 2016-01-15 09:15:00   Open
6 2016-01-15 09:25:00   SHRT
7 2016-01-15 09:30:00   SHRT
9 2016-01-15 09:40:00   SHRT

#keep Open rows directly followed by SHRT, and SHRT rows directly preceded by Open
m1 = (df1['Status'] == 'Open') & (df1['Status'].shift(-1) == 'SHRT')
m2 = (df1['Status'].shift() == 'Open') & (df1['Status'] == 'SHRT')
df2 = df1[m1 | m2].copy()

#difference to the previous row, set to NaT on the Open rows
df2['Diff'] = df2['time'].diff().mask(df2['Status'] == 'Open') 
print (df2)
                 time Status     Diff
0 2016-01-15 08:55:00   Open      NaT
2 2016-01-15 09:05:00   SHRT 00:10:00
4 2016-01-15 09:15:00   Open      NaT
6 2016-01-15 09:25:00   SHRT 00:10:00
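The question asks for the difference in minutes, while Diff above is a Timedelta column. A minimal sketch of converting it to a float number of minutes, rebuilding df2 from the values printed above; the column name Diff_min is hypothetical. Dividing by pd.Timedelta(minutes=1) would give the same result.

```python
import pandas as pd

# df2 rebuilt from the printed output above
df2 = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-15 08:55:00", "2016-01-15 09:05:00",
                            "2016-01-15 09:15:00", "2016-01-15 09:25:00"]),
    "Status": ["Open", "SHRT", "Open", "SHRT"],
}, index=[0, 2, 4, 6])
df2["Diff"] = df2["time"].diff().mask(df2["Status"] == "Open")

# Timedelta -> minutes as float; NaT on the Open rows becomes NaN
df2["Diff_min"] = df2["Diff"].dt.total_seconds() / 60
print(df2)
```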

【Comments】:

  • After running df1['time'] = pd.to_datetime(df1['time']) I get this output: C:\Program Files (x86)\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy """Entry point for launching an IPython kernel.
  • @ArtemReznov - change df1 = df[df['Status'].isin(['Open','SHRT'])] to df1 = df[df['Status'].isin(['Open','SHRT'])].copy()
  • That helped, the warning is gone. When I run print(df2) I get: 35 NaT 281 NaT 282 NaT 642 NaT
  • @ArtemReznov - what is your pandas version?
  • Yes! Changing this line to df2['Diff'] = df2['time'].diff().mask(df2['Status'] == 'Open') gave me the output I wanted, thank you very much!!!
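The .copy() fix can be sketched on hypothetical data: in pandas versions of that era, a frame produced by boolean filtering could still be tracked as derived from df, so assigning a column into it raised SettingWithCopyWarning. Calling .copy() makes df1 an explicitly independent frame, so converting its time column does not touch the original.

```python
import pandas as pd

# Hypothetical data with 'time' stored as strings
df = pd.DataFrame({
    "time": ["2016-01-15 08:55:00", "2016-01-15 09:00:00", "2016-01-15 09:05:00"],
    "Status": ["Open", "Plmt", "SHRT"],
})

# Filter, then copy explicitly so the column assignment targets an independent frame
df1 = df[df["Status"].isin(["Open", "SHRT"])].copy()
df1["time"] = pd.to_datetime(df1["time"])

print(df1.dtypes)
print(df["time"].head())  # the original frame still holds the unparsed strings
```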