【Question title】: Error in Pandas when calculating time difference in seconds
【Posted】: 2018-07-13 12:26:11
【Question description】:

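I am trying to extract time differences (in seconds) from a Pandas DataFrame. I read the data from a text file, but after grouping the data I get an error when applying the diff function.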

import pandas as pd

# load data
# this format loads file when there is a 'tab' delimiter in the text file
data = pd.read_csv(file, sep='\t', lineterminator='\n')

# filter data by desired field, traded venues are XLON_SET1, _BATE, _CHIX, _TRQX, XOFF_SET1 etc
dataFil = data[data['VENUE'] == "XLON_SET1"]
# then we need to group them by time-stamp to be sure, to clean up the time-series. This will cause TIME_STAMP and PRICE to become index instead of columns with data
dataFil = dataFil.groupby(['TIME_STAMP', 'PRICE']).sum()
#dataFil = dataFil.groupby(['TIME_STAMP']).sum()

dataFil['date'] = dataFil.index.get_level_values('TIME_STAMP')
dataFil['PRICE'] = dataFil.index.get_level_values('PRICE')
dataFil.head() #or dataFil

I get the following data:

                               QUANTITY  BID  ASK  MKT_BID  MKT_ASK                     date  PRICE
TIME_STAMP              PRICE
2018-01-22 08:30:01.306 2.769      3409  0.0  0.0      0.0      0.0  2018-01-22 08:30:01.306  2.769
2018-01-22 08:30:04.306 2.769      2691  0.0  0.0      0.0      0.0  2018-01-22 08:30:04.306  2.769
2018-01-22 08:30:11.306 2.769      2000  0.0  0.0      0.0      0.0  2018-01-22 08:30:11.306  2.769
2018-01-22 08:30:51.065 2.769       572  0.0  0.0      0.0      0.0  2018-01-22 08:30:51.065  2.769
2018-01-22 08:31:26.068 2.768       649  0.0  0.0      0.0      0.0  2018-01-22 08:31:26.068  2.768

But when I use the following (after checking this thread: Pandas calculate time difference):

df = dataFil
df.assign(seconds=df.date.diff().dt.seconds)

I get the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-170-3be32e0aad41> in <module>()
      1 df = dataFil
----> 2 df.assign(seconds=df.date.diff().dt.seconds)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py in diff(self, periods)
   1525         diffed : Series
   1526         """
-> 1527         result = algorithms.diff(_values_from_object(self), periods)
   1528         return self._constructor(result, index=self.index).__finalize__(self)
   1529 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\algorithms.py in diff(arr, n, axis)
   1545             out_arr[res_indexer] = result
   1546         else:
-> 1547             out_arr[res_indexer] = arr[res_indexer] - arr[lag_indexer]
   1548 
   1549     if is_timedelta:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

【Question discussion】:

  • First convert the column to datetimes with pd.to_datetime. At the moment it is treated as a string type.
  • That doesn't seem to work. I used the following: pd.to_datetime(dataFil['date']) #, format='%Y-%b-%d:%H:%M:%S.%f' df = dataFil df.assign(seconds =df.date.diff().dt.seconds) df
  • Then I get the error: C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\algorithms.py in diff(arr, n, axis) 1545 out_arr[res_indexer] = result 1546 else: -> 1547 out_arr[res_indexer] = arr[res_indexer] - arr[lag_indexer] 1548 1549 if is_timedelta: TypeError: unsupported operand type(s) for -: 'str' and 'str'
  • Did you assign it back to the DataFrame? df['col'] = pd.to_datetime(df['col'])?
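The last comment identifies the likely cause: pd.to_datetime returns a new Series and does not convert the column in place, so the 'date' column stays text and diff() still tries str - str. A minimal sketch, assuming a small hypothetical frame whose 'date' column holds text like the question's data:

```python
import pandas as pd

# Hypothetical sample: the 'date' column holds text, as in the question
df = pd.DataFrame({"date": ["2018-01-22 08:30:01.306",
                            "2018-01-22 08:30:04.306",
                            "2018-01-22 08:30:11.306"]})

# Calling pd.to_datetime on its own changes nothing: it returns a new Series
pd.to_datetime(df["date"])
print(pd.api.types.is_datetime64_any_dtype(df["date"]))  # False

# Assign the converted values back to the column
df["date"] = pd.to_datetime(df["date"])
print(pd.api.types.is_datetime64_any_dtype(df["date"]))  # True

# diff() now produces timedeltas instead of subtracting strings
df = df.assign(seconds=df["date"].diff().dt.seconds)
print(df)
```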

Tags: python-3.x pandas time-series


【Solution 1】:

I think you need to convert the column date to datetimes, ideally already when reading the file with read_csv:

data = pd.read_csv(file, sep='\t', lineterminator='\n', parse_dates=['TIME_STAMP'])
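A minimal sketch of the read_csv route, assuming a hypothetical tab-separated sample with TIME_STAMP, PRICE, QUANTITY and VENUE columns (the real file's layout is not shown in the question). The lineterminator argument is omitted because the in-memory sample uses plain newlines. With parse_dates, the TIME_STAMP index level is already datetime after the groupby, so the copied 'date' column supports diff():

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the real file is assumed to have
# TIME_STAMP, PRICE, QUANTITY and VENUE columns as in the question
raw = (
    "TIME_STAMP\tPRICE\tQUANTITY\tVENUE\n"
    "2018-01-22 08:30:01.306\t2.769\t3409\tXLON_SET1\n"
    "2018-01-22 08:30:04.306\t2.769\t2691\tXLON_SET1\n"
    "2018-01-22 08:30:11.306\t2.769\t2000\tXLON_SET1\n"
)

# parse_dates turns TIME_STAMP into a datetime column while reading
data = pd.read_csv(io.StringIO(raw), sep="\t", parse_dates=["TIME_STAMP"])

# Same filtering and grouping as the question, keeping only numeric columns
dataFil = data.loc[data["VENUE"] == "XLON_SET1", ["TIME_STAMP", "PRICE", "QUANTITY"]]
dataFil = dataFil.groupby(["TIME_STAMP", "PRICE"]).sum()

# The index level is datetime, so the copied 'date' column supports diff()
dataFil["date"] = dataFil.index.get_level_values("TIME_STAMP")
print(dataFil["date"].dtype)
print(dataFil["date"].diff())
```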

Or convert the column with to_datetime:

df.assign(seconds=pd.to_datetime(df.date).diff().dt.seconds)
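One caveat with .dt.seconds: it returns only the seconds component of each timedelta (0 to 86399), so fractions of a second are dropped and whole days are ignored. If the full gap is wanted, .dt.total_seconds() returns it as a float. A small sketch comparing the two, using hypothetical timestamps with a 0.5 s gap and a 1 day + 5 s gap:

```python
import pandas as pd

# Hypothetical timestamps: gaps of 0.5 s and of 1 day + 5 s
s = pd.Series(pd.to_datetime(["2018-01-22 08:30:00.000",
                              "2018-01-22 08:30:00.500",
                              "2018-01-23 08:30:05.500"]))
gaps = s.diff()

# Seconds component only: fractions are dropped and days are ignored
component = gaps.dt.seconds
# Full duration of each gap as a float number of seconds
total = gaps.dt.total_seconds()

print(pd.DataFrame({"gap": gaps, "seconds": component, "total_seconds": total}))
```

For tick data spanning a single trading session, the two agree on whole-second gaps; they diverge on sub-second gaps and on gaps across days.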

【Discussion】:

  • Does it work? I tried copying the data from the question, but it failed :(
  • This seems to work: df = dataFil df.assign(seconds=pd.to_datetime(df.date).diff(1).dt.seconds).head()
  • *** But ***, the crazy thing is that when I reload df again, the 'seconds' column disappears.... df = df.assign(seconds=pd.to_datetime(df.date).diff(1).dt.seconds).head() .... That's the answer!!!
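The disappearing column in the last comment is expected behaviour: DataFrame.assign returns a new DataFrame and leaves the original unchanged, so the result has to be assigned back. Note also that the comment's version ends in .head(), so df = df.assign(...).head() keeps only the first five rows. A minimal sketch with a hypothetical 8-row frame of text timestamps:

```python
import pandas as pd

# Hypothetical frame: 8 text timestamps one second apart
df = pd.DataFrame({"date": [f"2018-01-22 08:30:{i:02d}" for i in range(8)]})

# assign() returns a new DataFrame and leaves df untouched
df.assign(seconds=pd.to_datetime(df["date"]).diff(1).dt.seconds)
print("seconds" in df.columns)  # False

# Assigning the result back keeps the column; adding .head() would also cut df to 5 rows
df = df.assign(seconds=pd.to_datetime(df["date"]).diff(1).dt.seconds)
print(len(df), "seconds" in df.columns)  # 8 True
```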