Time Data in pandas

1. Timestamp()

The argument can be a time in any of several forms; Timestamp() converts it to a timestamp.

import datetime

import numpy as np
import pandas as pd

time1 = pd.Timestamp('2019/7/13')
time2 = pd.Timestamp('13/7/2019 13:05')
time3 = pd.Timestamp('2019-7-13')
time4 = pd.Timestamp('2019 7 13 13:05')
time5 = pd.Timestamp('2019 July 13 13')
time6 = pd.Timestamp(datetime.datetime(2019,7,13,13,5))
print(datetime.datetime.now(),type(datetime.datetime.now()))
print(time1,type(time1))
print(time2)
print(time3)
print(time4)
print(time5)
print(time6)
# 2019-07-25 14:33:20.482696 <class 'datetime.datetime'>
# 2019-07-13 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# 2019-07-13 13:05:00
# 2019-07-13 00:00:00
# 2019-07-13 13:05:00
# 2019-07-13 13:00:00
# 2019-07-13 13:05:00

Timestamp()
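As a quick check of the conversion described above, the following sketch builds the same instant from a string, from a datetime.datetime, and from keyword components, then confirms that all three compare equal. The variable names are illustrative and not from the original post.

```python
import datetime

import pandas as pd

# Three ways to describe 2019-07-13 13:05
from_string = pd.Timestamp('2019-07-13 13:05')
from_datetime = pd.Timestamp(datetime.datetime(2019, 7, 13, 13, 5))
from_parts = pd.Timestamp(year=2019, month=7, day=13, hour=13, minute=5)

print(from_string == from_datetime == from_parts)

# The components are available as attributes
print(from_string.year, from_string.month, from_string.day, from_string.hour)
```

Whichever form the input takes, the result is the same Timestamp type, so later code does not need to care how the time was originally written.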

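2. to_datetime(): timestamps and time series

For a single time, to_datetime() is used the same way as Timestamp(): it converts a time given in any of several forms into a timestamp.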

time1 = pd.to_datetime('2019/7/13')
time2 = pd.to_datetime('13/7/2019 13:05')
time3 = pd.to_datetime(datetime.datetime(2019,7,13,13,5))
print(datetime.datetime.now(),type(datetime.datetime.now()))
print(time1,type(time1))
print(time2)
print(time3)
# 2019-07-23 22:33:56.650290 <class 'datetime.datetime'>
# 2019-07-13 00:00:00 <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# 2019-07-13 13:05:00
# 2019-07-13 13:05:00

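to_datetime() on a single time

Timestamp() cannot handle several times at once, whereas to_datetime() converts them into a time series (a DatetimeIndex).

# Note: recent pandas releases (2.x) may raise on strings written in differing formats;
# passing format='mixed' is one way around that.
timelist = ['2019/7/13','13/7/2019 13:05',datetime.datetime(2019,7,13,13,5)]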
t = pd.to_datetime(timelist)
print(t)
print(type(t))
# DatetimeIndex(['2019-07-13 00:00:00', '2019-07-13 13:05:00','2019-07-13 13:05:00'],
#               dtype='datetime64[ns]', freq=None)
# <class 'pandas.core.indexes.datetimes.DatetimeIndex'>

to_datetime() on a time series
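The difference between the scalar and the list case can be seen from the return types. This is a minimal sketch using ISO-formatted strings (an assumption made here so that it parses the same way across pandas versions); the variable names are illustrative.

```python
import pandas as pd

single = pd.to_datetime('2019-07-13 13:05')
many = pd.to_datetime(['2019-07-13', '2019-07-14', '2019-07-15'])

print(type(single).__name__)  # a scalar input gives a Timestamp
print(type(many).__name__)    # a list input gives a DatetimeIndex
print(many[1])                # each element of the index is a Timestamp
```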

3. DatetimeIndex

A sequence of timestamps whose elements can be retrieved by position.

t1 = pd.DatetimeIndex(['2019/7/13','13/7/2019 13:05',datetime.datetime(2019,7,13,18,5)])
print(t1,type(t1))
print(t1[1])
# DatetimeIndex(['2019-07-13 00:00:00', '2019-07-13 13:05:00',
#                '2019-07-13 18:05:00'],
#               dtype='datetime64[ns]', freq=None) <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
# 2019-07-13 13:05:00

DatetimeIndex

4. TimeSeries

A Series whose index is a DatetimeIndex.

v = ['a','b','c']
t = pd.DatetimeIndex(['2019/7/13','13/7/2019 13:05',datetime.datetime(2019,7,13,18,5)])
s = pd.Series(v,index = t,name='s')
print(s)
# 2019-07-13 00:00:00    a
# 2019-07-13 13:05:00    b
# 2019-07-13 18:05:00    c
# Name: s, dtype: object

TimeSeries

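Changing the frequency: asfreq('new frequency', method)

asfreq() re-divides the index of the original TimeSeries at the new frequency. If this produces index labels that were not in the original, method decides their values: the default None leaves them as NaN, while ffill and bfill fill them with the previous and the next value respectively.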

t = pd.date_range('2019/1/3','2019/1/5')
arr = pd.Series(np.arange(3),index=t)
print(arr)
print('----------------------------')
print(arr.asfreq('8H'))
print('----------------------------')
print(arr.asfreq('8H',method='bfill'))
# 2019-01-03    0
# 2019-01-04    1
# 2019-01-05    2
# Freq: D, dtype: int32
# ----------------------------
# 2019-01-03 00:00:00    0.0
# 2019-01-03 08:00:00    NaN
# 2019-01-03 16:00:00    NaN
# 2019-01-04 00:00:00    1.0
# 2019-01-04 08:00:00    NaN
# 2019-01-04 16:00:00    NaN
# 2019-01-05 00:00:00    2.0
# Freq: 8H, dtype: float64
# ----------------------------
# 2019-01-03 00:00:00    0
# 2019-01-03 08:00:00    1
# 2019-01-03 16:00:00    1
# 2019-01-04 00:00:00    1
# 2019-01-04 08:00:00    2
# 2019-01-04 16:00:00    2
# 2019-01-05 00:00:00    2
# Freq: 8H, dtype: int32

asfreq() on a time series
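The example above shows method=None and method='bfill'. For completeness, here is a small sketch of method='ffill' on the same kind of data. The lowercase alias '8h' is used on the assumption that it is accepted by both older and newer pandas releases; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2019-01-03', '2019-01-05')
daily = pd.Series(np.arange(3), index=idx)

# '8h' asks for three points per day; each new label takes the previous value
filled = daily.asfreq('8h', method='ffill')
print(filled)
```

With ffill, the 08:00 and 16:00 labels of each day repeat that day's midnight value, which is the mirror image of the bfill output shown earlier.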

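Shifting: shift(n, freq, fill_value)

With only n, the index stays put and the values move: a positive n shifts the values backward (toward later timestamps), a negative n shifts them forward (toward earlier ones). Positions left empty are filled with fill_value, which defaults to NaN.

With both n and freq, the index is shifted by n × freq and the values stay unchanged.

t = pd.date_range('2019/1/3','2019/1/5')
arr = pd.Series([15,16,14],index=t)
print(arr)
print('-----------------------')
print(arr.shift(1,fill_value='haha'))  # after the shift the first label has no value, so it is filled with 'haha'
print('-----------------------')
print(arr.shift(-1))  # after the shift the last label has no value, so it defaults to NaN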
# 2019-01-03    15
# 2019-01-04    16
# 2019-01-05    14
# Freq: D, dtype: int64
# -----------------------
# 2019-01-03    haha
# 2019-01-04      15
# 2019-01-05      16
# Freq: D, dtype: object
# -----------------------
# 2019-01-03    16.0
# 2019-01-04    14.0
# 2019-01-05     NaN
# Freq: D, dtype: float64

shift() moving the values

t = pd.date_range('2019/1/3','2019/1/5')
arr = pd.Series([15,16,14],index=t)
print(arr)
print('-----------------------')
print(arr.shift(2,freq='D'))
print('-----------------------')
print(arr.shift(-2,freq='H'))
# 2019-01-03    15
# 2019-01-04    16
# 2019-01-05    14
# Freq: D, dtype: int64
# -----------------------
# 2019-01-05    15
# 2019-01-06    16
# 2019-01-07    14
# Freq: D, dtype: int64
# -----------------------
# 2019-01-02 22:00:00    15
# 2019-01-03 22:00:00    16
# 2019-01-04 22:00:00    14
# Freq: D, dtype: int64

shift() moving the index
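The two modes of shift() are easiest to compare side by side. The sketch below reuses the same three-day series; the day-over-day difference at the end is an illustrative use that is not part of the original post.

```python
import pandas as pd

idx = pd.date_range('2019-01-03', '2019-01-05')
s = pd.Series([15, 16, 14], index=idx)

moved_values = s.shift(1)           # same index, values slide down one row
moved_index = s.shift(1, freq='D')  # same values, every label one day later

# shift(1) lines yesterday's value up with today's label,
# so subtracting it gives the day-over-day change
daily_change = s - s.shift(1)
print(daily_change)
```

Shifting values loses the last observation and introduces a NaN, while shifting the index keeps every value and only relabels it.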

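5. date_range() and bdate_range()

Both generate a range of times as a DatetimeIndex: date_range() generates calendar days and bdate_range() generates business days. The examples below use date_range().

Usage: date_range(start, end, periods, freq, closed, normalize, name, tz)

start: the starting point

end: the end point

periods: the number of timestamps to generate

freq: the frequency, default D (calendar day). Other aliases Y, M, B, H, T/MIN, S, L, U stand for year (end), month (end), business day, hour, minute, second, millisecond and microsecond. They were case-insensitive in the pandas version used here; newer releases rename several of them (for example h, min, s, ms, us).

　　Anchored aliases: W-MON means weekly on a given weekday (here Monday); WOM-2MON means a given weekday of a given week of each month (here the second Monday).

closed: default None, which includes both the start and the end point; left includes only the start point, right only the end point. (pandas 1.4 and later rename this parameter to inclusive, and pandas 2.0 removes closed.)

normalize: default False; True sets the time of day to 0:00:00

name: the name of the resulting index

tz: the time zone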

t1 = pd.date_range('2000/1/5','2003/1/5',freq='y')
t2 = pd.date_range('2000/1/1','2000/3/5',freq='m')
t3 = pd.date_range('2000/1/1','2000/1/10',periods=3)
t4 = pd.date_range('2000/1/1 12','2000/1/1 15',freq='h')
t5 = pd.date_range('2000/1/1 12','2000/1/1 15',freq='h',closed='left',name='t3')
t6 = pd.date_range(start = '2000/1/1 11:00:00',periods=3)
t7 = pd.date_range(end = '2000/1/1 12:00:00',periods=3)
print(t1)
print(t2)
print(t3)
print(t4)
print(t5)
print(t6)
print(t7)
# DatetimeIndex(['2000-12-31', '2001-12-31', '2002-12-31'], dtype='datetime64[ns]', freq='A-DEC')
# DatetimeIndex(['2000-01-31', '2000-02-29'], dtype='datetime64[ns]', freq='M')
# DatetimeIndex(['2000-01-01 00:00:00', '2000-01-05 12:00:00', '2000-01-10 00:00:00'],
#               dtype='datetime64[ns]', freq=None)
# DatetimeIndex(['2000-01-01 12:00:00', '2000-01-01 13:00:00', '2000-01-01 14:00:00', '2000-01-01 15:00:00'],
#               dtype='datetime64[ns]', freq='H')
# DatetimeIndex(['2000-01-01 12:00:00', '2000-01-01 13:00:00', '2000-01-01 14:00:00'],
#               dtype='datetime64[ns]', name='t3', freq='H')
# DatetimeIndex(['2000-01-01 11:00:00', '2000-01-02 11:00:00', '2000-01-03 11:00:00'],
#               dtype='datetime64[ns]', freq='D')
# DatetimeIndex(['1999-12-30 12:00:00', '1999-12-31 12:00:00', '2000-01-01 12:00:00'],
#               dtype='datetime64[ns]', freq='D')

date_range()
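The examples above cover date_range() only. As a complement, this sketch shows bdate_range() skipping a weekend and the two anchored aliases W-MON and WOM-2MON. The dates are chosen for illustration and the variable names are not from the original post.

```python
import pandas as pd

# 2019-07-13 and 2019-07-14 fall on a weekend, so bdate_range skips them
business = pd.bdate_range('2019-07-12', '2019-07-16')
print(business)

# W-MON: one date per week, on Mondays
mondays = pd.date_range('2019-07-01', '2019-07-31', freq='W-MON')
print(mondays)

# WOM-2MON: the second Monday of each month
second_monday = pd.date_range('2019-07-01', '2019-08-31', freq='WOM-2MON')
print(second_monday)
```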

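6. Period()

Period('date', freq='*'): the default freq is the smallest unit present in the given time. For example, if the smallest unit given is a month, the default frequency is monthly; if it is a minute, the default frequency is minutely.

In the example below, p4 has frequency 2M, i.e. two months, so one unit for p4 is 2M and p4+3 means p4 + 3*2M, six months later.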

p1 = pd.Period('2017')
p2 = pd.Period('2017',freq = 'M')
print(p1,type(p1),p1+1)
print(p2,p2+1)
p3 = pd.Period('2017-1-1')
p4 = pd.Period('2017-1-1',freq = '2M')
print(p3,p3+2)
print(p4,p4+3)
p5 = pd.Period('2017-1-1 13:00')
p6 = pd.Period('2017-1-1 13:00',freq = '5T')
print(p5,p5+4)
print(p6,p6+5)
# 2017 <class 'pandas._libs.tslibs.period.Period'> 2018
# 2017-01 2017-02
# 2017-01-01 2017-01-03
# 2017-01 2017-07
# 2017-01-01 13:00 2017-01-01 13:04
# 2017-01-01 13:00 2017-01-01 13:25

Period()
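A Period stands for a whole span of time rather than a single instant, which is why adding an integer moves it by whole units of its frequency. The following sketch illustrates that with a monthly period and its start_time and end_time attributes; the variable names are illustrative.

```python
import pandas as pd

month = pd.Period('2017-01', freq='M')
print(month + 1)         # the following month
print(month.start_time)  # first instant covered by the period
print(month.end_time)    # last instant covered by the period

# With a multiplied frequency, adding 1 moves by the whole multiple
two_months = pd.Period('2017-01', freq='2M')
print(two_months + 3)    # 3 * 2 months later
```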

7. period_range()

A range of periods, of type PeriodIndex; usage is similar to date_range().

p = pd.period_range('2000/1/1','2000/1/2',freq='6H')
print(p,type(p))
# PeriodIndex(['2000-01-01 00:00', '2000-01-01 06:00', '2000-01-01 12:00','2000-01-01 18:00', '2000-01-02 00:00'],
#             dtype='period[6H]', freq='6H') <class 'pandas.core.indexes.period.PeriodIndex'>

period_range()

asfreq() on a Period or on the result of period_range(): by default the result is the last sub-period of the original period at the new frequency; with how='start' it is the first.

p1 = pd.Period( '2019/5/1')   #2019-05-01
p2 = p1.asfreq('H')  #2019-05-01 23:00
p3 = p1.asfreq('2H',how='start')  #2019-05-01 00:00; the 2 in '2H' has no effect here
p4 = p1.asfreq('S')  #2019-05-01 23:59:59
p5 = p1.asfreq('S',how='start')  #2019-05-01 00:00:00

asfreq() on a Period

p = pd.period_range('2015/3','2015/6',freq='M')
ps1 = pd.Series(np.random.rand(len(p)),index=p.asfreq('D'))
ps2 = pd.Series(np.random.rand(len(p)),index=p.asfreq('D',how='start'))
print(p)
print(ps1)
print('--------------------------')
print(ps2)
# PeriodIndex(['2015-03', '2015-04', '2015-05', '2015-06'], dtype='period[M]', freq='M')
# 2015-03-31    0.708730
# 2015-04-30    0.238101
# 2015-05-31    0.793451
# 2015-06-30    0.584621
# Freq: D, dtype: float64
# --------------------------
# 2015-03-01    0.397659
# 2015-04-01    0.032417
# 2015-05-01    0.763550
# 2015-06-01    0.129498
# Freq: D, dtype: float64

asfreq() on a period_range()
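The how argument can be checked directly on a single monthly Period and on a small PeriodIndex. This is a minimal sketch with illustrative variable names; it compares string forms so that it does not depend on how a particular pandas version prints frequencies.

```python
import pandas as pd

may = pd.Period('2019-05', freq='M')
last_day = may.asfreq('D')                # how='end' is the default
first_day = may.asfreq('D', how='start')
print(last_day, first_day)

first_quarter = pd.period_range('2019-01', '2019-03', freq='M')
print(first_quarter.asfreq('D', how='start'))
```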

8. to_timestamp() and to_period()

Converting between timestamps and periods.

p1 = pd.date_range('2015/3','2015/6',freq='M')
p2 = pd.period_range('2015/3','2015/6',freq='M')
ps1 = pd.Series(np.random.rand(len(p1)),index=p1)
ps2 = pd.Series(np.random.rand(len(p2)),index=p2)
print(ps1)
print('---------------')
print(ps2)
print('---------------')
print(ps1.to_period())
print('---------------')
print(ps2.to_timestamp())
# 2015-03-31    0.066644
# 2015-04-30    0.159969
# 2015-05-31    0.111716
# Freq: M, dtype: float64
# ---------------
# 2015-03    0.966091
# 2015-04    0.779257
# 2015-05    0.953817
# 2015-06    0.765121
# Freq: M, dtype: float64
# ---------------
# 2015-03    0.066644
# 2015-04    0.159969
# 2015-05    0.111716
# Freq: M, dtype: float64
# ---------------
# 2015-03-01    0.966091
# 2015-04-01    0.779257
# 2015-05-01    0.953817
# 2015-06-01    0.765121
# Freq: MS, dtype: float64

to_timestamp() and to_period()
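A round trip makes the relationship between the two index types concrete. The sketch below starts from month-start timestamps (freq='MS', chosen here as an assumption so that the round trip returns exactly the original labels) and uses illustrative variable names.

```python
import pandas as pd

stamps = pd.date_range('2015-03-01', periods=3, freq='MS')  # month starts
by_stamp = pd.Series([1.0, 2.0, 3.0], index=stamps)

by_period = by_stamp.to_period('M')  # DatetimeIndex -> PeriodIndex
back = by_period.to_timestamp()      # PeriodIndex -> DatetimeIndex (start of each period)

print(by_period.index)
print(back.index)
```

Because to_timestamp() defaults to the start of each period, a series indexed by month-end timestamps would not come back unchanged from the same round trip.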

9. Indexing a time series

A time series can be indexed by position or by label, and the label can be a time written in various forms.

p = pd.Series(np.random.rand(4),pd.period_range('2015/3','2015/6',freq='M'))
print(p)
print(p[0])
print(p.iloc[1])
print(p.loc['2015/5'])
print(p.loc['2015-5'])
print(p.loc['201505'])
# 2015-03    0.846543
# 2015-04    0.631335
# 2015-05    0.218029
# 2015-06    0.646544
# Freq: M, dtype: float64
# 0.846543180730373
# 0.6313347971612441
# 0.21802886896115137
# 0.21802886896115137
# 0.21802886896115137

Indexing a time series

p = pd.Series(np.random.rand(5),index=pd.period_range('2015/5/30','2015/6/3'))
print(p)
print(p[0:2])  # positional slice, end excluded
print(p.iloc[0:2]) # positional slice, end excluded
print(p.loc['2015/5/30':'2015/6/1']) # label slice, both ends included
print(p['2015/5'])  # with only a month given, every row that falls in that month is returned
# 2015-05-30    0.976255
# 2015-05-31    0.671226
# 2015-06-01    0.888682
# 2015-06-02    0.875901
# 2015-06-03    0.953603
# Freq: D, dtype: float64
# 2015-05-30    0.976255
# 2015-05-31    0.671226
# Freq: D, dtype: float64
# 2015-05-30    0.976255
# 2015-05-31    0.671226
# Freq: D, dtype: float64
# 2015-05-30    0.976255
# 2015-05-31    0.671226
# 2015-06-01    0.888682
# Freq: D, dtype: float64
# 2015-05-30    0.976255
# 2015-05-31    0.671226
# Freq: D, dtype: float64

Slicing a time series
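The same slicing rules can be tried on a Series with a DatetimeIndex instead of a PeriodIndex. This is a small sketch with fixed values so that the results are easy to verify; the data and variable names are illustrative.

```python
import pandas as pd

idx = pd.date_range('2015-05-30', '2015-06-03')
s = pd.Series([10, 20, 30, 40, 50], index=idx)

by_position = s.iloc[0:2]                    # end excluded
by_label = s.loc['2015-05-30':'2015-06-01']  # both ends included
whole_month = s.loc['2015-05']               # every row in May 2015

print(by_position.tolist(), by_label.tolist(), whole_month.tolist())
```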

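10. Uniqueness: is_unique

is_unique tells whether the values of a series are unique; index.is_unique tells whether the labels are unique.

When the time index has no duplicates, selecting a single time returns a scalar value.

When the time index has duplicates, selecting a time that is not duplicated still returns a Series, and selecting a duplicated time returns every matching row; groupby can then be used to aggregate them.

p = pd.Series(np.random.rand(5),index=pd.DatetimeIndex(['2019/5/1','2019/5/2','2019/5/3','2019/5/1','2019/5/2']))
print(p)
print(p.is_unique,p.index.is_unique)
print('--------------------')
print(p['2019/5/3'])
print('--------------------')
print(p['2019/5/1'])
print('--------------------')
print(p['2019/5/1'].groupby(level=0).mean())  # group the rows labelled 2019/5/1 by index label and take the mean of the two values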
# 2019-05-01    0.653468
# 2019-05-02    0.116834
# 2019-05-03    0.978432
# 2019-05-01    0.724633
# 2019-05-02    0.250191
# dtype: float64
# True False
# --------------------
# 2019-05-03    0.978432
# dtype: float64
# --------------------
# 2019-05-01    0.653468
# 2019-05-01    0.724633
# dtype: float64
# --------------------
# 2019-05-01    0.689051
# dtype: float64

Duplicate time index
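The groupby step above was applied to one label only. Applied to the whole series, the same call collapses every duplicated timestamp at once. The sketch below uses fixed values so that the means can be checked by hand; the data and names are illustrative.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2019-05-01', '2019-05-02', '2019-05-03',
                        '2019-05-01', '2019-05-02'])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

print(s.index.is_unique)     # the labels repeat, so this is False
print(s.index.duplicated())  # marks the second occurrence of each label

# One row per timestamp, holding the mean of its duplicates
deduped = s.groupby(level=0).mean()
print(deduped)
```

Taking the mean is only one possible choice; first(), last() or sum() would work the same way if they suit the data better.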

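Resampling

Resampling is done with resample('new frequency'). The result is a resampler object; to get data out of it, apply an aggregation such as sum(), mean(), max(), min(), median(), first(), last() or ohlc() (from finance: open, high, low, close).

Resampling converts a time series from one frequency to another, which involves either filling in or combining data.

Downsampling: high-frequency data → low-frequency data, for example daily data converted to monthly data; values are combined.

Upsampling: low-frequency data → high-frequency data, for example yearly data converted to monthly data; values are filled in.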

ts = pd.Series(np.arange(1,9),index=pd.date_range(start = '2019/5/1',periods=8))
print(ts)
print('Resample:',ts.resample('3D'),' Data type',type(ts.resample('3D')))
print('Resample sum:',type(ts.resample('3D').sum()),'\n',ts.resample('3D').sum())
print('Resample mean:\n',ts.resample('3D').mean())
print('Resample max:\n',ts.resample('3D').max())
print('Resample min:\n',ts.resample('3D').min())
print('Resample median:\n',ts.resample('3D').median())
print('Resample first:\n',ts.resample('3D').first())
print('Resample last:\n',ts.resample('3D').last())
print('Resample OHLC:\n',ts.resample('3D').ohlc())
# 2019-05-01    1
# 2019-05-02    2
# 2019-05-03    3
# 2019-05-04    4
# 2019-05-05    5
# 2019-05-06    6
# 2019-05-07    7
# 2019-05-08    8
# Freq: D, dtype: int32
# Resample: DatetimeIndexResampler [freq=<3 * Days>, axis=0, closed=left, label=left, convention=start, base=0]  Data type <class 'pandas.core.resample.DatetimeIndexResampler'>
# Resample sum: <class 'pandas.core.series.Series'> 
#  2019-05-01     6
# 2019-05-04    15
# 2019-05-07    15
# Freq: 3D, dtype: int32
# Resample mean:
#  2019-05-01    2.0
# 2019-05-04    5.0
# 2019-05-07    7.5
# Freq: 3D, dtype: float64
# Resample max:
#  2019-05-01    3
# 2019-05-04    6
# 2019-05-07    8
# Freq: 3D, dtype: int32
# Resample min:
#  2019-05-01    1
# 2019-05-04    4
# 2019-05-07    7
# Freq: 3D, dtype: int32
# Resample median:
#  2019-05-01    2.0
# 2019-05-04    5.0
# 2019-05-07    7.5
# Freq: 3D, dtype: float64
# Resample first:
#  2019-05-01    1
# 2019-05-04    4
# 2019-05-07    7
# Freq: 3D, dtype: int32
# Resample last:
#  2019-05-01    3
# 2019-05-04    6
# 2019-05-07    8
# Freq: 3D, dtype: int32
# Resample OHLC:
#              open  high  low  close
# 2019-05-01     1     3    1      3
# 2019-05-04     4     6    4      6
# 2019-05-07     7     8    7      8

resample() example

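For downsampling, the closed parameter of resample() sets which side of each bin interval is closed (inclusive). With closed='right' each bin includes its right edge and excludes its left edge; the default for a frequency such as 3D is closed='left', where each bin includes its left edge and excludes its right edge.

ts = pd.Series(np.arange(1,9),index=pd.date_range(start = '2019/5/1',periods=8))
print(ts.resample('3D').sum())   
print(ts.resample('3D',closed='right').sum())
# closed='left' (default): [1,2,3], [4,5,6], [7,8]
# closed='right': [(4/29, 4/30, no data), 1], [2,3,4], [5,6,7], [8]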
# 2019-05-01     6
# 2019-05-04    15
# 2019-05-07    15
# Freq: 3D, dtype: int32
# 2019-04-28     1
# 2019-05-01     9
# 2019-05-04    18
# 2019-05-07     8
# Freq: 3D, dtype: int32

Resampling with left- or right-closed bins

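For downsampling, setting label='right' in resample() labels each bin with its right edge, which is the first label of the next bin; by default each bin is labelled with its own first label (the left edge).

ts = pd.Series(np.arange(1,9),index=pd.date_range(start = '2019/5/1',periods=8))
print(ts.resample('3D').sum())   # each bin is labelled with its own first label
print(ts.resample('3D',label='right').sum())  # each bin is labelled with the first label of the next bin
# resampling at 3D gives the groups [1,2,3] [4,5,6] [7,8]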
# 2019-05-01     6
# 2019-05-04    15
# 2019-05-07    15
# Freq: 3D, dtype: int32
# 2019-05-04     6
# 2019-05-07    15
# 2019-05-10    15
# Freq: 3D, dtype: int32

Labels when downsampling
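closed and label are independent and can be combined. The sketch below, on the same eight-day series, contrasts the default with closed='right' together with label='right'; the bin boundaries in the comments follow from the outputs shown above, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1, 9), index=pd.date_range('2019-05-01', periods=8))

# Default: bins [5/1, 5/4), [5/4, 5/7), [5/7, 5/10), labelled by their left edge
left = ts.resample('3D').sum()

# Right-closed bins (4/28, 5/1], (5/1, 5/4], (5/4, 5/7], (5/7, 5/10],
# labelled by their right edge
right = ts.resample('3D', closed='right', label='right').sum()

print(left)
print(right)
```

closed changes which observations fall into each bin, so the sums differ; label only changes the timestamp attached to each bin.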

Upsampling adds new labels, so empty values appear: bfill() fills them with the next value and ffill() fills them with the previous value.

ts = pd.Series(np.arange(1,4),index=pd.date_range(start = '2019/5/1',periods=3))
print(ts)
print(ts.resample('12H'))  # the resampler object
print(ts.resample('12H').asfreq())   # new labels are left as NaN
print(ts.resample('12H').bfill())    # fill the gaps with the next value
print(ts.resample('12H').ffill())   # fill the gaps with the previous value
# 2019-05-01    1
# 2019-05-02    2
# 2019-05-03    3
# Freq: D, dtype: int32
# DatetimeIndexResampler [freq=<12 * Hours>, axis=0, closed=left, label=left, convention=start, base=0]
# 2019-05-01 00:00:00    1.0
# 2019-05-01 12:00:00    NaN
# 2019-05-02 00:00:00    2.0
# 2019-05-02 12:00:00    NaN
# 2019-05-03 00:00:00    3.0
# Freq: 12H, dtype: float64
# 2019-05-01 00:00:00    1
# 2019-05-01 12:00:00    2
# 2019-05-02 00:00:00    2
# 2019-05-02 12:00:00    3
# 2019-05-03 00:00:00    3
# Freq: 12H, dtype: int32
# 2019-05-01 00:00:00    1
# 2019-05-01 12:00:00    1
# 2019-05-02 00:00:00    2
# 2019-05-02 12:00:00    2
# 2019-05-03 00:00:00    3
# Freq: 12H, dtype: int32

Filling values when upsampling
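The three upsampling options can be verified compactly by looking at the values alone. This sketch uses the lowercase alias '12h', on the assumption that it is accepted by both older and newer pandas releases; the variable names are illustrative.

```python
import pandas as pd

ts = pd.Series([1, 2, 3], index=pd.date_range('2019-05-01', periods=3))

gaps = ts.resample('12h').asfreq()     # new labels are NaN
forward = ts.resample('12h').ffill()   # copy the previous value
backward = ts.resample('12h').bfill()  # copy the next value

print(int(gaps.isna().sum()))
print(forward.tolist())
print(backward.tolist())
```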