【Posted】:2016-04-27 00:36:16
【Question】:
I have the following raw data:
TranID,TranDate,TranTime,TranAmt
A123456,20160427,02:18,9999.53
B123457,20160426,02:48,26070.33
C123458,20160425,03:18,13779.56
A123459,20160424,03:18,18157.26
B123460,20160423,04:18,215868.15
C123461,20160422,04:18,23695.25
A123462,20160421,05:18,57
B123463,20160420,05:18,64594.24
C123464,20160419,06:18,47890.91
A123465,20160427,06:18,14119.74
B123466,20160426,07:18,2649.6
C123467,20160425,07:18,16757.38
A123468,20160424,08:18,8864.78
B123469,20160423,08:18,26254.69
C123470,20160422,09:18,13206.98
A123471,20160421,09:18,15872.45
B123472,20160420,10:18,197621.18
C123473,20160419,10:18,21048.72
I tried importing the raw data with pd.read_csv.
Try 1
import numpy as np
import pandas as pd
df = pd.read_csv('MyTest.csv', sep=',', header=0, parse_dates=['TranDate'],
                 usecols=['TranID','TranDate','TranTime','TranAmt'],
                 engine='python')
print(df.dtypes)
df[:5]
Output 1
TranID              object
TranDate    datetime64[ns]
TranTime            object
TranAmt            float64
dtype: object
Out[12]:
    TranID   TranDate TranTime    TranAmt
0  A123456 2016-04-27    02:18    9999.53
1  B123457 2016-04-26    02:48   26070.33
2  C123458 2016-04-25    03:18   13779.56
3  A123459 2016-04-24    03:18   18157.26
4  B123460 2016-04-23    04:18  215868.15
Try 2
import numpy as np
import pandas as pd
df = pd.read_csv('MyTest.csv', sep=',', header=0, parse_dates=['TranDate', 'TranTime'],
                 usecols=['TranID','TranDate','TranTime','TranAmt'],
                 engine='python')
print(df.dtypes)
df[:5]
Output 2
TranID              object
TranDate    datetime64[ns]
TranTime    datetime64[ns]
TranAmt            float64
dtype: object
Out[13]:
    TranID   TranDate            TranTime    TranAmt
0  A123456 2016-04-27 2016-04-27 02:18:00    9999.53
1  B123457 2016-04-26 2016-04-27 02:48:00   26070.33
2  C123458 2016-04-25 2016-04-27 03:18:00   13779.56
3  A123459 2016-04-24 2016-04-27 03:18:00   18157.26
4  B123460 2016-04-23 2016-04-27 04:18:00  215868.15
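I am confused by the TranTime column. In Try 1 it displays correctly, but its dtype is object. In Try 2, pandas adds the current date to the time and the dtype becomes datetime.
I want the TranTime column to be treated as a time, and I want to aggregate with pandas' groupby or pivot_table. If I use the Try 1 approach, will the object dtype affect my aggregation? If I use the Try 2 approach, do I need to strip the date part before I can use the time part?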
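To make the two options concrete, here is a minimal sketch of one possible way to handle the column. It is an illustration under stated assumptions, not the only approach: the rows are a subset of the sample data above loaded from an in-memory string instead of MyTest.csv, and the column names TranTimeDelta, Hour and TranDateTime are hypothetical names introduced only for this example.

```python
import io
import pandas as pd

# A few rows copied from the sample data above (stand-in for MyTest.csv).
raw = """TranID,TranDate,TranTime,TranAmt
A123456,20160427,02:18,9999.53
B123457,20160426,02:48,26070.33
C123458,20160425,03:18,13779.56
A123459,20160424,03:18,18157.26
"""

# Read TranDate as text and parse it with an explicit format.
df = pd.read_csv(io.StringIO(raw), dtype={"TranDate": str})
df["TranDate"] = pd.to_datetime(df["TranDate"], format="%Y%m%d")

# Grouping on the plain string column (the Try 1 situation) works:
# groupby only needs values it can compare for equality.
by_time = df.groupby("TranTime")["TranAmt"].sum()

# For time arithmetic, convert "HH:MM" to a timedelta (time since midnight).
# Hypothetical column names: TranTimeDelta, Hour, TranDateTime.
df["TranTimeDelta"] = pd.to_timedelta(df["TranTime"] + ":00")
df["Hour"] = df["TranTimeDelta"].dt.components["hours"]
by_hour = df.groupby("Hour")["TranAmt"].sum()

# Date plus time-of-day gives a full timestamp carrying the real date,
# rather than the date on which the file happened to be parsed.
df["TranDateTime"] = df["TranDate"] + df["TranTimeDelta"]

print(by_time)
print(by_hour)
print(df.dtypes)
```

In this sketch the string column is enough for grouping on exact times, while the timedelta column is what allows bucketing by hour or combining with TranDate.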
I am proficient in SAS, where date, time and datetime values are just numbers underneath with a format applied on top. That is why Python's object and datetime dtypes confuse me.
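For comparison with the SAS model, the following sketch shows that pandas time values can also be reduced to plain numbers. It assumes the usual SAS conventions (a time is seconds since midnight, a date is days since 1960-01-01); the variable names are hypothetical.

```python
import pandas as pd

# Time of day as a Timedelta, then as seconds since midnight (SAS-style time).
tran_time = pd.to_timedelta("02:18:00")
seconds_since_midnight = tran_time.total_seconds()

# Date as a Timestamp, then as days since 1960-01-01 (SAS-style date).
tran_date = pd.Timestamp("2016-04-27")
days_since_1960 = (tran_date - pd.Timestamp("1960-01-01")).days

print(seconds_since_midnight)  # 8280.0
print(days_since_1960)         # 20571
```

The object dtype, by contrast, just means the column holds generic Python objects (here, strings), so no numeric representation is attached to it.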
Thanks, 大厅
【Discussion】: