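【Posted】:2016-08-12 03:56:34
【Question】:
I need to compute the difference in hours between two dates (format: YYYY-MM-DDTHH:MM:SS; I can also convert the data to YYYY-MM-DD HH:MM:SS) from a huge Excel file. What is the most efficient way to do this in Python? I have tried datetime/time objects (TypeError: expected string or buffer), Timestamp (ValueError), and DataFrame (does not give the result in hours).
Excel file: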
Order_Date Received_Customer Column3
2000-10-06T13:00:58 2000-11-06T13:00:58 1
2000-10-21T15:40:15 2000-12-27T10:09:29 2
2000-10-23T10:09:29 2000-10-26T10:09:29 3
..... ....
datetime/time object code (TypeError: expected string or buffer):
import pandas as pd
import time as t
data=pd.read_excel('/path/file.xlsx')
s1 = (data,['Order_Date'])
s2 = (data,['Received_Customer'])
s1Time = t.strptime(s1, "%Y:%m:%d:%H:%M:%S")
s2Time = t.strptime(s2, "%Y:%m:%d:%H:%M:%S")
deltaInHours = (t.mktime(s2Time) - t.mktime(s1Time))
print deltaInHours, "hours"
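The TypeError in this attempt most likely comes from `s1 = (data,['Order_Date'])`, which builds a tuple instead of selecting a column, so `strptime` receives something that is not a string. The format string also uses colons where the data uses `-` and `T`, and a difference of `mktime` values is in seconds, not hours. A minimal sketch for a single pair of strings, using values from the sample rows (the helper name `hours_between` is hypothetical, not part of the original code):

```python
from datetime import datetime

# Matches strings such as 2000-10-06T13:00:58
FMT = "%Y-%m-%dT%H:%M:%S"

def hours_between(start, end, fmt=FMT):
    """Return (end - start) in hours as a float."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600.0

# First sample row: exactly 31 days apart
print(hours_between("2000-10-06T13:00:58", "2000-11-06T13:00:58"))
```

This parses one pair at a time, so for a huge file a column-wise pandas conversion is usually the faster route.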
Timestamp code (ValueError):
import pandas as pd
import datetime as dt
data=pd.read_excel('/path/file.xlsx')
df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df.to = [pd.Timestamp('Order_Date')]
df.fr = [pd.Timestamp('Received_Customer')]
(df.fr-df.to).astype('timedelta64[h]')
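The ValueError here is probably raised because `pd.Timestamp('Order_Date')` tries to parse the literal text `Order_Date` as a date instead of reading the column of that name. A sketch of a column-wise equivalent, assuming both columns hold strings in the format shown; the small inline frame is only a stand-in for the Excel file:

```python
import pandas as pd

# Stand-in for pd.read_excel(...): the three sample rows from the question.
df = pd.DataFrame({
    "Order_Date": ["2000-10-06T13:00:58", "2000-10-21T15:40:15", "2000-10-23T10:09:29"],
    "Received_Customer": ["2000-11-06T13:00:58", "2000-12-27T10:09:29", "2000-10-26T10:09:29"],
})

# Parse whole columns at once; an explicit format avoids per-value guessing.
fmt = "%Y-%m-%dT%H:%M:%S"
order = pd.to_datetime(df["Order_Date"], format=fmt)
received = pd.to_datetime(df["Received_Customer"], format=fmt)

# Subtracting two datetime columns yields a timedelta Series.
elapsed = received - order
print(elapsed)
```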
DataFrame code (does not return the desired result):
import pandas as pd
data=pd.read_excel('/path/file.xlsx')
df = pd.DataFrame(data,columns=['Order_Date','Received_Customer'])
df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df['Received_Customer'] = pd.to_datetime(df['Received_Customer'])
answer = df.dropna()['Order_Date'] - df.dropna()['Received_Customer']
answer.astype('timedelta64[h]')
print(answer)
Output:
0 24 days 16:38:07
1 0 days 00:00:00
2 20 days 12:39:52
dtype: timedelta64[ns]
It should look like this:
0 592 hour
1 0 hour
2 492 hour
Besides answer.astype('timedelta64[h]'), is there another way to convert timedelta64[ns] to hours?
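One thing worth noting about the last attempt: `astype` returns a new Series and does not modify `answer` in place, so printing `answer` afterwards still shows the original timedeltas. As for alternatives, two conversions that do not rely on `astype` are dividing by a one-hour `pd.Timedelta` and going through `.dt.total_seconds()`. A sketch, using the three timedeltas from the printed output as stand-in data:

```python
import pandas as pd

# Stand-in for `answer`: the three timedeltas printed in the question.
answer = pd.to_timedelta(
    pd.Series(["24 days 16:38:07", "0 days 00:00:00", "20 days 12:39:52"])
)

# Fractional hours: divide by a one-hour Timedelta.
hours_float = answer / pd.Timedelta(hours=1)

# Whole hours: floor-divide instead, which drops the partial hour.
hours_whole = answer // pd.Timedelta(hours=1)

# Same values as hours_float, computed via total seconds.
hours_via_seconds = answer.dt.total_seconds() / 3600

print(hours_whole)
```

The floor-divided values for this stand-in data are 592, 0 and 492, matching the desired output. This assumes there are no missing values left in the Series, as is the case after the `dropna()` calls in the attempt.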
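【Comments】:
-
You asked this question a few minutes ago, remember... That's not how it works here: ask a question, delete it, ask again!
-
@linusg I didn't mean to delete it!
Tags: python pandas datetime dataframe timestamp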