【Question title】: Parse csv object time to datetime in python
【Posted】: 2019-10-25 04:44:20
【Question】:

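I have a csv file with the Timestamp column shown below. I want to change its format to something like 2013-08-12 10:29:19.673, or at least to one-second granularity. Currently Timestamp is of object type.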

I can change the format manually in Excel, but the file is too large and some rows get lost.

        Id          Timestamp Data  Group_Id
0       19929927    00:07.5   27.0  27
1       19929928    00:08.3   26.5  27
2       19929929    00:48.7   33.5  157
3       19929930    00:50.0   33.0  157
4       19929931    00:53.1   35.0  25

                 ...

1048570 20978497    10:11.9   34.5  152
1048571 20978498    10:13.3   34.0  152
1048572 20978499    10:41.2   42.0  138
1048573 20978500    10:42.5   45.0  138
1048574 20978501    10:43.9   44.0  138
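A minimal sketch of why the column comes out as object and one way to turn it into real time values. It assumes the strings are MM:SS.S (minutes:seconds.tenths), which is how the later comment about the SQL Server export reads them; the inline CSV rows are copied from the table above, and the Offset column name is hypothetical.

```python
import io

import pandas as pd

# A few rows copied from the question; the real file would be read with
# pd.read_csv on its own path instead.
csv_text = """Id,Timestamp,Data,Group_Id
19929927,00:07.5,27.0,27
19929928,00:08.3,26.5,27
19929929,00:48.7,33.5,157
"""

df = pd.read_csv(io.StringIO(csv_text))
# read_csv does not parse "00:07.5" as a time, so the column stays as strings
# (object dtype in the question).
print(df['Timestamp'].dtype)

# Assumption: values are MM:SS.S, so prefix an hour field to get HH:MM:SS.S,
# a form pd.to_timedelta can parse into durations.
df['Offset'] = pd.to_timedelta('00:' + df['Timestamp'])
print(df[['Timestamp', 'Offset']])
```

These are durations, not dates; they still need a base date to become full datetimes.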

【Question discussion】:

  • @jezrael Do you know how to do this? Thanks!

Tags: python pandas csv datetime


【Solution 1】:

Edit: If times without date information are converted to datetimes, pandas apparently fills in the current date.
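To keep the date under your control instead, a minimal sketch parses the strings as durations with pd.to_timedelta and adds them to an explicit base date. It assumes MM:SS.S strings; the base date 2016-01-01 is a hypothetical choice. dt.floor('s') then gives the one-second granularity asked for in the question.

```python
import pandas as pd

times = pd.Series(['00:07.5', '00:08.3', '00:48.7'])

# Assumption: MM:SS.S strings and a hypothetical base date chosen explicitly,
# rather than letting a parser fill in a date by itself.
base_date = pd.Timestamp('2016-01-01')
stamps = base_date + pd.to_timedelta('00:' + times)

# One-second granularity: drop the fractional part of each second.
stamps_s = stamps.dt.floor('s')
print(stamps_s)
```

Use dt.round('s') instead of dt.floor('s') if fractions should be rounded rather than truncated.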

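If the data spans several days, check this solution:

The idea is to create consecutive datetimes, starting a new day each time the times start with 0 again: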

import pandas as pd

#copy() avoids SettingWithCopyWarning when adding columns below
df = df[['Timestamp']].copy()
print(df)
   Timestamp
0    00:08.3 <- first day
1    00:48.7
2    00:50.0
3    00:53.1
4    10:11.9
5    10:13.3
6    10:41.2
7    00:50.0 <- second day
8    00:53.1
9    10:42.5
10   10:43.9
11   00:07.5 <- third day
12   00:08.3
13   10:11.9
14   10:13.3
15   10:43.9

#leading field of each string as integer (hours if HH:MM.S), used to test for 0
df['h'] = df['Timestamp'].str.split(':').str[0].astype(int)
#a new day starts where the leading field becomes 0 after a non-zero value
df['mask'] = df['h'].shift().ne(0) & df['h'].eq(0)
#create consecutive groups - starts at 1 if the first time starts with 0, else at 0
df['g'] = df['mask'].cumsum()
#each day is origin + g days
df['days'] = pd.to_datetime(df['g'], origin='2016-01-01', unit='d')
#add to original Timestamps if the format is HH:MM.SS
df['Timestamp1'] = df['days'] + pd.to_timedelta(df['Timestamp'].str.replace('.', ':', regex=False))
#add to original Timestamps if the format is without hours - MM:SS.SS
df['Timestamp2'] = df['days'] + pd.to_timedelta('00:' + df['Timestamp'])

print (df)
   Timestamp   h   mask  g       days          Timestamp1  \
0    00:08.3   0   True  1 2016-01-02 2016-01-02 00:08:03   
1    00:48.7   0  False  1 2016-01-02 2016-01-02 00:48:07   
2    00:50.0   0  False  1 2016-01-02 2016-01-02 00:50:00   
3    00:53.1   0  False  1 2016-01-02 2016-01-02 00:53:01   
4    10:11.9  10  False  1 2016-01-02 2016-01-02 10:11:09   
5    10:13.3  10  False  1 2016-01-02 2016-01-02 10:13:03   
6    10:41.2  10  False  1 2016-01-02 2016-01-02 10:41:02   
7    00:50.0   0   True  2 2016-01-03 2016-01-03 00:50:00   
8    00:53.1   0  False  2 2016-01-03 2016-01-03 00:53:01   
9    10:42.5  10  False  2 2016-01-03 2016-01-03 10:42:05   
10   10:43.9  10  False  2 2016-01-03 2016-01-03 10:43:09   
11   00:07.5   0   True  3 2016-01-04 2016-01-04 00:07:05   
12   00:08.3   0  False  3 2016-01-04 2016-01-04 00:08:03   
13   10:11.9  10  False  3 2016-01-04 2016-01-04 10:11:09   
14   10:13.3  10  False  3 2016-01-04 2016-01-04 10:13:03   
15   10:43.9  10  False  3 2016-01-04 2016-01-04 10:43:09   

                Timestamp2  
0  2016-01-02 00:00:08.300  
1  2016-01-02 00:00:48.700  
2  2016-01-02 00:00:50.000  
3  2016-01-02 00:00:53.100  
4  2016-01-02 00:10:11.900  
5  2016-01-02 00:10:13.300  
6  2016-01-02 00:10:41.200  
7  2016-01-03 00:00:50.000  
8  2016-01-03 00:00:53.100  
9  2016-01-03 00:10:42.500  
10 2016-01-03 00:10:43.900  
11 2016-01-04 00:00:07.500  
12 2016-01-04 00:00:08.300  
13 2016-01-04 00:10:11.900  
14 2016-01-04 00:10:13.300  
15 2016-01-04 00:10:43.900  
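A variation on the same idea, sketched as a hypothetical helper: instead of testing whether the leading field drops to 0, it starts a new period whenever a time is smaller than the one before it (the clock wrapped around). This is a swapped-in heuristic, not the answer's exact rule, and it assumes MM:SS.S strings as in the Timestamp2 column; the period length is a parameter because the right value depends on what the data really represents.

```python
import pandas as pd


def to_consecutive_datetimes(times, origin, period=pd.Timedelta(days=1)):
    """Hypothetical helper: MM:SS.S strings -> datetimes, moving to the next
    period whenever a value is smaller than the previous one."""
    offsets = pd.to_timedelta('00:' + times)
    # True where the time goes backwards compared with the previous row;
    # the first row's diff is NaT, which compares as False.
    wrapped = offsets.diff() < pd.Timedelta(0)
    # Running count of wraps = index of the period each row belongs to
    period_no = wrapped.cumsum()
    return pd.Timestamp(origin) + period_no * period + offsets


times = pd.Series(['00:08.3', '00:48.7', '10:41.2', '00:50.0', '10:43.9', '00:07.5'])
result = to_consecutive_datetimes(times, '2016-01-01')
print(result)
```

Unlike the answer, the first group here lands on the origin date itself, and a drop such as 00:53.1 followed by 00:50.0 also counts as a new period.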

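【Discussion】:

  • I just realized the interpreted date is wrong: it's not 2019, probably 2016/2017.
  • @nilsinelabore - What is the logic for setting 2016 or 2017?
  • @nilsinelabore - Answer edited, please check.
  • Thanks. Sorry, I must have forgotten to mention that this csv file was exported from SQL Server, and the dates look like the ones in this link: stackoverflow.com/questions/18598075/… I'm not sure how it's done, but I think SQL/Excel has a special way of interpreting dates. For example, 00:07.5, 00:08.3, 00:48.7 = 1/12/2015 12:00:07 am, 1/12/2015 12:00:08 am, 1/12/2015 12:00:49 am
  • I sorted it out with Python. Thanks :)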
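The mapping in the SQL Server comment suggests the values are minutes:seconds after midnight on a fixed date. A minimal sketch under that reading builds full date strings and parses them with an explicit format, so nothing is inferred. Whether 1/12/2015 means 1 December or 12 January is an assumption; BASE_DATE below takes day/month.

```python
import pandas as pd

times = pd.Series(['00:07.5', '00:08.3', '00:48.7'])

# Assumption: "1/12/2015" read as day/month (1 December 2015); use
# '2015-01-12' instead if the export used month/day.
BASE_DATE = '2015-12-01'

# Build "YYYY-MM-DD HH:MM:SS.f" strings, treating each value as MM:SS.S after
# midnight, and parse them with an explicit format.
stamps = pd.to_datetime(BASE_DATE + ' 00:' + times, format='%Y-%m-%d %H:%M:%S.%f')
print(stamps)
```

The fractional seconds are kept; how the original export displayed them (truncated or rounded) decides whether dt.floor or dt.round should follow.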