【Question Title】: Split timestamp column into two new columns in CSV using python and pandas
【Posted】: 2015-03-01 22:15:39
【Question Description】:

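I have a large CSV file with more than 210000 rows. I am new to python and pandas. I want to efficiently go through the timestamp column, split it into 2 new columns (date and time), format the new date column as %Y%m%d, and drop the new time column. In other words, only the newly formatted date column should be written back to the CSV file. How would you do this?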

Sample input file:

   minit,timestamp,open,high,low,close
   0,2009-02-23 17:32:00,1.2708,1.2708,1.2706,1.2706
   1,2009-02-23 17:33:00,1.2708,1.2708,1.2705,1.2706
   2,2009-02-23 17:34:00,1.2706,1.2707,1.2702,1.2702
   3,2009-02-23 17:35:00,1.2704,1.2706,1.27,1.27
   4,2009-02-23 17:36:00,1.2701,1.2706,1.2698,1.2703
   5,2009-02-23 17:37:00,1.2703,1.2703,1.27,1.2702
   6,2009-02-23 17:38:00,1.2701,1.2701,1.2696,1.2697

Sample output file:

   minit,date,open,high,low,close
   0,20090223,1.2708,1.2708,1.2706,1.2706
   1,20090223,1.2708,1.2708,1.2705,1.2706
   2,20090223,1.2706,1.2707,1.2702,1.2702
   3,20090223,1.2704,1.2706,1.27,1.27
   4,20090223,1.2701,1.2706,1.2698,1.2703
   5,20090223,1.2703,1.2703,1.27,1.2702
   6,20090223,1.2701,1.2701,1.2696,1.2697

After searching on Google, I started writing some sample code to do this:

    import csv
    import itertools
    import operator
    import time
    import datetime
    import pandas as pd
    from pandas import DataFrame, Timestamp
    from numpy import *

    def datestring_to_timestamp(str):
        return time.mktime(time.strptime(str, "%Y-%m-%d %H:%M:%S"))

    def timestamp_to_datestring(timestamp):
        return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))

    def timestamp_to_float(str):
        return float(datetime.datetime.strptime(str, '%Y-%m-%d %H:%M:%S').strftime("%s"))

    def timestamp_to_intstring(str):
        return datetime.datetime.strptime(str, '%Y-%m-%d %H:%M:%S').strftime("%s")

    def timestamp_to_int(str):
        return int(datetime.datetime.strptime(str, '%Y-%m-%d %H:%M:%S').strftime("%s"))

    with open("inputfile.csv", 'rb') as input, open('outputfile.csv', 'wb') as output:
        reader = csv.reader(input, delimiter=',')
        writer = csv.writer(output, delimiter=',')

    # Need to process loop or process the timestamp column

【Comments】:

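  • After converting/reading the column as a datetime64 column, you can create the date column like this: df['date'] = df['timestamp'].dt.date
  • Also, the to_csv method accepts a format parameter; you can pass a format string to write the dates as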
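
The first comment can be sketched as follows. This is a minimal illustration, not the commenter's exact code: the inline SAMPLE string and the date_str column name are assumptions made here so the snippet runs without a file on disk. It shows that .dt.date produces date objects, while .dt.strftime produces strings already in the %Y%m%d layout the question asks for.

```python
import io

import pandas as pd

# Hypothetical inline copy of the first rows of the question's sample input.
SAMPLE = """minit,timestamp,open,high,low,close
0,2009-02-23 17:32:00,1.2708,1.2708,1.2706,1.2706
1,2009-02-23 17:33:00,1.2708,1.2708,1.2705,1.2706
"""

# parse_dates turns the timestamp column into a datetime64 column.
df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["timestamp"])

# .dt.date gives Python datetime.date objects (time part dropped).
df["date"] = df["timestamp"].dt.date

# .dt.strftime gives strings formatted as %Y%m%d ("date_str" is an illustrative name).
df["date_str"] = df["timestamp"].dt.strftime("%Y%m%d")

print(df[["timestamp", "date", "date_str"]])
```

If only the formatted date is needed in the output, the timestamp column could then be dropped with df.drop(columns=["timestamp"]) before writing.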

Tags: python csv numpy pandas itertools


【Solution 1】:

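You can specify a date format string in the arguments to to_csv, and it will write your dates in whatever format you like, with no need to extract, convert, or add new columns.

So load the data with read_csv:

df = pd.read_csv('mydata.csv', parse_dates=['timestamp'])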

In [15]:

df
Out[15]:
   minit           timestamp    open    high     low   close
0      0 2009-02-23 17:32:00  1.2708  1.2708  1.2706  1.2706
1      1 2009-02-23 17:33:00  1.2708  1.2708  1.2705  1.2706
2      2 2009-02-23 17:34:00  1.2706  1.2707  1.2702  1.2702
3      3 2009-02-23 17:35:00  1.2704  1.2706  1.2700  1.2700
4      4 2009-02-23 17:36:00  1.2701  1.2706  1.2698  1.2703
5      5 2009-02-23 17:37:00  1.2703  1.2703  1.2700  1.2702
6      6 2009-02-23 17:38:00  1.2701  1.2701  1.2696  1.2697
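
The printed frame looks the same whether or not the column was parsed, so it can help to check the dtype. The sketch below is an assumption-laden illustration: it reads a hypothetical inline SAMPLE string instead of mydata.csv, and the names raw and parsed are chosen here for the comparison.

```python
import io

import pandas as pd

# Hypothetical inline stand-in for mydata.csv, using rows from the question.
SAMPLE = """minit,timestamp,open,high,low,close
0,2009-02-23 17:32:00,1.2708,1.2708,1.2706,1.2706
1,2009-02-23 17:33:00,1.2708,1.2708,1.2705,1.2706
"""

# Without parse_dates the timestamp column stays as text.
raw = pd.read_csv(io.StringIO(SAMPLE))

# With parse_dates it becomes a datetime64 column.
parsed = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["timestamp"])

print(pd.api.types.is_datetime64_any_dtype(raw["timestamp"]))
print(pd.api.types.is_datetime64_any_dtype(parsed["timestamp"]))
```

The date_format argument of to_csv only affects datetime columns, which is why the parse_dates step matters here.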

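You can rename the column at this stage, and then pass the argument date_format='%Y%m%d' to to_csv, which writes only the date part to the CSV. We can then reload the file and display what was saved: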

In [19]:

df.rename(columns={'timestamp':'date'},inplace=True)
df.to_csv(r'c:\data\date.csv', date_format='%Y%m%d')
df1 = pd.read_csv(r'C:\data\date.csv', index_col=[0])
df1
Out[19]:
   minit      date    open    high     low   close
0      0  20090223  1.2708  1.2708  1.2706  1.2706
1      1  20090223  1.2708  1.2708  1.2705  1.2706
2      2  20090223  1.2706  1.2707  1.2702  1.2702
3      3  20090223  1.2704  1.2706  1.2700  1.2700
4      4  20090223  1.2701  1.2706  1.2698  1.2703
5      5  20090223  1.2703  1.2703  1.2700  1.2702
6      6  20090223  1.2701  1.2701  1.2696  1.2697
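
The answer's code writes the DataFrame index as an extra first column and then reads it back with index_col. To reproduce the question's desired output file exactly, one option is to pass index=False. The following is a sketch under assumptions: it uses a hypothetical inline SAMPLE string and an in-memory buffer in place of real files, so the names SAMPLE, buffer, and csv_text are illustrative only.

```python
import io

import pandas as pd

# Hypothetical inline copy of the first rows of the question's sample input.
SAMPLE = """minit,timestamp,open,high,low,close
0,2009-02-23 17:32:00,1.2708,1.2708,1.2706,1.2706
1,2009-02-23 17:33:00,1.2708,1.2708,1.2705,1.2706
"""

df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=["timestamp"])
df = df.rename(columns={"timestamp": "date"})

# Write to an in-memory buffer; a file path would work the same way.
# index=False omits the row index; date_format controls how datetimes are written.
buffer = io.StringIO()
df.to_csv(buffer, index=False, date_format="%Y%m%d")

csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, the buffer would be replaced by an output path; the whole 210000-row file is handled in one vectorized pass, with no explicit loop over rows.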

【Discussion】:
