【Question title】: DateTime format changes after merging Pandas DataFrame
【Posted】: 2017-03-16 22:07:21
【Question】:

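I have two data frames. Each has a column named time holding datetime values, plus one variable column. I want to merge the two data frames, but for some reason the merge messes up the datetime format of nn.

I created the individual data frames with this code: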

## ECG load
nn = pd.read_csv('D:\\path\\Nn.csv',delimiter=";",decimal=',',header=None,names=["time","ibi"])
fsEcg = 1024 # Sample frequency
tsEcg = mkdatMovis('2016-10-31T12:16:15.015') #datetime rep of Start time string
nn.loc[:,'time'] = nn.time/fsEcg # convert sample number to seconds
ecgTime = zip(tsEcg + datetime.timedelta(seconds=float(cmt)) for cmt in nn.time)
nn.loc[:,'time'] = ecgTime

## EDA load
eda = pd.read_csv('D:\\path\\eda.csv',\
                  delimiter=";",decimal=',',header=None,names=["eda"])
fsEda = 32
tsEda = mkdatMovis('2016-10-31T12:17:08.363')
cumEda = np.arange(len(eda),dtype=np.float64)/fsEda  # create time array in seconds
cumEda = pd.Series(cumEda)
edadat = pd.DataFrame()
edadat.loc[:,'time'] = zip(tsEda + datetime.timedelta(seconds=float(cmt)) for cmt in cumEda)
edadat.loc[:,'eda'] = eda

The data frames look like this:

>>> nn
                           time           nn
0    2016-10-31 12:16:26.409531   972.656250
1    2016-10-31 12:16:27.394883   985.351562
2    2016-10-31 12:16:28.379258   984.375000
3    2016-10-31 12:16:29.360703   981.445312
4    2016-10-31 12:16:30.407578  1046.875000
...
1448 2016-10-31 12:39:37.910508   845.703125

>>> edadat
                                time   eda
0      (2016-10-31 12:17:08.363000,)   2.0
1      (2016-10-31 12:17:08.363000,)   5.0
2      (2016-10-31 12:17:08.363000,)   5.0
3      (2016-10-31 12:17:08.363000,)   4.0
4      (2016-10-31 12:17:08.363000,)   4.0
....
41582  (2016-10-31 12:38:47.363000,)  36.0

After merging the data frames with df = edadat.merge(nn,on="time",how="outer"), the data looks like this:

                                time  eda           nn
0      (2016-10-31 12:17:08.363000,)  2.0          NaN
1      (2016-10-31 12:17:08.363000,)  5.0          NaN
2      (2016-10-31 12:17:08.363000,)  5.0          NaN
3      (2016-10-31 12:17:08.363000,)  4.0          NaN
4      (2016-10-31 12:17:08.363000,)  4.0          NaN
...
43027            1477917574356797000  NaN   928.710938
43028            1477917575276719000  NaN   919.921875
43029            1477917576178086000  NaN   901.367188
43030            1477917577064805000  NaN   886.718750
43031            1477917577910508000  NaN   845.703125

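Why are the datetime values from nn converted to Unix timestamps after the merge? Am I not using exactly the same code to create both time series?

【Comments】:

    Tags: python python-2.7 datetime pandas merge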


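    【Solution 1】:

    I think the problem is the tuples in the time column of edadat, so you need to remove them with str[0], which selects the first element of the tuple in each row of the DataFrame: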

    edadat.time = edadat.time.str[0]
    print (edadat)
                                 time   eda
    0      2016-10-31 12:17:08.363000   2.0
    1      2016-10-31 12:17:08.363000   5.0
    2      2016-10-31 12:17:08.363000   5.0
    3      2016-10-31 12:17:08.363000   4.0
    4      2016-10-31 12:17:08.363000   4.0
    41582  2016-10-31 12:38:47.363000  36.0
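
The one-element tuples appear because zip() is called on a single iterable, which wraps every item in a 1-tuple. The following is a minimal sketch of that behaviour and of one way to build the time column without tuples. It assumes the start time is an ordinary datetime (the question's mkdatMovis helper is not shown, so it is replaced here by a literal), and the names start, offsets, wrapped, times and frame are illustrative only.

```python
import datetime

import numpy as np
import pandas as pd

# Assumed stand-in for the value returned by the question's mkdatMovis helper.
start = datetime.datetime(2016, 10, 31, 12, 17, 8, 363000)
fs = 32  # sample frequency in Hz, as in the question
offsets = np.arange(4, dtype=np.float64) / fs  # seconds since start

# zip() over a single iterable wraps every element in a 1-tuple.
wrapped = list(zip(start + datetime.timedelta(seconds=float(s)) for s in offsets))

# Vectorized alternative: add a TimedeltaIndex to the start timestamp.
times = pd.Timestamp(start) + pd.to_timedelta(offsets, unit="s")

# The resulting column has a datetime dtype instead of holding tuples.
frame = pd.DataFrame({"time": times, "eda": [2.0, 5.0, 5.0, 4.0]})
```

Building the column this way should make the later str[0] step unnecessary, since the column never contains tuples in the first place.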
    

    Then use:

    df = edadat.merge(nn,on="time",how="outer")
    print (df)
                             time   eda           nn
    0  2016-10-31 12:17:08.363000   2.0          NaN
    1  2016-10-31 12:17:08.363000   5.0          NaN
    2  2016-10-31 12:17:08.363000   5.0          NaN
    3  2016-10-31 12:17:08.363000   4.0          NaN
    4  2016-10-31 12:17:08.363000   4.0          NaN
    5  2016-10-31 12:38:47.363000  36.0          NaN
    6  2016-10-31 12:16:26.409531   NaN   972.656250
    7  2016-10-31 12:16:27.394883   NaN   985.351562
    8  2016-10-31 12:16:28.379258   NaN   984.375000
    9  2016-10-31 12:16:29.360703   NaN   981.445312
    10 2016-10-31 12:16:30.407578   NaN  1046.875000
    11 2016-10-31 12:39:37.910508   NaN   845.703125
    

    But I think it is better to use merge_ordered, which returns the rows sorted by time:

    df1 = pd.merge_ordered(edadat, nn,on="time",how="outer")
    print (df1)
                             time   eda           nn
    0  2016-10-31 12:16:26.409531   NaN   972.656250
    1  2016-10-31 12:16:27.394883   NaN   985.351562
    2  2016-10-31 12:16:28.379258   NaN   984.375000
    3  2016-10-31 12:16:29.360703   NaN   981.445312
    4  2016-10-31 12:16:30.407578   NaN  1046.875000
    5  2016-10-31 12:17:08.363000   2.0          NaN
    6  2016-10-31 12:17:08.363000   5.0          NaN
    7  2016-10-31 12:17:08.363000   5.0          NaN
    8  2016-10-31 12:17:08.363000   4.0          NaN
    9  2016-10-31 12:17:08.363000   4.0          NaN
    10 2016-10-31 12:38:47.363000  36.0          NaN
    11 2016-10-31 12:39:37.910508   NaN   845.703125
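
As a self-contained illustration of the merge_ordered call above, here is a small sketch with two hand-made frames. The values are copied from the printed output in this thread; the frame sizes and the variable names eda, nn and merged are assumptions made for the example.

```python
import pandas as pd

# Two small frames whose time columns already have a datetime dtype.
eda = pd.DataFrame({
    "time": pd.to_datetime(["2016-10-31 12:17:08.363", "2016-10-31 12:38:47.363"]),
    "eda": [2.0, 36.0],
})
nn = pd.DataFrame({
    "time": pd.to_datetime(["2016-10-31 12:16:26.409531", "2016-10-31 12:39:37.910508"]),
    "nn": [972.65625, 845.703125],
})

# Outer merge on time; merge_ordered returns the rows ordered by the key.
merged = pd.merge_ordered(eda, nn, on="time", how="outer")
print(merged)
```

Because no timestamp occurs in both frames, every row of the result has a value in only one of the eda and nn columns, and NaN in the other.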
    

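    【Discussion】:

    • Thanks. This actually seems to work (although I now have to make some other changes). But why does the time index in nn not contain tuples? And why do the tuples in edadat change the nn time index? Is this related to type casting? One last question: I thought df.time = x should be replaced by df.loc[:,"time"] = time. What is the current recommendation?
    • You can check whether any tuple holds more than one element with print (edadat[edadat.time.str.len() > 1])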
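
The check from the last comment can be tried on a tiny frame. This is a sketch under the assumption that the time column holds one-element tuples, as in the edadat output of the question; frame, lengths and longer are illustrative names.

```python
import pandas as pd

stamp = pd.Timestamp("2016-10-31 12:17:08.363")

# Object column whose cells are 1-tuples, mimicking edadat from the question.
frame = pd.DataFrame({"time": [(stamp,), (stamp,)], "eda": [2.0, 5.0]})

# .str.len() reports the length of each tuple in the column.
lengths = frame["time"].str.len()

# Rows whose tuple holds more than one element; empty for this data.
longer = frame[lengths > 1]
print(longer)
```

If the filtered frame is empty, every tuple has at most one element, so taking str[0] does not discard any data.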