【Question title】: Pandas DatetimeIndex NonExistentTimeError only when creating MultiIndex
【Posted】: 2016-11-24 22:37:05
【Question】:

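I have a list of data read from MongoDB. A subset of the data can be found in this gist. I am creating a DataFrame from this list and using the date field to build a DatetimeIndex. The dates were originally recorded in my local time zone, but in Mongo they carry no time zone information, so I correct for DST as suggested here.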

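from datetime import datetime
from dateutil import tz

import pandas as pd
from pandas import DataFrame

# data is the list from the gist
dates = [x['Date'] for x in data]
idx = pd.DatetimeIndex(dates, freq='D')
# Mongo stores the dates without tz info; treat them as UTC, then convert
idx = idx.tz_localize(tz=tz.tzutc())
idx = idx.tz_convert(tz='Europe/Dublin')
# Truncate every timestamp to local midnight
idx = idx.normalize()
frame = DataFrame(data, index=idx)
frame = frame.drop('Date', axis=1)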

Everything seems fine, and my frame looks like this:

                           Events         ID
2008-03-31 00:00:00+01:00     0.0  116927302
2008-03-30 00:00:00+00:00  2401.0  116927302
2008-03-31 00:00:00+01:00     0.0  116927307
2008-03-30 00:00:00+00:00     0.0  116927307
2008-03-31 00:00:00+01:00     0.0  121126919
2008-03-30 00:00:00+00:00  1019.0  121126919
2008-03-30 00:00:00+00:00     0.0  121126922
2008-03-31 00:00:00+01:00     0.0  121126922
2008-03-30 00:00:00+00:00     0.0  121127133
2008-03-31 00:00:00+01:00     0.0  121127133
2008-03-31 00:00:00+01:00     0.0  131677370
2008-03-30 00:00:00+00:00     0.0  131677370
2008-03-30 00:00:00+00:00     0.0  131677416
2008-03-31 00:00:00+01:00     0.0  131677416

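Now I want to create a MultiIndex from the original DatetimeIndex and the ID column, as shown here. However, when I try this, I get an error that was not raised when the DatetimeIndex was first created: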

frame.set_index([frame.ID, idx])

NonExistentTimeError: 2008-03-30 01:00:00

If I simply call frame.set_index(idx) without the MultiIndex, no error is raised.
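The timestamp in the error message, 2008-03-30 01:00:00, falls in the daylight-saving gap for Europe/Dublin: clocks jumped from 01:00 straight to 02:00 that morning, so that wall-clock time never occurred. The sketch below only illustrates that gap with a single hypothetical timestamp. It is not the asker's code, and the thread does not establish why set_index triggers a re-localization.

```python
import pandas as pd

naive = pd.Timestamp("2008-03-30 01:00:00")

# Localizing the naive value directly to Europe/Dublin fails, because
# this wall-clock time was skipped when DST started.
try:
    naive.tz_localize("Europe/Dublin")
    raised = None
except Exception as exc:  # the exact exception class varies by pandas version
    raised = type(exc).__name__
print(raised)

# Treating the same value as UTC first and then converting is always valid,
# since every UTC instant maps to exactly one local time.
converted = naive.tz_localize("UTC").tz_convert("Europe/Dublin")
print(converted)  # 2008-03-30 02:00:00+01:00
```

This is why the localize-to-UTC-then-convert route used in the question succeeds when the index is first built.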

Versions

  • Python 2.7.11
  • pandas 0.18.0

【Comments】:

    Tags: python datetime pandas dataframe


    【Solution 1】:

    You first need to call sort_index, then append the ID column to the index:

    frame = frame.sort_index()
    frame.set_index('ID', append=True, inplace=True)
    print (frame)
                                         Events
                              ID               
    2008-03-30 00:00:00+00:00 168445814     0.0
                              168445633     0.0
                              168445653     0.0
                              245514429     0.0
                              168445739     0.0
                              168445810     0.0
                              332955940     0.0
                              168445875     0.0
                              168445628     0.0
                              217596128  1779.0
                              177336685     0.0
                              180799848     0.0
                              215797757     0.0
                              180800351  1657.0
                              183192871     0.0
    ...
    ...     
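For readers without access to the gist, the approach above can be reproduced end to end on a small data set. This is a sketch under assumptions: the four records are made-up stand-ins shaped like the rows shown in the question, and freq='D' is left out because the sample dates are unsorted and repeated, which does not match a regular daily frequency.

```python
import pandas as pd

# Hypothetical stand-in for the list read from MongoDB
data = [
    {"Date": "2008-03-31 00:00:00", "Events": 0.0, "ID": 116927302},
    {"Date": "2008-03-30 00:00:00", "Events": 2401.0, "ID": 116927302},
    {"Date": "2008-03-31 00:00:00", "Events": 0.0, "ID": 121126919},
    {"Date": "2008-03-30 00:00:00", "Events": 1019.0, "ID": 121126919},
]

# Same steps as in the question: naive -> UTC -> Europe/Dublin -> local midnight
dates = [row["Date"] for row in data]
idx = pd.DatetimeIndex(dates).tz_localize("UTC").tz_convert("Europe/Dublin").normalize()
frame = pd.DataFrame(data, index=idx).drop(columns="Date")

# Sort the date index first, then append ID as a second index level
result = frame.sort_index().set_index("ID", append=True)
print(result)
```

Using the non-inplace form of set_index returns a new frame and leaves `frame` untouched, which makes it easier to compare the two.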
    

    If you need the levels in the other order, use DataFrame.swaplevel:

    frame = frame.sort_index()
    frame.set_index('ID', append=True, inplace=True)
    frame = frame.swaplevel(0,1)
    print (frame)
                                         Events
    ID                                         
    168445814 2008-03-30 00:00:00+00:00     0.0
    168445633 2008-03-30 00:00:00+00:00     0.0
    168445653 2008-03-30 00:00:00+00:00     0.0
    245514429 2008-03-30 00:00:00+00:00     0.0
    168445739 2008-03-30 00:00:00+00:00     0.0
    168445810 2008-03-30 00:00:00+00:00     0.0
    332955940 2008-03-30 00:00:00+00:00     0.0
    168445875 2008-03-30 00:00:00+00:00     0.0
    168445628 2008-03-30 00:00:00+00:00     0.0
    217596128 2008-03-30 00:00:00+00:00  1779.0
    177336685 2008-03-30 00:00:00+00:00     0.0
    180799848 2008-03-30 00:00:00+00:00     0.0
    215797757 2008-03-30 00:00:00+00:00     0.0
    180800351 2008-03-30 00:00:00+00:00  1657.0
    183192871 2008-03-30 00:00:00+00:00     0.0
    186439064 2008-03-30 00:00:00+00:00     0.0
    199856024 2008-03-30 00:00:00+00:00     0.0
    ...
    ...
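One thing to keep in mind is that swaplevel only reorders the levels and does not re-sort the rows. A minimal sketch with made-up rows (the values are illustrative, not taken from the gist) shows the level order after the swap and an optional sort_index afterwards so that rows are grouped by ID:

```python
import pandas as pd

# Hypothetical tz-aware index and rows, shaped like the frame in the question
idx = pd.DatetimeIndex(["2008-03-31", "2008-03-30", "2008-03-30"], tz="Europe/Dublin")
frame = pd.DataFrame(
    {"Events": [0.0, 2401.0, 0.0], "ID": [116927302, 116927302, 116927307]},
    index=idx,
)

by_date = frame.sort_index().set_index("ID", append=True)

# Swap the two levels so ID comes first; the row order is unchanged
by_id = by_date.swaplevel(0, 1)
print(list(by_id.index.names))  # ['ID', None]

# Optional: sort again so rows are ordered by ID, then by date
by_id_sorted = by_id.sort_index()
print(by_id_sorted)
```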
    

    If you need to copy the column into the index while keeping it as a column, use set_index(frame.ID, ...

    frame = frame.sort_index()
    frame.set_index(frame.ID, append=True, inplace=True)
    frame = frame.swaplevel(0,1)
    print (frame)
                                         Events         ID
    ID                                                    
    168445814 2008-03-30 00:00:00+00:00     0.0  168445814
    168445633 2008-03-30 00:00:00+00:00     0.0  168445633
    168445653 2008-03-30 00:00:00+00:00     0.0  168445653
    245514429 2008-03-30 00:00:00+00:00     0.0  245514429
    168445739 2008-03-30 00:00:00+00:00     0.0  168445739
    168445810 2008-03-30 00:00:00+00:00     0.0  168445810
    332955940 2008-03-30 00:00:00+00:00     0.0  332955940
    168445875 2008-03-30 00:00:00+00:00     0.0  168445875
    168445628 2008-03-30 00:00:00+00:00     0.0  168445628
    217596128 2008-03-30 00:00:00+00:00  1779.0  217596128
    177336685 2008-03-30 00:00:00+00:00     0.0  177336685
    180799848 2008-03-30 00:00:00+00:00     0.0  180799848
    215797757 2008-03-30 00:00:00+00:00     0.0  215797757
    180800351 2008-03-30 00:00:00+00:00  1657.0  180800351
    183192871 2008-03-30 00:00:00+00:00     0.0  183192871
    186439064 2008-03-30 00:00:00+00:00     0.0  186439064
    ...
    ...                     
    

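    【Discussion】:

    • Thank you very much. Why is sort_index needed?
    • Hmm, I think many features in pandas need it. You can check the docs: "While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted. So please be careful."
    • That is interesting. Thanks for the great answer and the insightful comment (not to mention the speed of the reply!)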
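Whether a date index is already sorted can be checked before relying on it. The following sketch uses a small made-up Series, not the thread's data, to show the check and the effect of sort_index:

```python
import pandas as pd

# Hypothetical unsorted dates with repeats, as in the question's frame
idx = pd.DatetimeIndex(["2008-03-31", "2008-03-30", "2008-03-31", "2008-03-30"])
s = pd.Series([0.0, 2401.0, 0.0, 1019.0], index=idx)

print(s.index.is_monotonic_increasing)  # False

s_sorted = s.sort_index()
print(s_sorted.index.is_monotonic_increasing)  # True
```

Repeated dates do not prevent the index from counting as sorted; is_monotonic_increasing only requires that values never decrease.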