【Question Title】: Logical comparison with date column not working in Pandas dataframe
【Posted】: 2020-01-16 06:24:42
【Question】:

I have a dataframe like this:

Milestone   Initial_Date    Next_Date   Buffer  Buffer1
-------------------------------------------------------
M0          11/1/2020       13/1/2020   6       1
M1          13/1/2020       15/1/2020   3       1
M0          24/12/2019      25/12/2019  4       2
M1          16/12/2019      21/12/2019  9       2
M0          8/1/2020        14/1/2020   10      1
M2          6/1/2020        9/1/2020    5       2
M3          18/1/2020       21/1/2020   3       4

I want to apply the following logic to the dataframe:

CASE
   WHEN Milestone = 'M0' THEN Initial_Date + Buffer
   WHEN Milestone = 'M1' THEN Next_Date + Buffer
   WHEN Milestone >= 'M2' THEN Initial_Date + Buffer1
   ELSE NULL
END AS Result

Expected output:

Result
------------
17/1/2020
18/1/2020
28/12/2019
30/12/2019
18/1/2020
8/1/2020
22/1/2020

My code:

# The date columns have dtype datetime64[ns]; Buffer and Buffer1 are float64

    data['Milestone'] = pd.Categorical(data['Milestone'], categories=['00','M0','M1','M2','M3','M4','M5','M6','M7'], ordered=True)
    buffer = pd.to_timedelta(data['Buffer'], unit='d')
    buffer1 = pd.to_timedelta(data['Buffer1'], unit='d')
    data['Result'] = np.select([data['Milestone']=='M0', data['Milestone']=='M1',
                                data['Milestone']>='M2'],
                               [data['Initial_Date']+buffer, data['Next_Date']+buffer,
                                data['Initial_Date']+buffer1])

This code raises an error:

TypeError: invalid type promotion

Can you help me resolve this?

【Question Discussion】:

    Tags: python-3.x pandas dataframe datetime


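    【Solution 1】:

    `np.select` fills unmatched rows with `default=0` unless told otherwise, and an integer cannot be combined with datetime values, hence the error. First set the `default` parameter to `None` (which becomes `NaT`), then convert the output to datetimes: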

    data['Result'] =pd.to_datetime(np.select([data['Milestone']=='M0',
                                   data['Milestone']=='M1',
                                   data['Milestone']>='M2'],
                                  [data['Initial_Date']+buffer,
                                   data['Next_Date']+buffer,
                                   data['Initial_Date']+buffer1],
                                   default=None))
    
    print (data)
      Milestone Initial_Date  Next_Date  Buffer  Buffer1     Result
    0        M0   2020-01-11 2020-01-13       6        1 2020-01-17
    1        M1   2020-01-13 2020-01-15       3        1 2020-01-18
    2        M0   2019-12-24 2019-12-25       4        2 2019-12-28
    3        M1   2019-12-16 2019-12-21       9        2 2019-12-30
    4        M0   2020-01-08 2020-01-14      10        1 2020-01-18
    5        M2   2020-01-06 2020-01-09       5        2 2020-01-08
    6        M3   2020-01-18 2020-01-21       3        4 2020-01-22
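
For anyone who wants to run this end to end, here is a minimal, self-contained sketch that rebuilds the question's sample rows and applies the same three conditions. It is a variation on the code above rather than the exact same call: it assumes `default=np.datetime64("NaT")` instead of `default=None`, so the result should already come back with a datetime dtype and the `pd.to_datetime` wrapper is not needed. The `format="%d/%m/%Y"` argument is also an assumption, based on the day-first dates shown in the question.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
data = pd.DataFrame({
    "Milestone": ["M0", "M1", "M0", "M1", "M0", "M2", "M3"],
    "Initial_Date": ["11/1/2020", "13/1/2020", "24/12/2019", "16/12/2019",
                     "8/1/2020", "6/1/2020", "18/1/2020"],
    "Next_Date": ["13/1/2020", "15/1/2020", "25/12/2019", "21/12/2019",
                  "14/1/2020", "9/1/2020", "21/1/2020"],
    "Buffer": [6.0, 3.0, 4.0, 9.0, 10.0, 5.0, 3.0],
    "Buffer1": [1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 4.0],
})

# Assumption: the dates are day-first, as the sample values suggest.
for col in ["Initial_Date", "Next_Date"]:
    data[col] = pd.to_datetime(data[col], format="%d/%m/%Y")

# An ordered categorical makes the >= 'M2' comparison meaningful.
data["Milestone"] = pd.Categorical(
    data["Milestone"],
    categories=["00", "M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7"],
    ordered=True,
)

# Turn the day counts into timedeltas so they can be added to dates.
buffer = pd.to_timedelta(data["Buffer"], unit="D")
buffer1 = pd.to_timedelta(data["Buffer1"], unit="D")

conditions = [
    data["Milestone"] == "M0",
    data["Milestone"] == "M1",
    data["Milestone"] >= "M2",
]
choices = [
    data["Initial_Date"] + buffer,
    data["Next_Date"] + buffer,
    data["Initial_Date"] + buffer1,
]

# A datetime NaT default keeps every candidate value in the datetime family.
data["Result"] = np.select(conditions, choices, default=np.datetime64("NaT"))

print(data[["Milestone", "Result"]])
```

Conditions are checked in order and the first match wins, which mirrors how the `CASE` expression in the question is evaluated.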
    

    【Discussion】:

    • What does `default=pd.Timestamp('NaT')` do?
    • @Rahulrajan - It is the missing value for datetimes.
    • @Rahulrajan It is the `ELSE NULL`, but with a datetime dtype, because all the other values are dates.
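
To make the `ELSE NULL` point in the comments concrete, here is a small sketch. The three-row frame is hypothetical: the `'00'` row and its date are invented purely so that one row matches no condition. The sketch also uses `np.datetime64("NaT")` as the default rather than the `pd.Timestamp('NaT')` mentioned in the comment, which is an assumption on my part about an equivalent NumPy-level missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the "00" row is invented to exercise the ELSE branch.
demo = pd.DataFrame({
    "Milestone": pd.Categorical(
        ["00", "M0", "M3"],
        categories=["00", "M0", "M1", "M2", "M3"],
        ordered=True,
    ),
    "Initial_Date": pd.to_datetime(["2020-01-01", "2020-01-11", "2020-01-18"]),
    "Buffer": [2, 6, 3],
    "Buffer1": [1, 1, 4],
})

conditions = [demo["Milestone"] == "M0", demo["Milestone"] >= "M2"]
choices = [
    demo["Initial_Date"] + pd.to_timedelta(demo["Buffer"], unit="D"),
    demo["Initial_Date"] + pd.to_timedelta(demo["Buffer1"], unit="D"),
]

# Without an explicit default, np.select falls back to the integer 0, which
# is expected to clash with the datetime choices. The exact wording of the
# message may differ between NumPy versions.
try:
    np.select(conditions, choices)
except TypeError as exc:
    print("TypeError:", exc)

# With a datetime NaT default, the unmatched "00" row becomes NaT and the
# column keeps a datetime dtype.
demo["Result"] = np.select(conditions, choices, default=np.datetime64("NaT"))
print(demo)
```

The unmatched row ends up as `NaT`, which plays the role of SQL's `NULL` for a date column.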