【Question title】: Date Timestamps and datetime64[ns, UTC] comparison in python pandas
【Posted】: 2020-08-20 04:14:59
【Question description】:

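I'm confused by the result of a filtering step on a pandas DataFrame. I'm trying to select the rows that fall between two dates, but the resulting DataFrame is empty. I'm sure data exists for that period.

df.info() reports the "opendate" column as: `opendate 440383 non-null datetime64[ns, UTC]`

Code snippet: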

from datetime import timedelta
import pandas as pd

# tz-aware "now", directly comparable with a datetime64[ns, UTC] column
# (pd.datetime has been removed from pandas; localizing a naive local "now"
# as UTC would also shift the window by the local UTC offset)
current_date = pd.Timestamp.now(tz='UTC')
t_delta_week = timedelta(days=7)
t_delta_year = timedelta(days=365)

# Current-year window: the last 7 days
date_start2020 = current_date - t_delta_week
date_end2020 = current_date

# Last-year window: the same 7 days, shifted back 365 days
date_start2019 = current_date - t_delta_year - t_delta_week
date_end2019 = current_date - t_delta_year

# 'opendate' is already datetime64[ns, UTC], so no further conversion is needed
mask = (df2020_2019['opendate'] > date_start2020) & (df2020_2019['opendate'] <= date_end2020)
df_currYear = df2020_2019.loc[mask]

df_currYear

The returned DataFrame is empty.

Thanks for your help! :)

Edit:

Maybe this helps: "opendate" is a generated column, created with this snippet:

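import pandas as pd

fmt = '%Y-%m-%dT%H:%M:%S'
# Drop rows without a raw timestamp before parsing
df2020_2019.dropna(subset=['opentime_TS'], inplace=True)
# errors='coerce' turns unparsable values into NaT; errors='ignore' (used originally)
# is deprecated in recent pandas and silently returns the input unparsed on failure
df2020_2019['opendate'] = pd.to_datetime(df2020_2019['opentime_TS'], utc=True, format=fmt, errors='coerce')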
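
To illustrate what this conversion produces, here is a minimal sketch. It assumes a few made-up sample strings in place of the private `opentime_TS` column, and uses `errors='coerce'` so that values not matching `fmt` show up as NaT and can be counted.

```python
import pandas as pd

fmt = "%Y-%m-%dT%H:%M:%S"

# Hypothetical sample values (assumption); the last one does not match fmt
raw = pd.Series(["2020-08-19T10:15:00", "2020-08-20T04:14:59", "not a date"])

# utc=True yields a tz-aware column; errors="coerce" maps bad values to NaT
parsed = pd.to_datetime(raw, utc=True, format=fmt, errors="coerce")

# Count the values that failed to parse
n_failed = int(parsed.isna().sum())
print(parsed.dtype, n_failed)
```

A non-zero `n_failed` on the real column would mean some rows can never match a date filter.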

I've also attached a printout of head() as a data sample. For privacy reasons I can't share the actual records of the df :)

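【Comments】:

  • I couldn't reproduce the empty df_currYear using a dummy df as df2020_2019, with "opendate" simulated via pd.date_range. At first I thought this was a tz issue, but that should raise a TypeError, and it looks like you made sure everything is in UTC...
  • I'll add some details to the question, maybe that helps. To be honest, I did run into a TypeError earlier, something like "cannot compare tz-aware and tz-naive dates". I also tested the between function successfully: df_lastYear = df2020_2019[df2020_2019["opendate"].between(date_start2019, date_end2019)], but I still can't get the same to work for 2020 :)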
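
The reproduction attempt and the TypeError mentioned in these comments can be sketched roughly as follows. The dummy frame, the fixed dates, and the names `df`, `recent`, and `raised` are assumptions for illustration, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for df2020_2019: daily tz-aware timestamps (assumption)
df = pd.DataFrame({
    "opendate": pd.date_range("2020-08-01", "2020-08-20", freq="D", tz="UTC")
})

end = pd.Timestamp("2020-08-20", tz="UTC")
start = end - pd.Timedelta(days=7)

# tz-aware bounds compare cleanly against a datetime64[ns, UTC] column
mask = (df["opendate"] > start) & (df["opendate"] <= end)
recent = df.loc[mask]

# A tz-naive bound raises TypeError rather than silently matching nothing
naive_start = pd.Timestamp("2020-08-13")
try:
    df["opendate"] > naive_start
    raised = False
except TypeError:
    raised = True

print(len(recent), raised)
```

If the bounds are tz-aware and no TypeError is raised, an empty result points at the data rather than at the comparison.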

Tags: python pandas date datetime timestamp


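【Solution 1】:

OK, my bad. I was so focused on the tz TypeErrors that I went blind to the obvious... I was reading from a wrong, outdated data source :) The final solution, using the correct data: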

from datetime import timedelta
import pandas as pd

# Parse the raw timestamp strings into a tz-aware (UTC) datetime column
fmt = '%Y-%m-%dT%H:%M:%S'
df2020_2019.dropna(subset=['opentime_TS'], inplace=True)
# errors='coerce' turns unparsable values into NaT; errors='ignore' (used originally)
# is deprecated in recent pandas
df2020_2019['opendate'] = pd.to_datetime(df2020_2019['opentime_TS'], utc=True, format=fmt, errors='coerce')
df2020_2019.info()

# tz-aware "now", directly comparable with a datetime64[ns, UTC] column
current_date = pd.Timestamp.now(tz='UTC')
t_delta_week = timedelta(days=7)
t_delta_year = timedelta(days=365)

# Current-year window: the last 7 days
date_start2020 = current_date - t_delta_week
date_end2020 = current_date

# Last-year window: the same 7 days, shifted back 365 days
date_start2019 = current_date - t_delta_year - t_delta_week
date_end2019 = current_date - t_delta_year

# 'opendate' is already datetime64[ns, UTC], so it can be filtered directly
df_currYear = df2020_2019[df2020_2019["opendate"] > date_start2020]
df_lastYear = df2020_2019[df2020_2019["opendate"].between(date_start2019, date_end2019)]

df_currYear
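
Since the root cause was a stale data source, a quick range check before filtering can catch this kind of problem early. The following is only a sketch: `window_overlaps_data` is a hypothetical helper, and the `stale` frame and fixed dates are made up to mimic an extract that stops at the end of 2019.

```python
import pandas as pd

def window_overlaps_data(dates: pd.Series, start: pd.Timestamp, end: pd.Timestamp) -> bool:
    """Return True if [start, end] intersects the range covered by `dates`."""
    if dates.empty:
        return False
    return bool(dates.min() <= end and dates.max() >= start)

# Hypothetical stale extract that ends in 2019 (assumption)
stale = pd.DataFrame({
    "opendate": pd.date_range("2019-01-01", "2019-12-31", freq="D", tz="UTC")
})

end_2020 = pd.Timestamp("2020-08-20", tz="UTC")
start_2020 = end_2020 - pd.Timedelta(days=7)
end_2019 = end_2020 - pd.Timedelta(days=365)
start_2019 = end_2019 - pd.Timedelta(days=7)

# The 2019 window finds data while the 2020 window cannot
has_2020 = window_overlaps_data(stale["opendate"], start_2020, end_2020)
has_2019 = window_overlaps_data(stale["opendate"], start_2019, end_2019)
print(has_2020, has_2019)
```

This mirrors the symptom in the question: the last-year filter returned rows while the current-year filter came back empty.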
