【Question title】:Pandas: How to filter out timestamps that are listed
【Posted】:2021-05-17 20:10:47
【Question】:

I have a data frame like the one below. It records the specific times (at minute resolution) at which a particular event occurred.

+---------------------+
| time                |
+---------------------+
| 2021-01-01 08:01:00 |
+---------------------+
| 2021-01-01 09:32:00 |
+---------------------+
| 2021-01-02 12:01:00 |
+---------------------+
| 2021-01-02 16:30:00 |
+---------------------+
| ...                 |
+---------------------+
| ...                 |
+---------------------+
| 2021-01-31 06:01:00 |
+---------------------+

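I want to create a new, separate data frame containing the minute-level timestamps from 2021-01-01 00:00:00 to 2021-01-31 11:59:00 whose times do not appear in the data frame above.

It would look like this: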

+---------------------+
| time                |
+---------------------+
| 2021-01-01 00:00:00 |
+---------------------+
| 2021-01-01 00:01:00 |
+---------------------+
| ...                 |
+---------------------+
| 2021-01-01 08:00:00 |
+---------------------+
| 2021-01-01 08:02:00 |
+---------------------+
| ...                 |
+---------------------+
| 2021-01-01 09:31:00 |
+---------------------+
| 2021-01-01 09:33:00 |
+---------------------+
| ...                 |
+---------------------+
| 2021-01-02 12:00:00 |
+---------------------+
| 2021-01-02 12:02:00 |
+---------------------+
| ...                 |
+---------------------+
| 2021-01-02 16:29:00 |
+---------------------+
| 2021-01-02 16:31:00 |
+---------------------+
| ...                 |
+---------------------+
| 2021-01-31 06:00:00 |
+---------------------+
| 2021-01-31 06:02:00 |
+---------------------+

Is there an elegant way to do this?

Thanks a lot for your help!

【Comments on the question】:

    Tags: python pandas date datetime


    【Solution 1】:

    Generate random input data:

    >>> df
                       time
    0   2021-01-01 01:08:00
    1   2021-01-01 01:23:00
    2   2021-01-01 01:35:00
    3   2021-01-01 02:13:00
    4   2021-01-01 03:47:00
    ..                  ...
    995 2021-01-31 08:24:00
    996 2021-01-31 09:30:00
    997 2021-01-31 10:24:00
    998 2021-01-31 10:31:00
    999 2021-01-31 10:34:00
    
    [1000 rows x 1 columns]  # <- 1000
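
    The answer does not show how the random input was produced. One possible way, as a minimal sketch, is below. It assumes 1000 distinct minutes drawn uniformly from the requested range; the seed and the names `rng`, `all_minutes` and `picked` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Every minute in the range from the question (43920 timestamps).
all_minutes = pd.date_range("2021-01-01 00:00:00", "2021-01-31 11:59:00", freq="min")

# Assumption: pick 1000 distinct positions; the fixed seed is only for reproducibility.
rng = np.random.default_rng(seed=0)
picked = rng.choice(len(all_minutes), size=1000, replace=False)

# Sort the positions so the event times come out in chronological order.
df = pd.DataFrame({"time": all_minutes[np.sort(picked)]})
```

    Sampling without replacement keeps the event times unique, which is what makes the later count (43920 - 1000) come out exactly.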
    

    Create a DatetimeIndex with freq="T" (1 minute). Note that pandas 2.2 and later deprecate the "T" alias in favor of "min":

    import pandas as pd

    # One timestamp per minute over the requested range
    start_date = "2021-01-01 00:00:00"
    end_date = "2021-01-31 11:59:00"
    dti = pd.date_range(start_date, end_date, freq="T")
    
    >>> dti
    DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 00:01:00',
                   '2021-01-01 00:02:00', '2021-01-01 00:03:00',
                   '2021-01-01 00:04:00', '2021-01-01 00:05:00',
                   '2021-01-01 00:06:00', '2021-01-01 00:07:00',
                   '2021-01-01 00:08:00', '2021-01-01 00:09:00',
                   ...
                   '2021-01-31 11:50:00', '2021-01-31 11:51:00',
                   '2021-01-31 11:52:00', '2021-01-31 11:53:00',
                   '2021-01-31 11:54:00', '2021-01-31 11:55:00',
                   '2021-01-31 11:56:00', '2021-01-31 11:57:00',
                   '2021-01-31 11:58:00', '2021-01-31 11:59:00'],
                  dtype='datetime64[ns]', length=43920, freq='T')  # <- 43920 
    

    Compute the set difference between the full DatetimeIndex and the recorded times:

    >>> dti.difference(df["time"])
    DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 00:01:00',
                   '2021-01-01 00:02:00', '2021-01-01 00:03:00',
                   '2021-01-01 00:04:00', '2021-01-01 00:05:00',
                   '2021-01-01 00:06:00', '2021-01-01 00:07:00',
                   '2021-01-01 00:08:00', '2021-01-01 00:09:00',
                   ...
                   '2021-01-31 11:50:00', '2021-01-31 11:51:00',
                   '2021-01-31 11:52:00', '2021-01-31 11:53:00',
                   '2021-01-31 11:54:00', '2021-01-31 11:55:00',
                   '2021-01-31 11:56:00', '2021-01-31 11:57:00',
                   '2021-01-31 11:58:00', '2021-01-31 11:59:00'],
                  dtype='datetime64[ns]', length=42920, freq=None)  # <- 43920 - 1000
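
    The question asks for a data frame rather than a DatetimeIndex, so the result of `difference` still has to be wrapped. A minimal sketch of the whole pipeline follows, using a tiny hypothetical sample in place of the real data; the names `missing` and `result` are illustrative.

```python
import pandas as pd

# Hypothetical sample standing in for the question's data frame.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2021-01-01 08:01:00",
        "2021-01-01 09:32:00",
        "2021-01-02 12:01:00",
    ])
})

# Every minute in the requested range.
dti = pd.date_range("2021-01-01 00:00:00", "2021-01-31 11:59:00", freq="min")

# Minutes that are in the full range but not among the recorded times.
missing = dti.difference(df["time"])

# Wrap the index in a one-column data frame, as the question requests.
result = pd.DataFrame({"time": missing})
```

    `difference` returns a sorted index of unique values, so `result` comes out in chronological order without an extra sort.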
    

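    【Comments】:

    • Although I think specifying the frequency as 1min is easier to read, I didn't know about dti.difference(df["time"]), which in turn is easier to read than the .isin() solution => +1.
    【Solution 2】:

    You can try this: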

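    # Timestamps to exclude (df_yours is the data frame from the question)
    blacklist = df_yours['time'].unique().tolist()
    # One row per minute for all of 2021; adjust start/end to the range you need
    df = pd.DataFrame({'mins':pd.date_range(start='1/1/2021', end='1/1/2022', freq='1min')}).iloc[:-1]
    df = df[~df['mins'].isin(blacklist)]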
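
    The snippet above spans the whole of 2021. A variant restricted to the range from the question might look like the sketch below. It passes the `time` column straight to `isin` instead of building a list first, which is a simplification and not the answerer's exact code; the sample data and the names `minutes` and `filtered` are hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for df_yours.
df_yours = pd.DataFrame({
    "time": pd.to_datetime(["2021-01-01 08:01:00", "2021-01-31 06:01:00"])
})

# One row per minute over the range requested in the question.
minutes = pd.DataFrame({
    "mins": pd.date_range("2021-01-01 00:00:00", "2021-01-31 11:59:00", freq="min")
})

# Keep only the minutes that do not appear in df_yours["time"].
filtered = minutes[~minutes["mins"].isin(df_yours["time"])].reset_index(drop=True)
```

    `reset_index(drop=True)` is optional; it only renumbers the rows after the filtered-out minutes are gone.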
    

    【Comments】:
