【Question title】: Pandas select time series with at least five positive values in a row
【Posted】: 2023-02-23 01:53:23
【Question description】:

This is a dataset for time series forecasting. Some of the time series columns contain many zeros, and I want to ignore them.

import pandas as pd
df = pd.DataFrame({'date': ['2019-01-06 00:00:00','2019-01-13 00:00:00','2019-01-27 00:00:00',
                            '2019-02-03 00:00:00','2019-02-10 00:00:00','2019-02-17 00:00:00',
                            '2019-02-25 00:00:00','2019-03-02 00:00:00','2019-03-09 00:00:00',
                            '2019-03-16 00:00:00'],
                    'timeseries1': [None, None, None, 5, 10, 5, 10, 5, 8, 15], 
                    'timeseries2': [4, 4, None, 4, None, None, 5, 9, 6, 12], 
                    'timeseries3': [None, 5, 9, 6, 12, 10, None, None, None, None],
                    'timeseries4': [None, None, 9, None, 10, 5, 8, None, 7, None],
                    'timeseries5': [None, 5, 5, 10, 5, 8, 15, 9, None, None]
                            })
df = df.set_index('date')
df

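I want to select the columns that contain at least five consecutive positive values. The result would therefore be three separate time series, as shown below.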

timeseries1 = pd.DataFrame({'date': ['2019-02-03 00:00:00','2019-02-10 00:00:00','2019-02-17 00:00:00',
                            '2019-02-25 00:00:00','2019-03-02 00:00:00','2019-03-09 00:00:00',
                            '2019-03-16 00:00:00'],
                    'timeseries1': [5, 10, 5, 10, 5, 8, 15]                    
                            })
timeseries1 = timeseries1.set_index('date')
timeseries1


timeseries3 = pd.DataFrame({'date': ['2019-01-13 00:00:00','2019-01-27 00:00:00',
                            '2019-02-03 00:00:00','2019-02-10 00:00:00','2019-02-17 00:00:00',
                            ],
                    'timeseries3': [5, 9, 6, 12, 10]                  
                            })
timeseries3  = timeseries3.set_index('date')
timeseries3 



timeseries5 = pd.DataFrame({'date': ['2019-01-13 00:00:00','2019-01-27 00:00:00',
                            '2019-02-03 00:00:00','2019-02-10 00:00:00','2019-02-17 00:00:00',
                            '2019-02-25 00:00:00','2019-03-02 00:00:00'],                    
                    'timeseries5': [5, 5, 10, 5, 8, 15, 9]
                            })
timeseries5 = timeseries5.set_index('date')
timeseries5

【Comments】:

    Tags: pandas


    【Solution 1】:

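    Personally, I would suggest splitting the date into separate date and time columns while keeping the original timestamp.

    First, make sure the timestamp column has a datetime type.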

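    # Assumes 'date' is a regular column; call df.reset_index() first if it is the index.
    df['timestamp'] = pd.to_datetime(df['date'])  # keep the full timestamp
    df['date'] = df['timestamp'].dt.date          # calendar date only
    df['time'] = df['timestamp'].dt.time          # time of day only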
    

    The advantage is that you can then filter the data easily.
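
As a rough sketch of the kind of filtering this enables: the three-row frame `sample` and the cutoff date below are made up for illustration, and `date` is assumed to be a regular column of timestamp strings rather than the index.

```python
import pandas as pd

# Hypothetical three-row frame; 'date' is kept as a regular column here.
sample = pd.DataFrame({
    "date": ["2019-01-06 00:00:00", "2019-02-03 00:00:00", "2019-03-16 00:00:00"],
    "timeseries1": [None, 5, 15],
})

# Parse once, then derive the date and time parts from the parsed column.
sample["timestamp"] = pd.to_datetime(sample["date"])
sample["date"] = sample["timestamp"].dt.date
sample["time"] = sample["timestamp"].dt.time

# Keep only the rows on or after an (arbitrary) cutoff.
recent = sample[sample["timestamp"] >= pd.Timestamp("2019-02-01")]
print(recent)
```

Filtering on the retained `timestamp` column keeps the comparison in datetime64 space, which is one reason to keep it alongside the split columns.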

    Another approach is the pandas method between_time (see the pandas documentation), which requires the index to be a DatetimeIndex.

    df.between_time('0:00:01', '23:59:59')
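
For the selection the question itself asks for (columns with at least five consecutive positive values), one possible sketch is to label runs with a cumulative sum over the non-positive positions and measure each run's length. The helper name `longest_positive_runs` and the choice to keep only the first longest run per column are assumptions made for illustration; the question does not say what should happen when a column has several qualifying runs.

```python
import pandas as pd

dates = ['2019-01-06 00:00:00', '2019-01-13 00:00:00', '2019-01-27 00:00:00',
         '2019-02-03 00:00:00', '2019-02-10 00:00:00', '2019-02-17 00:00:00',
         '2019-02-25 00:00:00', '2019-03-02 00:00:00', '2019-03-09 00:00:00',
         '2019-03-16 00:00:00']
df = pd.DataFrame({
    'timeseries1': [None, None, None, 5, 10, 5, 10, 5, 8, 15],
    'timeseries2': [4, 4, None, 4, None, None, 5, 9, 6, 12],
    'timeseries3': [None, 5, 9, 6, 12, 10, None, None, None, None],
    'timeseries4': [None, None, 9, None, 10, 5, 8, None, 7, None],
    'timeseries5': [None, 5, 5, 10, 5, 8, 15, 9, None, None],
}, index=pd.Index(dates, name='date'))


def longest_positive_runs(frame, min_len=5):
    """Hypothetical helper: map each qualifying column to its longest run
    of consecutive positive values (the first one if there is a tie)."""
    result = {}
    for col in frame.columns:
        positive = frame[col] > 0        # NaN and zeros compare as False
        # The counter increases at every non-positive row, so it stays
        # constant across each stretch of consecutive positive rows.
        run_id = (~positive).cumsum()
        # Number of positive rows in the run each row belongs to.
        run_len = positive.groupby(run_id).transform('sum')
        longest = run_len.max()
        if longest >= min_len:
            best = run_id[positive & (run_len == longest)].iloc[0]
            result[col] = frame.loc[positive & (run_id == best), col]
    return result


runs = longest_positive_runs(df)
for name, series in runs.items():
    print(name, series.tolist())
```

Each entry of `runs` is a Series indexed by date; calling `.to_frame()` on it gives a one-column DataFrame like the ones in the question.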
    

    【Comments】:
