【Question title】: Fill dataframe with consecutive datetimes
【Posted】: 2022-02-18 12:16:55
【Question】:

I have a dataframe:

       init         |         end         | temp
2022-02-02 10:34:00 | 2022-02-02 11:34:00 | 34
2022-02-02 11:34:00 | 2022-02-02 12:34:00 | 12
2022-02-02 13:34:00 | 2022-02-02 14:34:00 | 23
2022-02-02 14:34:00 | 2022-02-02 15:34:00 | 22
2022-02-02 17:34:00 | 2022-02-02 18:34:00 | 18

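I need to fill in the missing times between a start and an end datetime (one row's end is the next row's init). If I have start=2022-02-02 09:34:00 and end=2022-02-02 18:34:00, I need to fill the DataFrame as follows: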

       init         |         end         | temp
**2022-02-02 09:34:00 | 2022-02-02 11:34:00 | 0**
2022-02-02 10:34:00 | 2022-02-02 11:34:00 | 34
2022-02-02 11:34:00 | 2022-02-02 12:34:00 | 12
**2022-02-02 12:34:00 | 2022-02-02 11:34:00 | 0**
2022-02-02 13:34:00 | 2022-02-02 14:34:00 | 23
2022-02-02 14:34:00 | 2022-02-02 15:34:00 | 22
**2022-02-02 15:34:00 | 2022-02-02 11:34:00 | 0**
**2022-02-02 16:34:00 | 2022-02-02 11:34:00 | 0**
2022-02-02 17:34:00 | 2022-02-02 18:34:00 | 18
**2022-02-02 18:34:00 | 2022-02-02 11:34:00 | 0**

【Comments】:

  • Did these answers help you?

Tags: python pandas datetime


【Solution 1】:

You can build a temporary dataframe that contains every hourly datetime in the range, then do an OUTER JOIN (using pd.merge()), as follows:

import pandas as pd
from datetime import timedelta

df = pd.DataFrame({
    'init': ['2022-02-02 10:34:00', '2022-02-02 11:34:00', '2022-02-02 13:34:00', '2022-02-02 14:34:00', '2022-02-02 17:34:00'],
    'end': ['2022-02-02 11:34:00', '2022-02-02 12:34:00', '2022-02-02 14:34:00', '2022-02-02 15:34:00', '2022-02-02 18:34:00'],
    'temp': [34, 12, 23, 22, 18],
})

# convert the init and end columns from str to datetime
df['init'] = pd.to_datetime(df['init'])
df['end'] = pd.to_datetime(df['end'])

# temporary dataframe with one row per hour, starting one hour before the first init
tmp_df = pd.DataFrame()
tmp_df['init'] = pd.date_range(start=df.iloc[0]['init'] - timedelta(hours=1), end=df.iloc[-1]['end'], freq="h")  # "H" is deprecated since pandas 2.2

# build the final result
result = pd.merge(df, tmp_df, on='init', how='outer')
result = result.sort_values(by=['init']).reset_index(drop=True)
#result['end'] = result['init'] + timedelta(hours=1)  # use this instead to set end to init + 1 hour
result['end'] = result['end'].fillna(pd.Timestamp('2022-02-02 11:34:00'))  # placeholder end value taken from the question
result['temp'] = result['temp'].fillna(0)  # convert NaN to 0

print(result)

This prints what you expect:

                 init                 end  temp
0 2022-02-02 09:34:00 2022-02-02 11:34:00   0.0
1 2022-02-02 10:34:00 2022-02-02 11:34:00  34.0
2 2022-02-02 11:34:00 2022-02-02 12:34:00  12.0
3 2022-02-02 12:34:00 2022-02-02 11:34:00   0.0
4 2022-02-02 13:34:00 2022-02-02 14:34:00  23.0
5 2022-02-02 14:34:00 2022-02-02 15:34:00  22.0
6 2022-02-02 15:34:00 2022-02-02 11:34:00   0.0
7 2022-02-02 16:34:00 2022-02-02 11:34:00   0.0
8 2022-02-02 17:34:00 2022-02-02 18:34:00  18.0
9 2022-02-02 18:34:00 2022-02-02 11:34:00   0.0

If you want the "end" column to be "init + 1 hour", use the commented-out line result['end'] = result['init'] + timedelta(hours=1) instead of the result['end'].fillna(...) line.

This prints the following:

                 init                 end  temp
0 2022-02-02 09:34:00 2022-02-02 10:34:00   0.0
1 2022-02-02 10:34:00 2022-02-02 11:34:00  34.0
2 2022-02-02 11:34:00 2022-02-02 12:34:00  12.0
3 2022-02-02 12:34:00 2022-02-02 13:34:00   0.0
4 2022-02-02 13:34:00 2022-02-02 14:34:00  23.0
5 2022-02-02 14:34:00 2022-02-02 15:34:00  22.0
6 2022-02-02 15:34:00 2022-02-02 16:34:00   0.0
7 2022-02-02 16:34:00 2022-02-02 17:34:00   0.0
8 2022-02-02 17:34:00 2022-02-02 18:34:00  18.0
9 2022-02-02 18:34:00 2022-02-02 19:34:00   0.0
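
To see which rows the outer join in Solution 1 actually adds, a minimal sketch along the following lines may help. It assumes the same sample data and uses the `indicator=True` option of `pd.merge`, which is not part of the original answer. The names `hours`, `merged` and `filled_mask` are illustrative only.

```python
import pandas as pd

# Sample data from the question (only init and temp are needed here).
df = pd.DataFrame({
    "init": pd.to_datetime([
        "2022-02-02 10:34:00", "2022-02-02 11:34:00", "2022-02-02 13:34:00",
        "2022-02-02 14:34:00", "2022-02-02 17:34:00",
    ]),
    "temp": [34, 12, 23, 22, 18],
})

# Every hourly slot from 09:34 to 18:34 inclusive.
# A Timedelta is used as freq here to avoid depending on the "H"/"h" alias.
hours = pd.DataFrame({
    "init": pd.date_range("2022-02-02 09:34:00", "2022-02-02 18:34:00",
                          freq=pd.Timedelta(hours=1)),
})

# indicator=True adds a "_merge" column: "both" for slots that were in df,
# "right_only" for slots that exist only in the hourly grid.
merged = pd.merge(df, hours, on="init", how="outer", indicator=True)
merged = merged.sort_values("init").reset_index(drop=True)

# Rows that came only from the grid are the ones that need filling.
filled_mask = merged["_merge"] == "right_only"
merged["temp"] = merged["temp"].fillna(0).astype(int)

print(merged[["init", "temp"]].assign(filled=filled_mask))
```

The mask makes it possible to tell a filled-in 0 apart from a genuine measurement of 0, which the plain fillna(0) approach cannot do.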

    【Solution 2】:

    You can use a combination of pd.date_range and pd.Timedelta:

    import pandas as pd

    # Create the sample dataframe
    df = pd.DataFrame({
        'init': ['2022-02-02 10:34:00', '2022-02-02 11:34:00', '2022-02-02 13:34:00', '2022-02-02 14:34:00', '2022-02-02 17:34:00'],
        'end': ['2022-02-02 11:34:00', '2022-02-02 12:34:00', '2022-02-02 14:34:00', '2022-02-02 15:34:00', '2022-02-02 18:34:00'],
        'temp': [34, 12, 23, 22, 18],
    })

    # Convert init and end columns into a datetime type
    df['init'] = pd.to_datetime(df['init'])
    df['end'] = pd.to_datetime(df['end'])

    # Build the full hourly grid, left-merge the data onto it, and fill the gaps with 0
    start, end = '2022-02-02 09:34:00', '2022-02-02 18:34:00'
    hr = pd.date_range(start, end, freq='h')  # 'H' is deprecated since pandas 2.2
    df_hr = pd.DataFrame(zip(hr, hr + pd.Timedelta(hours=1)), columns=['init', 'end'])
    df = df_hr.merge(df, how='left', on=['init', 'end']).fillna(0)
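
If this is needed in more than one place, the approach of Solution 2 could be wrapped in a small helper. The following is only a sketch: the function name `fill_hourly` and its parameters are hypothetical, and it assumes every existing row spans exactly one hour, as in the sample data.

```python
import pandas as pd


def fill_hourly(df, start, end):
    """Hypothetical helper: one row per hour from start to end (inclusive).

    Slots missing from df get temp = 0. Assumes each row of df covers
    exactly one hour, so that (init, end) pairs line up with the grid.
    """
    hours = pd.date_range(start, end, freq=pd.Timedelta(hours=1))
    grid = pd.DataFrame({"init": hours, "end": hours + pd.Timedelta(hours=1)})
    out = grid.merge(df, how="left", on=["init", "end"])
    # Fill only the temp column and restore its original dtype.
    out["temp"] = out["temp"].fillna(0).astype(df["temp"].dtype)
    return out


# Usage with the sample data from the question.
sample = pd.DataFrame({
    "init": pd.to_datetime([
        "2022-02-02 10:34:00", "2022-02-02 11:34:00", "2022-02-02 13:34:00",
        "2022-02-02 14:34:00", "2022-02-02 17:34:00",
    ]),
    "end": pd.to_datetime([
        "2022-02-02 11:34:00", "2022-02-02 12:34:00", "2022-02-02 14:34:00",
        "2022-02-02 15:34:00", "2022-02-02 18:34:00",
    ]),
    "temp": [34, 12, 23, 22, 18],
})

filled = fill_hourly(sample, "2022-02-02 09:34:00", "2022-02-02 18:34:00")
print(filled)
```

Because the merge keys are both init and end, a row whose end is not init + 1 hour would not match the grid and its temp would be replaced by 0. Merging on init alone would avoid that, at the cost of having to fill the end column separately.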
    
