【Question title】:Formatting data sheet datetime values with pandas and numpy Python
【Posted】:2021-11-07 22:17:31
【Question】:

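The code below computes the mean, median, max and min of the values in vals, grouped by the year of the matching dates in month_changes. I would like to extend it so that years with no data get zeros for the mean, median, max and min, beginning at the year given by the starting_year variable. This also applies to gap years between existing years, such as 2020 in this example. How can I do this?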

import numpy as np 
import pandas as pd 

month_changes = np.array(["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00", "2019-04-01 00:00:00","2019-08-01 00:00:00", "2019-11-01 00:00:00", "2019-12-01 00:00:00","2021-01-01 00:00:00"]) 
vals = np.array([10, 23, 45, 4,5,12,4,-6])
starting_year = 2016

def YearlyIntervals(vals):
    data = pd.DataFrame({"Date": month_changes, "Averages": vals})
    data["Date"] = pd.to_datetime(data["Date"])
    out=(data.groupby(data["Date"].dt.year)
         .agg(['mean','median','max','min'])
         .droplevel(0,1)
         .rename(columns=lambda x:'Average' if x=='mean' else x.title())
        )
    return out

PnL_YearlyFilter= YearlyIntervals(vals)

Output

          Average     Median    Max  Min 
Date                                                                         
2018      16.5        16.5      23   10
2019      14.0        5.0       45    4 
2021      -6.0       -6.0       -6   -6

Expected output

          Average     Median    Max  Min 
Date
2016      0           0         0     0
2017      0           0         0     0
2018      16.5        16.5      23   10
2019      14.0        5.0       45    4
2020      0           0         0     0
2021      -6.0       -6.0       -6   -6

【Comments】:

    Tags: python arrays pandas numpy datetime


    【Solution 1】:

    Use reindex with fill_value=0.

    One way to add this to the function:
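As a minimal sketch of the idea in isolation: `reindex` takes the full set of labels you want, and `fill_value` supplies the value for labels that were missing. The Series `yearly` and its values are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical yearly values with gaps (2016, 2017 and 2020 are missing)
yearly = pd.Series([16.5, 14.0, -6.0], index=[2018, 2019, 2021])

# Reindex over the full range of years; missing years get 0 instead of NaN
filled = yearly.reindex(range(2016, 2022), fill_value=0)
print(filled)
```

The upper bound of `range` is exclusive, which is why the last year plus one is passed.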

    def yearly_intervals(mc, vs, start_year=None, end_year=None):
        data = pd.DataFrame({
            "Date": pd.to_datetime(mc),  # Convert to_datetime immediately
            "Averages": vs
        })
        out = (
            data.groupby(data["Date"].dt.year)["Averages"]  # Access Series
                .agg(['mean', 'median', 'max', 'min'])
                .rename(columns=lambda x: 'Average' if x == 'mean' else x.title())
        )
        # Only reindex when a start year was given
        if start_year is not None:
            # Reindex to ensure index contains all years in range
            out = out.reindex(range(
                start_year,
                # Use last year (maximum value) from index or user defined arg
                (end_year if end_year is not None else out.index.max()) + 1
            ), fill_value=0)
        return out
    
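The function above selects the "Averages" column before aggregating. A small sketch of what that selection does, using a shortened, illustrative subset of the data: aggregating a single selected column with a list of functions gives a result with one flat level of column names, so no `droplevel` step is needed afterwards.

```python
import numpy as np
import pandas as pd

# Shortened sample for illustration only
data = pd.DataFrame({
    "Date": pd.to_datetime(["2018-04-01", "2018-05-01", "2019-03-01"]),
    "Averages": np.array([10, 23, 45]),
})

# Selecting one column yields a SeriesGroupBy; agg with a list of
# functions then returns a DataFrame with a single level of columns
out = data.groupby(data["Date"].dt.year)["Averages"].agg(["mean", "max"])
print(out.columns.nlevels)
print(list(out.columns))
```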

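    A few changes were made:

    1. PEP 8 states that function names "should be lowercase, with words separated by underscores as necessary to improve readability" (see Function and Variable Names).
    2. The numpy array can be converted with to_datetime directly while building the DataFrame, rather than building the DataFrame first and converting the Series afterwards.
    3. The groupby aggregation is changed slightly to use SeriesGroupBy.aggregate instead of the DataFrame groupby, so that behavior is more consistent across versions.
    4. start_year and end_year are keyword arguments with default values, which makes the function more flexible so that any range of years can be used. (More argument handling could be added to ensure end_year is always greater than start_year.)
    5. end_year is optional. If end_year is not provided, Index.max (the maximum value in the index) is used.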
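The extra argument handling mentioned in point 4 could look roughly like the sketch below. The helper name `year_range` is hypothetical, and treating an end year earlier than the start year as an error (while allowing the two to be equal) is an assumption, not something the answer specifies.

```python
import pandas as pd

def year_range(index, start_year, end_year=None):
    """Hypothetical helper: build the range of years to reindex over."""
    # Fall back to the latest year present in the data
    last = end_year if end_year is not None else int(index.max())
    if last < start_year:
        raise ValueError(
            f"end_year ({last}) must not be earlier than start_year ({start_year})"
        )
    # range() excludes its upper bound, so add 1 to include the last year
    return range(start_year, last + 1)

idx = pd.Index([2018, 2019, 2021], name="Date")
print(list(year_range(idx, 2016)))
```

The returned range could then be passed to `reindex` in place of the inline `range(...)` expression.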

    Example function call:

    month_changes = np.array(
        ["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00",
         "2019-04-01 00:00:00", "2019-08-01 00:00:00", "2019-11-01 00:00:00",
         "2019-12-01 00:00:00", "2021-01-01 00:00:00"])
    vals = np.array([10, 23, 45, 4, 5, 12, 4, -6])
    starting_year = 2016
    
    PnL_YearlyFilter = yearly_intervals(month_changes, vals, starting_year)
    

    PnL_YearlyFilter:

          Average  Median  Max  Min
    Date                           
    2016      0.0     0.0    0    0
    2017      0.0     0.0    0    0
    2018     16.5    16.5   23   10
    2019     14.0     5.0   45    4
    2020      0.0     0.0    0    0
    2021     -6.0    -6.0   -6   -6
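In the output above, Max and Min show 0 rather than 0.0 for the added years. A small sketch of why `fill_value=0` is preferable to reindexing first and calling `fillna(0)` afterwards, using a made-up integer Series: inserting NaN forces a float dtype, and `fillna` does not convert it back.

```python
import pandas as pd

# Hypothetical integer column indexed by year, with 2020 missing
max_by_year = pd.Series([45, -6], index=[2019, 2021])

with_fill = max_by_year.reindex(range(2019, 2022), fill_value=0)
with_fillna = max_by_year.reindex(range(2019, 2022)).fillna(0)

print(with_fill.dtype)    # integer dtype is kept
print(with_fillna.dtype)  # float, because NaN was inserted before filling
```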
    

    【Comments】:
