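【Question title】: Add last week of every year to the next week
【Posted】: 2021-08-04 19:52:09
【Question】:

How can I remove the last week of every year and add its values to the following week, using generic code that works for all numeric columns?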

df
  date      value
2019-12-20   0
2019-12-27   3
2020-01-03   7
...
2020-12-18   0
2020-12-25   4 
2021-01-01   7

Expected output

  date      value
2019-12-20   0
2020-01-03   10
...
2020-12-18   0
2021-01-01   11

【Comments】:

  • For transparency, you could add a "fiscal year" column that you compute
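
The "fiscal year" idea from this comment could look roughly like the sketch below. The column name fiscal_year and the rule used (a row moves to the next fiscal year when the following week already falls in the next calendar year) are assumptions made for illustration; they are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-12-20", "2019-12-27", "2020-01-03",
        "2020-12-18", "2020-12-25", "2021-01-01",
    ]),
    "value": [0, 3, 7, 0, 4, 7],
})

# Assumed rule: if the date one week later is in the next calendar year,
# the row is the last week of its year and is assigned to the next fiscal year.
year = df["date"].dt.year
next_week_year = (df["date"] + pd.Timedelta(weeks=1)).dt.year
df["fiscal_year"] = year.where(year == next_week_year, year + 1)

print(df)
```

With such a column, the rows that get folded into the following week are exactly those where fiscal_year differs from the calendar year, which makes the adjustment easy to audit.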

Tags: python pandas date group-by


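【Solution 1】:

Based on your question, I assume that your DataFrame contains exactly one row per week (it looks like you only have Fridays here). I also assume that no week is missing (i.e. no Friday is skipped) and that the rows are sorted chronologically (if not, call df = df.sort_values("date") first).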
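
These assumptions can be checked before running anything else. The following is only a minimal sketch: the variable names is_sorted and no_week_skipped are made up for this example, and the sample frame is generated with pd.date_range rather than taken from the question.

```python
import pandas as pd

# Hypothetical sample: five consecutive Fridays
df = pd.DataFrame({
    "date": pd.date_range("2019-12-20", periods=5, freq="7D"),
    "value": [0, 3, 7, 1, 2],
})

dates = pd.to_datetime(df["date"])

# True when the rows are already in chronological order
is_sorted = dates.is_monotonic_increasing

# Differences between consecutive rows; exactly 7 days everywhere
# means one row per week and no skipped week
gaps = dates.diff().dropna()
no_week_skipped = bool((gaps == pd.Timedelta(days=7)).all())

print(is_sorted, no_week_skipped)
```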

The following snippet should solve your problem (explanations are in the code):

import pandas as pd

df = pd.DataFrame({
    "date": [
        "2019-12-20", "2019-12-27",
        "2020-01-03", "2020-12-18",
        "2020-12-25", "2021-01-01"
    ],
    "value": [0, 3, 7, 0, 4, 7],
})

numeric_columns = ["value"]

# Compute whether a row is the last week of a year
year = df["date"].str[:4]
is_last_week = year != year.shift(-1).fillna(year.iloc[-1])
print(is_last_week)

0    False
1     True
2    False
3    False
4     True
5    False
Name: date, dtype: bool

# Take the value from those rows
values_on_last_week = df[numeric_columns].where(is_last_week)
print(values_on_last_week)

   value
0    NaN
1    3.0
2    NaN
3    NaN
4    4.0
5    NaN

# Shift values one row down
shifted_values_on_last_week = values_on_last_week.shift()
print(shifted_values_on_last_week)

   value
0    NaN
1    NaN
2    3.0
3    NaN
4    NaN
5    4.0

# Put zeroes instead of NaNs
shifted_values_on_last_week = shifted_values_on_last_week.fillna(0)
print(shifted_values_on_last_week)

   value
0    0.0
1    0.0
2    3.0
3    0.0
4    0.0
5    4.0

# Add this to df
df[numeric_columns] = df[numeric_columns] + shifted_values_on_last_week
print(df)

         date  value
0  2019-12-20    0.0
1  2019-12-27    3.0
2  2020-01-03   10.0
3  2020-12-18    0.0
4  2020-12-25    4.0
5  2021-01-01   11.0

# Drop the rows we don't want anymore
df = df[~is_last_week]
print(df)

         date  value
0  2019-12-20    0.0
2  2020-01-03   10.0
3  2020-12-18    0.0
5  2021-01-01   11.0
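
Since the question asks for generic code covering all numeric columns, the steps above could be wrapped in a helper along these lines. This is a sketch under the same assumptions (one row per week, no gaps); the function name fold_last_week_into_next and the use of select_dtypes to pick the numeric columns are choices made here, not part of the original answer.

```python
import pandas as pd


def fold_last_week_into_next(df, date_col="date"):
    """Add each year's last row to the following row, then drop it."""
    out = df.sort_values(date_col).reset_index(drop=True)
    year = pd.to_datetime(out[date_col]).dt.year
    numeric_columns = out.select_dtypes(include="number").columns

    # A row is the last week of its year when the next row belongs to a
    # different year. The final row has no successor and is never flagged.
    next_year = year.shift(-1)
    is_last_week = year.ne(next_year) & next_year.notna()

    # Keep only the flagged values (0 elsewhere) and move them one row down
    carried = out[numeric_columns].where(is_last_week, 0).shift(fill_value=0)
    out[numeric_columns] = out[numeric_columns] + carried
    return out[~is_last_week].reset_index(drop=True)


df = pd.DataFrame({
    "date": ["2019-12-20", "2019-12-27", "2020-01-03",
             "2020-12-18", "2020-12-25", "2021-01-01"],
    "value": [0, 3, 7, 0, 4, 7],
    "other": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
result = fold_last_week_into_next(df)
print(result)
```

Filling with 0 instead of NaN before shifting keeps integer columns as integers, which avoids the float conversion visible in the step-by-step output above.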


    【Solution 2】:

    Another approach is to look at the earliest and latest dates of each year in the dataset.

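    import io

    import pandas as pd

    data = '''  date      value
    2019-12-20   0
    2019-12-27   3
    2020-01-03   7
    2020-12-18   0
    2020-12-25   4
    2021-01-01   7'''
    
    df = pd.read_csv(io.StringIO(data), sep=r'\s+', engine='python')
    df['date'] = pd.to_datetime(df['date'])
    
    # get the row with the latest December date and the row with the
    # earliest January date of each year (a plain .max()/.min() takes each
    # column's extreme separately and could mix values from different rows)
    dec = df[df['date'].dt.month == 12]
    jan = df[df['date'].dt.month == 1]
    dfmax = dec.loc[dec.groupby(dec['date'].dt.year)['date'].idxmax()].reset_index(drop=True)
    dfmin = jan.loc[jan.groupby(jan['date'].dt.year)['date'].idxmin()].reset_index(drop=True)
    
    # add the values; rows are paired by position, which assumes the data
    # starts in a December and ends in a January, as in this sample
    dfh = dfmin  # same object as dfmin, so dfmin is updated as well
    dfh['value'] = dfmax['value'] + dfmin['value']
    
    # remove unwanted rows from initial df
    dfidx = dfh['date'].tolist()
    df = df[~df['date'].isin(dfidx)].copy()
    dfidx = dfmax['date'].tolist()
    df = df[~df['date'].isin(dfidx)].copy()
    
    # piece it back together with the recalculated values
    dfnew = pd.concat([dfmin, df]).sort_values('date')
    print(dfnew)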
    

    Output

            date  value
    0 2019-12-20      0
    0 2020-01-03     10
    3 2020-12-18      0
    1 2021-01-01     11
    
