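【Posted】: 2021-02-05 07:59:04
【Question】:
I have a data frame containing daily data for the past five years. Besides the value columns, it contains a date field and a regulatory year column. I want to create two columns: regulatory week number and regulatory month number. The regulatory year starts on 1 April and ends on 31 March, so I used the following code to generate the regulatory week and month numbers: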
import numpy as np
import pandas as pd
# Shift ISO weeks so that ISO week 14 becomes regulatory week 1; ISO weeks 1-13 map to 40-52.
iso_week = df['date'].dt.isocalendar().week.astype(int)
df['Week'] = np.where(iso_week > 13, iso_week - 13, iso_week + 39)
df['month'] = df['date'].dt.strftime('%b')  # abbreviated month name, e.g. 'Apr'
months = ['Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec','Jan','Feb','Mar']
df['month'] = pd.Categorical(df['month'], ordered=True, categories=months)
df['month number'] = df['month'].apply(lambda x: months.index(x)+1)
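For comparison, here is a minimal sketch of one alternative: counting weeks in 7-day blocks from 1 April instead of shifting ISO week numbers. ISO weeks always start on a Monday and are anchored to the calendar year, so ISO week 14 does not necessarily begin on 1 April. The sample dates are made up; only the `date`, `RY`, `Week` and `month number` column names come from the question, and the sketch assumes the regulatory year is labelled by the calendar year in which it starts.

```python
import pandas as pd

# Hypothetical sample dates; 'date' is the column name used in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-04-01", "2016-04-08", "2016-12-31", "2017-01-01", "2017-03-31"]
    )
})

# Regulatory year (assumption: labelled by the calendar year in which it starts).
df["RY"] = df["date"].dt.year - (df["date"].dt.month < 4).astype(int)

# Month number within the regulatory year: Apr -> 1, ..., Mar -> 12.
df["month number"] = (df["date"].dt.month - 4) % 12 + 1

# Week number counted in 7-day blocks starting on 1 April of the regulatory year.
ry_start = pd.to_datetime(df["RY"].astype(str) + "-04-01")
df["Week"] = (df["date"] - ry_start).dt.days // 7 + 1

print(df)
```

With this rule week 1 always starts on 1 April, and the last day or two of March fall into a short week 53. Weeks can still straddle a month boundary, so this does not by itself give one month per week.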
After creating the two columns above, my data frame looks like this:
RY    month  Week  Value 1   Value 2  Value 3   Value 4  month number
2016  Apr    1     0.00000   0.00000  0.000000  0.00000  1
2016  Apr    2     1.31394   0.02961  1.313940  0.02961  1
2016  Apr    3     4.98354   0.07146  4.983540  0.07146  1
2016  Apr    4     4.30606   0.05742  4.306060  0.05742  1
2016  Apr    5     1.94634   0.01958  1.946340  0.01958  1
2016  May    5     0.25342   0.01625  0.253420  0.01625  2
2016  May    6     0.64051   0.00777  0.640510  0.00777  2
2016  May    7     1.26451   0.02994  1.264510  0.02994  2
2016  May    8     2.71035   0.08150  2.194947  0.08150  2
2016  May    9     11.95120  0.13386  1.624328  0.13386  2
2016  Jun    10    6.93051   0.08126  6.930510  0.08126  3
2016  Jun    11    1.18872   0.03953  1.188720  0.03953  3
2016  Jun    12    3.19961   0.05760  0.924562  0.05760  3
2016  Jun    13    3.90429   0.04985  0.956445  0.04985  3
2016  Jun    14    0.84002   0.01738  0.840020  0.01738  3
2016  Jul    14    0.07358   0.00562  0.073580  0.00562  4
2016  Jul    15    0.78253   0.03014  0.782530  0.03014  4
2016  Jul    16    1.23036   0.01816  1.230360  0.01816  4
2016  Jul    17    0.62948   0.01341  0.629480  0.01341  4
2016  Jul    18    0.45513   0.00552  0.455130  0.00552  4
Now I want to create a data frame containing the per-week averages of the value columns. I use the following command to compute the means:
mean_df = df.groupby('Week')[['Value1','Value2','Value3','Value4']].mean().reset_index()
The new data frame looks like this:
Week  Value 1    Value 2   Value 3   Value 4
1     3.013490   0.039740  1.348016  0.039740
2     3.094456   0.045142  3.094456  0.045142
3     1.615948   0.027216  1.615948  0.027216
4     2.889245   0.043998  1.903319  0.043998
5     0.431549   0.009679  0.431549  0.009679
6     1.045670   0.017302  1.045670  0.017302
7     2.444196   0.034304  2.444196  0.034304
8     1.041210   0.026464  0.938129  0.026464
9     2.068607   0.030550  0.921176  0.030550
10    2.400118   0.051476  2.400118  0.051476
11    1.738332   0.035362  1.738332  0.035362
12    1.369790   0.038576  0.914780  0.038576
13    1.921781   0.021218  0.749460  0.021218
14    1.471432   0.027367  1.471432  0.027367
15    2.722526   0.053794  1.676559  0.053794
16    3.132406   0.043520  1.195321  0.043520
17    0.733952   0.021142  0.733952  0.021142
18    0.645236   0.014454  0.645236  0.014454
19    2.466326   0.049704  0.879481  0.049704
20    2.111326   0.013262  0.682253  0.013262
21    1.301004   0.023048  1.301004  0.023048
22    0.705360   0.023439  0.705360  0.023439
23    1.323438   0.019103  1.323438  0.019103
24    0.569906   0.012540  0.569906  0.012540
25    7.898792   0.034246  1.382349  0.034246
26    0.896413   0.013013  0.896413  0.013013
27    4.478349   0.039749  1.703887  0.039749
28    5.807160   0.052526  2.036502  0.052526
29    3.308176   0.043984  2.117939  0.043984
30    1.991078   0.046058  1.991078  0.046058
31    0.806589   0.016945  0.806589  0.016945
32    2.091860   0.029234  2.091860  0.029234
33    1.149280   0.025194  1.149280  0.025194
34    4.746376   0.067742  2.863484  0.067742
35    5.128558   0.029608  1.537541  0.029608
36    2.765563   0.052125  2.765563  0.052125
37    2.314376   0.036046  2.314376  0.036046
38    2.552290   0.030626  1.483397  0.030626
39    1.456778   0.037448  1.456778  0.037448
40    1.212090   0.024698  1.212090  0.024698
41    4.729104   0.037646  1.296358  0.037646
42    3.412830   0.053132  3.412830  0.053132
43    8.916526   0.050044  1.839411  0.050044
44    2.450281   0.029806  0.942205  0.029806
45    2.156186   0.024064  2.156186  0.024064
46    2.336330   0.042538  2.336330  0.042538
47    1.798326   0.025270  1.798326  0.025270
48    1.352004   0.018382  1.352004  0.018382
49    10.220510  0.073480  1.607830  0.073480
50    2.575344   0.047760  2.575344  0.047760
51    1.226056   0.028676  1.226056  0.028676
52    0.470392   0.009991  0.466561  0.009991
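Now I want to insert the month number and month name from the first data frame into the new data frame. I wanted to merge the two data frames on "Week", but I found that the same week number is assigned to two different months (in the first data frame). For example, week 5 is assigned to both April and May.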
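The overlap can be checked directly by counting distinct months per week. A small sketch, using a made-up miniature of the first data frame (only the `month` and `Week` column names are taken from it; `months_per_week` and `overlapping_weeks` are illustrative names):

```python
import pandas as pd

# Hypothetical miniature of the first data frame: week 5 appears under two months.
df = pd.DataFrame({
    "month": ["Apr", "Apr", "May", "May", "Jun"],
    "Week": [4, 5, 5, 6, 10],
})

# Count distinct months per week; a count above 1 means the week straddles a month boundary.
months_per_week = df.groupby("Week")["month"].nunique()
overlapping_weeks = months_per_week[months_per_week > 1].index.tolist()

print(overlapping_weeks)
```

Any week listed here would produce duplicate rows if the two data frames were merged on `Week` alone.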
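Ideally, each week number would be assigned to only one month. I am not sure whether I am calculating the week number correctly. Has anyone encountered the same problem? Any suggestions on how to calculate week numbers so that a week does not overlap two months?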
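One possible approach, offered as a sketch rather than a definitive answer, is to keep the week numbering and assign each week to the month in which most of its days fall. The daily sample data, the rule of counting weeks in 7-day blocks from 1 April, and the `week_to_month` and `month_names` names are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily data covering a week that spans late April and early May.
dates = pd.date_range("2016-04-25", "2016-05-08", freq="D")
df = pd.DataFrame({"date": dates, "Value1": range(len(dates))})

# Assumed week rule: 7-day blocks counted from 1 April of the regulatory year.
ry_start = pd.Timestamp("2016-04-01")
df["Week"] = (df["date"] - ry_start).dt.days // 7 + 1

# Month abbreviations mapped explicitly so the result does not depend on the locale.
month_names = dict(enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1))
df["month"] = df["date"].dt.month.map(month_names)

# Assign each week to the month that contains most of its days.
week_to_month = (
    df.groupby("Week")["month"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index()
)

# Weekly means, then a one-to-one merge because each week now has exactly one month.
mean_df = df.groupby("Week")[["Value1"]].mean().reset_index()
mean_df = mean_df.merge(week_to_month, on="Week", how="left")

print(mean_df)
```

A full 7-day week split across two months always has a clear majority (at least 4 days on one side), so the choice is unambiguous; `mode().iloc[0]` would only break a tie alphabetically if a partial week happened to be split evenly.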
【Discussion】:
Tags: python python-3.x pandas