【Posted】: 2021-11-11 04:13:42
【Question】:
I have a DataFrame sorted by CLIENT_ID and ENCOUNTER_DATE, as shown below:

| CLIENT_ID | ENCOUNTER_DATE | STAGE |
|---|---|---|
| 8222 | 2020-01-01 | 1 |
| 8222 | 2020-03-02 | 1 |
| 8222 | 2020-04-18 | 2 |
| 8222 | 2020-07-31 | 1 |
| 8300 | 2017-06-10 | 1 |
| 8300 | 2017-09-11 | 2 |
| 8300 | 2018-02-01 | 2 |
| 8300 | 2018-04-01 | 3 |
| 8300 | 2018-05-31 | 4 |
| 8400 | 2020-12-31 | 1 |
| 8401 | 2017-08-29 | 1 |
| 8401 | 2017-09-15 | 3 |
| 8500 | 2018-10-10 | 2 |
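
How can I create a new flag column, STAGE_WORSENED, that indicates for each CLIENT_ID whether the STAGE at the current ENCOUNTER_DATE is greater than the STAGE at that client's previous ENCOUNTER_DATE? A client's first encounter has no predecessor, so it should get 0. The result should look like this:
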
| CLIENT_ID | ENCOUNTER_DATE | STAGE | STAGE_WORSENED |
|---|---|---|---|
| 8222 | 2020-01-01 | 1 | 0 |
| 8222 | 2020-03-02 | 1 | 0 |
| 8222 | 2020-04-18 | 2 | 1 |
| 8222 | 2020-07-31 | 1 | 0 |
| 8300 | 2017-06-10 | 1 | 0 |
| 8300 | 2017-09-11 | 2 | 1 |
| 8300 | 2018-02-01 | 2 | 0 |
| 8300 | 2018-04-01 | 3 | 1 |
| 8300 | 2018-05-31 | 4 | 1 |
| 8400 | 2020-12-31 | 1 | 0 |
| 8401 | 2017-08-29 | 1 | 0 |
| 8401 | 2017-09-15 | 3 | 1 |
| 8500 | 2018-10-10 | 2 | 0 |
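
Read from the expected output above, the flag can be stated as follows for row $i$ of the sorted frame (this is a restatement inferred from the table, not a formula given in the question):

$$
\text{STAGE\_WORSENED}_i =
\begin{cases}
1 & \text{if } \text{CLIENT\_ID}_i = \text{CLIENT\_ID}_{i-1} \text{ and } \text{STAGE}_i > \text{STAGE}_{i-1},\\
0 & \text{otherwise (including each client's first encounter).}
\end{cases}
$$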

Here is the code that generates df:
```python
import pandas as pd

# One row per encounter, already sorted by CLIENT_ID and ENCOUNTER_DATE
df = pd.DataFrame({"CLIENT_ID": [8222, 8222, 8222, 8222, 8300, 8300, 8300, 8300, 8300, 8400, 8401, 8401, 8500],
                   "ENCOUNTER_DATE": ['2020-01-01', '2020-03-02', '2020-04-18', '2020-07-31', '2017-06-10', '2017-09-11', '2018-02-01', '2018-04-01', '2018-05-31', '2020-12-31', '2017-08-29', '2017-09-15', '2018-10-10'],
                   "STAGE": [1, 1, 2, 1, 1, 2, 2, 3, 4, 1, 1, 3, 2]})
```
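
One possible approach, sketched here under the assumption that rows are ordered chronologically within each client, is to take the per-client difference of STAGE with pandas' `groupby(...).diff()`. It yields NaN for each client's first encounter, and since `NaN > 0` evaluates to False, those rows come out as 0. The local name `stage_change` is only illustrative.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "CLIENT_ID": [8222, 8222, 8222, 8222, 8300, 8300, 8300, 8300, 8300,
                  8400, 8401, 8401, 8500],
    "ENCOUNTER_DATE": ['2020-01-01', '2020-03-02', '2020-04-18', '2020-07-31',
                       '2017-06-10', '2017-09-11', '2018-02-01', '2018-04-01',
                       '2018-05-31', '2020-12-31', '2017-08-29', '2017-09-15',
                       '2018-10-10'],
    "STAGE": [1, 1, 2, 1, 1, 2, 2, 3, 4, 1, 1, 3, 2],
})

# Parse dates and (re)sort so "previous encounter" is well defined per client
df["ENCOUNTER_DATE"] = pd.to_datetime(df["ENCOUNTER_DATE"])
df = df.sort_values(["CLIENT_ID", "ENCOUNTER_DATE"]).reset_index(drop=True)

# STAGE minus the previous STAGE of the same client; NaN on each client's first row
stage_change = df.groupby("CLIENT_ID")["STAGE"].diff()

# A positive change means the stage worsened; NaN > 0 is False, so first rows get 0
df["STAGE_WORSENED"] = (stage_change > 0).astype(int)

print(df)
```

An equivalent formulation compares STAGE against `df.groupby("CLIENT_ID")["STAGE"].shift()` with `.gt(...)`; `diff()` simply folds the shift and subtraction into one call.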
【Discussion】:
Tags: python pandas dataframe numpy pandas-groupby