【Question title】: R: Dataframe Manipulation
【Posted】: 2021-06-28 00:35:51
【Question description】:
I have the data frame shown below:
| ID | COUNT OF STOCK | YEAR |
| A1 | 10 | 2000 |
| A1 | 20 | 2000 |
| A1 | 18 | 2000 |
| A1 | 15 | 2001 |
| A1 | 30 | 2001 |
| A2 | 35 | 2002 |
| A2 | 50 | 2001 |
| A2 | 10 | 2002 |
| A2 | 22 | 2002 |
| A3 | 11 | 2001 |
| A3 | 15 | 2001 |
| A3 | 28 | 2000 |
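I want to group by ID and YEAR to find the sum of COUNT OF STOCK, and then use YEAR to compute the number of years from 2020 (2020 - YEAR), so that the data frame becomes the one shown below: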
| ID | Sum of COUNT OF STOCK | number of years from 2020 (2020-year) |
| A1 | 48 | 20 |
| A1 | 45 | 19 |
| A2 | 67 | 18 |
| A2 | 50 | 19 |
| A3 | 26 | 19 |
| A3 | 28 | 20 |
Thanks in advance!!
【Comments】:
Tags:
r
dataframe
group-by
aggregate
data-manipulation
【Solution 1】:
This is straightforward. However, to use these verbose column names you have to quote them with backticks, which can be awkward.
library(dplyr)

# Backticks are needed around column names that contain spaces
dat %>%
  group_by(ID, YEAR) %>%
  summarise(
    `Sum of COUNT OF STOCK` = sum(`COUNT OF STOCK`),
    `number of years from 2020 (2020-year)` = 2020 - first(YEAR)
  ) %>%
  select(-YEAR)
Output:
  ID    `Sum of COUNT OF STOCK` `number of years from 2020 (2020-year)`
  <chr>                   <int>                                   <dbl>
1 A1                         48                                      20
2 A1                         45                                      19
3 A2                         50                                      19
4 A2                         67                                      18
5 A3                         28                                      20
6 A3                         26                                      19
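The answer above never defines `dat`. As a minimal sketch, assuming the question's table is entered by hand with its original space-containing column names, the frame could be built as follows. The object names `dat` and `res`, the use of `check.names = FALSE`, and the `.groups = "drop"` argument are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical reconstruction of `dat`: check.names = FALSE keeps the
# space-containing column name exactly as written in the question.
dat <- data.frame(
  ID = rep(c("A1", "A2", "A3"), times = c(5L, 4L, 3L)),
  `COUNT OF STOCK` = c(10L, 20L, 18L, 15L, 30L, 35L, 50L, 10L, 22L, 11L, 15L, 28L),
  YEAR = c(2000L, 2000L, 2000L, 2001L, 2001L, 2002L, 2001L, 2002L, 2002L,
           2001L, 2001L, 2000L),
  check.names = FALSE
)

# Non-syntactic names must be wrapped in backticks wherever they are used.
res <- dat %>%
  group_by(ID, YEAR) %>%
  summarise(
    `Sum of COUNT OF STOCK` = sum(`COUNT OF STOCK`),
    `number of years from 2020 (2020-year)` = 2020 - first(YEAR),
    .groups = "drop"  # assumption: return an ungrouped tibble
  ) %>%
  select(-YEAR)

print(res)
```

Without `check.names = FALSE`, `data.frame()` would rewrite the name to `COUNT.OF.STOCK`, and the backtick-quoted reference inside `summarise()` would no longer match.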
【Solution 2】:
Simply do this.
library(dplyr)

df %>%
  group_by(D, number_of_years = 2020 - YEAR) %>%
  summarise(Sum_of_stock = sum(COUNT_OF_STOCK))
# A tibble: 6 x 3
# Groups:   D [3]
  D     number_of_years Sum_of_stock
  <chr>           <dbl>        <int>
1 A1                 19           45
2 A1                 20           48
3 A2                 18           67
4 A2                 19           50
5 A3                 19           26
6 A3                 20           28
Data
df <- read.table(text = "D COUNT_OF_STOCK YEAR
A1 10 2000
A1 20 2000
A1 18 2000
A1 15 2001
A1 30 2001
A2 35 2002
A2 50 2001
A2 10 2002
A2 22 2002
A3 11 2001
A3 15 2001
A3 28 2000", header = TRUE)
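Since the question is also tagged `aggregate`, the same result can be sketched in base R without dplyr. This is an alternative technique, not the one used in either answer. It assumes the column names from the data block above (`D`, `COUNT_OF_STOCK`, `YEAR`), and the object name `out` is hypothetical.

```r
# Same data as in the answer above, repeated so the sketch is self-contained.
df <- read.table(text = "D COUNT_OF_STOCK YEAR
A1 10 2000
A1 20 2000
A1 18 2000
A1 15 2001
A1 30 2001
A2 35 2002
A2 50 2001
A2 10 2002
A2 22 2002
A3 11 2001
A3 15 2001
A3 28 2000", header = TRUE)

# Sum COUNT_OF_STOCK within each D/YEAR combination.
out <- aggregate(COUNT_OF_STOCK ~ D + YEAR, data = df, FUN = sum)

# Derive the years-from-2020 column, then order rows by D and YEAR
# and keep only the columns the question asks for.
out$number_of_years <- 2020 - out$YEAR
out <- out[order(out$D, out$YEAR), c("D", "COUNT_OF_STOCK", "number_of_years")]
names(out)[2] <- "Sum_of_stock"
rownames(out) <- NULL

print(out)
```

The formula interface groups by both variables on the right-hand side, so the year difference can be computed afterwards from the retained `YEAR` column rather than inside the grouping step.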