Calculating percentage change of panel data for other entities: answers

【Question title】: Calculating percentage change of panel data for other entities
【Posted】: 2020-07-01 17:13:57
【Question】:

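I have a very large data frame in the form of panel data. It contains economic production figures for each industry within each country over a range of years. I am looking for code that, for every row, takes the year-over-year percentage change in output of the same industry and year in all the other countries and sums those changes into that row.

That sounds hard (and it is hard to explain), so here is an example. With this code: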

panel <- cbind.data.frame(industry =  rep(c("Logging" , "Automobile") , each = 9) ,
               country = rep(c("Austria" , "Belgium" , "Croatia") , each = 3 , times = 2) ,
               year = rep(c(2000:2002) , times = 6) ,
               output = c(2,3,4,1,5,8,1,2,4,2,3,4,6,7,8,9,10,11))

This gives the following data frame:

     industry country year output
1     Logging Austria 2000      2
2     Logging Austria 2001      3
3     Logging Austria 2002      4
4     Logging Belgium 2000      1
5     Logging Belgium 2001      5
6     Logging Belgium 2002      8
7     Logging Croatia 2000      1
8     Logging Croatia 2001      2
9     Logging Croatia 2002      4
10 Automobile Austria 2000      2
11 Automobile Austria 2001      3
12 Automobile Austria 2002      4
13 Automobile Belgium 2000      6
14 Automobile Belgium 2001      7
15 Automobile Belgium 2002      8
16 Automobile Croatia 2000      9
17 Automobile Croatia 2001     10
18 Automobile Croatia 2002     11

I used the tidyverse to compute the percentage change for each industry within each country:

library(tidyverse)

panel <- panel %>%
  group_by(country , industry) %>%
  mutate(per_change = (output - lag(output)) / lag(output))

This gives:

# A tibble: 18 x 5
# Groups:   country, industry [6]
   industry   country  year output per_change
   <fct>      <fct>   <int>  <dbl>      <dbl>
 1 Logging    Austria  2000      2     NA    
 2 Logging    Austria  2001      3      0.5  
 3 Logging    Austria  2002      4      0.333
 4 Logging    Belgium  2000      1     NA    
 5 Logging    Belgium  2001      5      4    
 6 Logging    Belgium  2002      8      0.6  
 7 Logging    Croatia  2000      1     NA    
 8 Logging    Croatia  2001      2      1    
 9 Logging    Croatia  2002      4      1    
10 Automobile Austria  2000      2     NA    
11 Automobile Austria  2001      3      0.5  
12 Automobile Austria  2002      4      0.333
13 Automobile Belgium  2000      6     NA    
14 Automobile Belgium  2001      7      0.167
15 Automobile Belgium  2002      8      0.143
16 Automobile Croatia  2000      9     NA    
17 Automobile Croatia  2001     10      0.111
18 Automobile Croatia  2002     11      0.1

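So I would like code that gives NA for row 1; for row 2, the sum of the 2001 percentage changes in Logging over all countries except Austria (4 + 1 = 5); for row 3, the sum of the 2002 percentage changes in Logging over all countries except Austria (0.6 + 1 = 1.6); NA again for row 4; for row 5, the sum of the 2001 percentage changes in Logging over all countries except Belgium (0.5 + 1 = 1.5); and so on.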
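Stated as a formula, the requested quantity can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear in the original post: y is output, i the industry, c the country, and t the year.

```latex
% Year-over-year change of output y for industry i, country c, year t
g_{i,c,t} = \frac{y_{i,c,t} - y_{i,c,t-1}}{y_{i,c,t-1}}

% Requested value: sum of the changes of all other countries c'
% in the same industry and year
e_{i,c,t} = \sum_{c' \neq c} g_{i,c',t}
          = \Bigl(\sum_{c'} g_{i,c',t}\Bigr) - g_{i,c,t}
```

The second form (total over all countries minus the row's own value) is what makes the calculation easy to express per group. For row 2 of the example, the Logging total for 2001 is 0.5 + 4 + 1 = 5.5, and subtracting Austria's own 0.5 gives 5.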

I have no idea how to do this by hand.

Please also make the code flexible enough to handle N countries and Y industries.

【Question comments】:

Tags: r

【Solution 1】:

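You can:

First, group the panel table by industry and year and sum per_change
Second, join this grouped table back to your main table
Finally, subtract per_change from the group sum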
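The three steps above can also be sketched as a single dplyr pipeline, where a grouped mutate() stands in for the explicit aggregate-and-join. This is a minimal sketch that rebuilds the question's example data; the names result and exc_per_change are illustrative choices. lag() is given order_by = year so the result does not depend on the rows already being sorted by year.

```r
library(dplyr)

# Example data from the question
panel <- data.frame(
  industry = rep(c("Logging", "Automobile"), each = 9),
  country  = rep(c("Austria", "Belgium", "Croatia"), each = 3, times = 2),
  year     = rep(2000:2002, times = 6),
  output   = c(2, 3, 4, 1, 5, 8, 1, 2, 4, 2, 3, 4, 6, 7, 8, 9, 10, 11)
)

result <- panel %>%
  # Year-over-year change within each country-industry series
  group_by(country, industry) %>%
  mutate(per_change = (output - lag(output, order_by = year)) /
                       lag(output, order_by = year)) %>%
  # Total over all countries in the same industry and year,
  # minus the row's own value = sum over the other countries
  group_by(industry, year) %>%
  mutate(exc_per_change = sum(per_change) - per_change) %>%
  ungroup()

print(head(as.data.frame(result)))
```

Because the grouping is by column values rather than by position, the same code applies unchanged to any number of countries and industries.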

After your code:

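library(dplyr)

# Drop the grouping left over from group_by() and work on a plain data frame
d1 <- as.data.frame(panel)

# Sum per_change within each industry-year (data = d1 replaces attach/detach).
# The formula interface omits NA rows, so the first year is absent from d2.
d2 <- aggregate(per_change ~ industry + year, data = d1, FUN = sum)

# Attach the industry-year total to every row; the two per_change columns
# get the suffixes .x (own value) and .y (group total)
panel <- left_join(d1, d2, by = c("industry", "year"))

# Group total minus the row's own value = sum over the other countries
panel$exc_per_change <- panel$per_change.y - panel$per_change.x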

The output is:

> head(panel)
  industry country year output per_change.x per_change.y exc_per_change
1  Logging Austria 2000      2           NA           NA             NA
2  Logging Austria 2001      3    0.5000000     5.500000       5.000000
3  Logging Austria 2002      4    0.3333333     1.933333       1.600000
4  Logging Belgium 2000      1           NA           NA             NA
5  Logging Belgium 2001      5    4.0000000     5.500000       1.500000
6  Logging Belgium 2002      8    0.6000000     1.933333       1.333333
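One caveat if the real panel is unbalanced: a plain sum() returns NA as soon as any country in an industry-year has a missing per_change, which turns the value for every other country in that group into NA as well. The sketch below is one possible way to handle this, not part of the original answer. The data are hypothetical (Croatia has no observation for 2000), and the rule applied, "sum whatever other countries are available, NA if none are", is an assumption you may want to change.

```r
library(dplyr)

# Hypothetical unbalanced panel: Croatia has no observation for 2000
panel <- data.frame(
  industry = "Logging",
  country  = c(rep("Austria", 3), rep("Belgium", 3), rep("Croatia", 2)),
  year     = c(2000, 2001, 2002, 2000, 2001, 2002, 2001, 2002),
  output   = c(2, 3, 4, 1, 5, 8, 2, 4)
)

result <- panel %>%
  group_by(country, industry) %>%
  mutate(per_change = (output - lag(output, order_by = year)) /
                       lag(output, order_by = year)) %>%
  group_by(industry, year) %>%
  mutate(
    # Number of OTHER countries with a usable value in this industry-year
    n_other = sum(!is.na(per_change)) - !is.na(per_change),
    # Sum over the available other countries; NA when there are none
    exc_per_change = if_else(
      n_other > 0,
      sum(per_change, na.rm = TRUE) - coalesce(per_change, 0),
      NA_real_
    )
  ) %>%
  ungroup()

print(as.data.frame(result))
```

Note that lag() only looks at the previous row of each series, so a gap in the years of one country would still be treated as consecutive observations; that case would need separate handling.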

【Comments】: