Scenario Analysis/Long Format Database Manipulation - Differencing within a data frame based on multiple criteria

【Question title】: Scenario Analysis/Long Format Database Manipulation - Differencing within a data frame based on multiple criteria
【Posted】: 2014-01-20 08:08:56
【Question description】:

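I am working with a large amount of data output from a scenario analysis. One of the most common graph types I produce in this analysis is a difference plot (either stacked area or stacked column), which shows how values change between the base-case scenario and an alternative scenario over several years. To create this type of plot, I of course need the differences in the values for each year. For example, the (fake) data below show electricity generation from coal, natural gas, and nuclear technologies in 2010, 2020, and 2030.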

    # Build each column directly: cbind() would coerce year (and scen, tech) to character
    t.data = data.frame(scen = rep(c('base', 'scen1', 'scen2'), each = 9),
                        year = rep(c(2010, 2020, 2030), 9),
                        tech = rep(rep(c('coal', 'gas', 'nuclear'), each = 3), 3),
                        gen  = c(1000, 950, 850, 500, 600, 700, 400, 300, 300,
                                 1000, 800, 600, 500, 650, 850, 400, 400, 400,
                                 1000, 700, 400, 500, 700, 800, 400, 200, 100))

These commands should produce the following data:

    scen year    tech  gen
1   base 2010    coal 1000
2   base 2020    coal  950
3   base 2030    coal  850
4   base 2010     gas  500
5   base 2020     gas  600
6   base 2030     gas  700
7   base 2010 nuclear  400
8   base 2020 nuclear  300
9   base 2030 nuclear  300
10 scen1 2010    coal 1000
11 scen1 2020    coal  800
12 scen1 2030    coal  600
13 scen1 2010     gas  500
14 scen1 2020     gas  650
15 scen1 2030     gas  850
16 scen1 2010 nuclear  400
17 scen1 2020 nuclear  400
18 scen1 2030 nuclear  400
19 scen2 2010    coal 1000
20 scen2 2020    coal  700
21 scen2 2030    coal  400
22 scen2 2010     gas  500
23 scen2 2020     gas  700
24 scen2 2030     gas  800
25 scen2 2010 nuclear  400
26 scen2 2020 nuclear  200
27 scen2 2030 nuclear  100

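What I would like to do is create a new column (e.g. gen.diff.base) that shows the difference in generation, by technology and year, between the exploratory scenarios (scen1 and scen2) and the base-case scenario (base). This would produce the following: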

    scen year    tech  gen gen.diff.base
1   base 2010    coal 1000             0
2   base 2020    coal  950             0
3   base 2030    coal  850             0
4   base 2010     gas  500             0
5   base 2020     gas  600             0
6   base 2030     gas  700             0
7   base 2010 nuclear  400             0
8   base 2020 nuclear  300             0
9   base 2030 nuclear  300             0
10 scen1 2010    coal 1000             0
11 scen1 2020    coal  800          -150
12 scen1 2030    coal  600          -250
13 scen1 2010     gas  500             0
14 scen1 2020     gas  650            50
15 scen1 2030     gas  850           150
16 scen1 2010 nuclear  400             0
17 scen1 2020 nuclear  400           100
18 scen1 2030 nuclear  400           100
19 scen2 2010    coal 1000             0
20 scen2 2020    coal  700          -250
21 scen2 2030    coal  400          -450
22 scen2 2010     gas  500             0
23 scen2 2020     gas  700           100
24 scen2 2030     gas  800           100
25 scen2 2010 nuclear  400             0
26 scen2 2020 nuclear  200          -100
27 scen2 2030 nuclear  100          -200

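Note that the difference shown is always scen# minus base (for the corresponding year and technology). My gut tells me there is a simple way to compute this new column with ddply or tapply, but I can't quite work it out. Any help would be greatly appreciated. Thanks, R world!

By the way, if anyone can show me how to do this with ddply, bonus points!

Best, Dan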
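The rule just stated (scenario value minus the base value for the same year and technology) is essentially a lookup join. A minimal base-R sketch of that idea, using merge() rather than ddply or tapply, is below. It assumes exactly one base row per year/tech pair, as in the sample data; the intermediate names base and out are illustrative only.

```r
# Rebuild the example data with numeric year and gen columns
t.data <- data.frame(scen = rep(c("base", "scen1", "scen2"), each = 9),
                     year = rep(c(2010, 2020, 2030), 9),
                     tech = rep(rep(c("coal", "gas", "nuclear"), each = 3), 3),
                     gen  = c(1000, 950, 850, 500, 600, 700, 400, 300, 300,
                              1000, 800, 600, 500, 650, 850, 400, 400, 400,
                              1000, 700, 400, 500, 700, 800, 400, 200, 100),
                     stringsAsFactors = FALSE)

# Base-case generation for every year/tech pair, renamed so it does not clash with gen
base <- subset(t.data, scen == "base", select = c(year, tech, gen))
names(base)[names(base) == "gen"] <- "gen.base"

# Attach the matching base value to every row (base rows match themselves), then subtract
out <- merge(t.data, base, by = c("year", "tech"))
out$gen.diff.base <- out$gen - out$gen.base
out$gen.base <- NULL

# Restore the row order (scen, tech, year) and column order used in the question
out <- out[order(out$scen, out$tech, out$year),
           c("scen", "year", "tech", "gen", "gen.diff.base")]
rownames(out) <- NULL
print(out)
```

merge() sorts its result by the key columns by default, which is why the last step re-sorts the rows back into the layout shown above.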

【Question discussion】:

Tags: r plyr

【Solution 1】:

OK, I figured it out. Here is the code for anyone interested:

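    library(plyr)
    # One row per scen/year/tech group; inside mutate(), year and tech are the
    # group's own values, so each row looks up its matching base-case gen in t.data.
    ddply(t.data, .(scen, year, tech), mutate,
          gen.diff.base = gen - t.data$gen[which(t.data$scen == 'base' & t.data$year == year
                                                 & t.data$tech == tech)])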
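As an alternative to the per-row lookup above, the tapply family mentioned in the question can also do this in base R. The sketch below uses ave(), which, unlike tapply(), returns one value per row in the original row order, so the result can be assigned straight back as a column. It assumes one base row per year/tech pair; base.gen is an illustrative name.

```r
t.data <- data.frame(scen = rep(c("base", "scen1", "scen2"), each = 9),
                     year = rep(c(2010, 2020, 2030), 9),
                     tech = rep(rep(c("coal", "gas", "nuclear"), each = 3), 3),
                     gen  = c(1000, 950, 850, 500, 600, 700, 400, 300, 300,
                              1000, 800, 600, 500, 650, 850, 400, 400, 400,
                              1000, 700, 400, 500, 700, 800, 400, 200, 100),
                     stringsAsFactors = FALSE)

# ave() splits the row indices by year/tech; for each group, FUN returns that
# group's base-case gen, which ave() recycles over every row in the group
base.gen <- ave(seq_len(nrow(t.data)), t.data$year, t.data$tech,
                FUN = function(i) t.data$gen[i][t.data$scen[i] == "base"])

# Scenario minus base for the same year and technology (0 for the base rows)
t.data$gen.diff.base <- t.data$gen - base.gen
print(t.data)
```

Because no splitting or re-combining of the data frame is involved, the rows keep the order shown in the question.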

【Discussion】: