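[Posted]: 2016-03-04 15:13:42
[Question]:
I have been struggling with my time-series cross-section dataset for a while, specifically with finding a way to identify the maximum value of one column for each country and year. I tried different versions of for and if/else loops without real success. Can you point me in the right direction?
Here is a small reproducible example of my data structure: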
country <- c("a","a","a","a","a","a","b","b","b","b","b","b","c","c","c","c","c","c")
year <- c(2002, 2003, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005, 2006, 2007, 2002, 2003, 2004, 2005, 2006, 2007)
topic <- c("u", "v", "w", "x", "y", "z", "u", "v", "w", "x", "y", "z", "u", "v", "w", "x", "y", "z")
perc <- c(0.3, 0.4, 0.1, 0.2, 0, 0, 0.2, 0.3, 0.1, 0.1, 0.1, 0.2, 0.1, 0.2, 0.2, 0.3, 0, 0.2)
dta <- data.frame(country, year, topic, perc)
In the end, I would like to create a new variable that indicates the topic with the highest percentage in a given year and country:
topicmax <- c("v", "v", "v", "v", "v", "v", "v", "v", "v", "v", "v", "v", "x", "x", "x", "x", "x", "x")
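Ideally, I would also generate another variable that gives the exact percentage of the topic with the highest perc value.
Any help would be greatly appreciated. None of the loop tutorials I found address time-series cross-section data... Thanks!
[Comments]: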
-
Hint: don't use loops. Look at packages such as data.table or dplyr.
-
I tried data.table before, without success. This is the code I used; does it make any sense? ans = dta[, list(count=.N, mperc=max(perc)), keyby=list(country, year)]
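-
For illustration, here is a minimal loop-free sketch in base R. It rests on one assumption: the expected topicmax in the question is constant within each country, so the grouping here is by country alone. In the example data every country-year pair has exactly one row, so grouping by both country and year (as the data.table attempt above does) would just return each row's own perc. The column name percmax and the helper idx are made up for this example.

```r
# Rebuild the example data from the question.
country <- rep(c("a", "b", "c"), each = 6)
year <- rep(2002:2007, times = 3)
topic <- rep(c("u", "v", "w", "x", "y", "z"), times = 3)
perc <- c(0.3, 0.4, 0.1, 0.2, 0, 0,
          0.2, 0.3, 0.1, 0.1, 0.1, 0.2,
          0.1, 0.2, 0.2, 0.3, 0, 0.2)
dta <- data.frame(country, year, topic, perc, stringsAsFactors = FALSE)

# Highest perc within each country, repeated on every row of that country.
# (percmax is a hypothetical column name.)
dta$percmax <- ave(dta$perc, dta$country, FUN = max)

# For each country, the row index of its first maximum; ave() recycles that
# single index across all rows of the group.
idx <- ave(seq_len(nrow(dta)), dta$country,
           FUN = function(i) i[which.max(dta$perc[i])])

# Topic found on the row holding the country's maximum.
dta$topicmax <- dta$topic[idx]
```

Note that which.max() keeps only the first maximum, so ties within a country are resolved in favor of the earliest row. If the real data has several topics per country and year, adding dta$year as a second grouping argument to both ave() calls should give per-country-year results.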
Tags: r loops panel-data