[Posted]: 2016-12-22 21:11:43
[Question]:
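I am facing what I hope is a small problem, but searching has not turned up anything helpful. I am pulling data with the OECD package, and the dataset I get back stores all variables in a single column. The dataset is in long format, which is fine, but I would like each variable to have its own column. A sample of the current dataset is reproduced further down.
The "VAR" column contains several variables: "B11", "B12", ... 11 variables in total. Every variable is measured for many countries (column "COU"). What I want to do is add new columns to the dataset, one for each variable currently stored in "VAR", each holding the corresponding values from the "obsValue" column.
That way I would see, for example, the B11 value for Afghanistan in 1999 in one row and the 2000 value in another row, while the 1999 value of B12 sits in the same row as the 1999 value of B11, and so on. I hope my goal is clear; if not, please don't hesitate to ask.
Here is the code to reproduce the head of the dataset: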
dput(head(MIG,20))
structure(list(CO2 = c("AFG", "AFG", "AFG", "AFG", "AFG", "AFG",
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",
"AFG", "AFG", "AFG", "AFG", "AFG"), VAR = c("B11", "B11", "B11",
"B11", "B11", "B11", "B11", "B11", "B11", "B11", "B11", "B11",
"B11", "B11", "B11", "B11", "B12", "B12", "B12", "B12"), GEN = c("WMN",
"WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN",
"WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN",
"WMN"), COU = c("AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS",
"AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS",
"AUS", "AUS", "AUS", "AUS"), TIME_FORMAT = c("P1Y", "P1Y", "P1Y",
"P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y",
"P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y"), obsTime = c("1999",
"2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007",
"2008", "2009", "2010", "2011", "2012", "2013", "2014", "1999",
"2000", "2001", "2004"), obsValue = c(434, 398, 225, 345, 544,
726, 1099, 1607, 1377, 1018, 946, 873, 1131, 903, 1230, 2939,
0, 0, 2, 24), OBS_STATUS = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), migrants = c(434, 398, 225, 345,
544, 726, 1099, 1607, 1377, 1018, 946, 873, 1131, 903, 1230,
2939, 0, 0, 2, 24)), .Names = c("CO2", "VAR", "GEN", "COU", "TIME_FORMAT",
"obsTime", "obsValue", "OBS_STATUS", "migrants"), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
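Here is my entire code, including my own two attempts at solving the problem. Neither works: they either just copy the "obsValue" column or give me a column showing TRUE or FALSE. Note that R takes a long time to load the dataset.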
library(OECD)
library(plyr)
library(dplyr)
search_dataset("migration")
MIG<- get_dataset("MIG")
get_data_structure("MIG")
MIG$migrants <- if(MIG$VAR == "B11")MIG$migrants<-MIG$obsValue else MIG$migrants<-NA
MIG_long <- mutate(MIG,migrants=VAR=="B11")
if(MIG_long$migrants==T)MIG_long$migrants<-MIG_long$obsValue else MIG_long$migrants<-NA
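A likely reason both attempts fail is that `if()` expects a single TRUE/FALSE, whereas `MIG$VAR == "B11"` is a whole vector. Older R versions used only the first element (with a warning), which would explain the copied "obsValue" column; since R 4.2.0 a condition of length greater than one is an error. The sketch below shows the row-wise alternative with `ifelse()`. `MIG_toy` is a hypothetical stand-in built from a few values of the dput output above, not the real download.

```r
# Hypothetical toy data: a few rows taken from the dput output above
MIG_toy <- data.frame(
  VAR      = c("B11", "B11", "B12", "B12"),
  obsTime  = c("1999", "2000", "1999", "2000"),
  obsValue = c(434, 398, 0, 0),
  stringsAsFactors = FALSE
)

# ifelse() checks the condition for every row;
# rows where VAR is not "B11" get NA
MIG_toy$B11 <- ifelse(MIG_toy$VAR == "B11", MIG_toy$obsValue, NA_real_)

print(MIG_toy)
```

This only fills one new column and leaves the B12 rows as separate rows, so on its own it does not yet put B11 and B12 for the same year side by side.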
I hope this question isn't too basic for you and that you can work with my explanation. If anything is unclear, please ask.
Best wishes, Marcel
[Comments]:
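The layout described in the question (one row per country and year, one column per variable) is a long-to-wide reshape. Below is a minimal sketch, assuming the tidyr package (version 1.0.0 or later, for `pivot_wider()`) is installed; older tidyr versions offer `spread()` for the same purpose. `MIG_toy` is a hypothetical stand-in using values from the dput output, and it deliberately omits the helper column `migrants`, since any leftover column that varies with `VAR` would be treated as an identifier and keep the rows apart.

```r
library(tidyr)

# Hypothetical toy data: identifier columns plus VAR / obsValue,
# with values copied from the dput output in the question
MIG_toy <- data.frame(
  CO2      = "AFG",
  COU      = "AUS",
  GEN      = "WMN",
  VAR      = c("B11", "B11", "B11", "B12", "B12", "B12"),
  obsTime  = c("1999", "2000", "2001", "1999", "2000", "2001"),
  obsValue = c(434, 398, 225, 0, 0, 2),
  stringsAsFactors = FALSE
)

# Each distinct value of VAR becomes its own column, filled from obsValue;
# all remaining columns identify a row (here: one row per year)
MIG_wide <- pivot_wider(MIG_toy, names_from = VAR, values_from = obsValue)

print(MIG_wide)
```

On the full dataset, combinations that were never observed (for example B12 in years missing from the source) would show up as NA in the wide table.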
Tags: r data-structures dplyr plyr