【Question title】: Variables in one col; values in another -> goal: add columns for variables
【Posted】: 2016-12-22 21:11:43
【Question】:

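I think I am facing a (hopefully) small problem, but searching has not turned up anything helpful. I am having trouble with data pulled through the OECD package: I get a dataset in which all variables are stored in a single column. The dataset is in long format, which is fine, but I would like each variable to be its own column. At the moment the dataset looks like the sample reproduced further down.

As you can see, the "VAR" column contains several variables: "B11", "B12", ... 11 variables in total. All of them are measured for many countries (column "COU"). What I would like to do is add new columns to the dataset, one for each variable currently stored in "VAR", each containing the corresponding values from the "obsValue" column.

That way I could see, for example, the B11 value for Afghanistan in 1999 in one row and the value for 2000 in another, with the 1999 value of B12 in the same row as the 1999 value of B11, and so on. I hope my goal is clear; if not, please don't hesitate to ask.

Here is the output that reproduces the head of the dataset: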

dput(head(MIG,20)) 

structure(list(CO2 = c("AFG", "AFG", "AFG", "AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", 
"AFG", "AFG", "AFG", "AFG", "AFG"), VAR = c("B11", "B11", "B11", 
"B11", "B11", "B11", "B11", "B11", "B11", "B11", "B11", "B11", 
"B11", "B11", "B11", "B11", "B12", "B12", "B12", "B12"), GEN = c("WMN", 
"WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", 
"WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", "WMN", 
"WMN"), COU = c("AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", 
"AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", 
"AUS", "AUS", "AUS", "AUS"), TIME_FORMAT = c("P1Y", "P1Y", "P1Y", 
"P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", 
"P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y", "P1Y"), obsTime = c("1999", 
"2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", 
"2008", "2009", "2010", "2011", "2012", "2013", "2014", "1999", 
"2000", "2001", "2004"), obsValue = c(434, 398, 225, 345, 544, 
726, 1099, 1607, 1377, 1018, 946, 873, 1131, 903, 1230, 2939, 
0, 0, 2, 24), OBS_STATUS = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), migrants = c(434, 398, 225, 345, 
544, 726, 1099, 1607, 1377, 1018, 946, 873, 1131, 903, 1230, 
2939, 0, 0, 2, 24)), .Names = c("CO2", "VAR", "GEN", "COU", "TIME_FORMAT", 
"obsTime", "obsValue", "OBS_STATUS", "migrants"), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

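Here is my entire code, including my two attempts at solving the problem myself. They do not work: they either just copy the "obsValue" column or give me a column showing TRUE or FALSE. Note that R will need quite a while to load the dataset.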

library(OECD)
library(plyr)
library(dplyr)

search_dataset("migration")
MIG<- get_dataset("MIG")
get_data_structure("MIG")

MIG$migrants <- if(MIG$VAR == "B11")MIG$migrants<-MIG$obsValue else MIG$migrants<-NA


MIG_long <- mutate(MIG,migrants=VAR=="B11")
if(MIG_long$migrants==T)MIG_long$migrants<-MIG_long$obsValue else MIG_long$migrants<-NA
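
A note on why both attempts behave this way: `if` evaluates a single TRUE/FALSE, not one condition per row. With a whole column as the condition, older versions of R use only the first element (with a warning) and R 4.2.0 and later raise an error. The row-wise counterpart is `ifelse()`. Below is a minimal sketch on a made-up four-row data frame (`mig_small` and its values are illustrative, not the real MIG data). It produces one column per variable, but it does not move B11 and B12 onto the same row; that still requires reshaping.

```r
# Hypothetical miniature of MIG with only the columns that matter here
mig_small <- data.frame(
  VAR      = c("B11", "B11", "B12", "B12"),
  obsValue = c(434, 398, 0, 2),
  stringsAsFactors = FALSE
)

# ifelse() tests every row; if() expects a single TRUE/FALSE
mig_small$B11 <- ifelse(mig_small$VAR == "B11", mig_small$obsValue, NA)
mig_small$B12 <- ifelse(mig_small$VAR == "B12", mig_small$obsValue, NA)

print(mig_small)
```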

I hope this question is not too basic for you and that you can "work" with my explanation. If you have any questions, please ask.

Best wishes, Marcel

【Question comments】:

    Tags: r data-structures dplyr plyr


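    【Solution 1】:

    You can use `spread` from tidyr on the columns `VAR` and `obsValue`. If you do want one row per year, as @atireto highlighted, you only need to drop the `migrants` column first. It duplicates `obsValue`, so leaving it in makes every row unique and keeps B11 and B12 on separate rows.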

    library(tidyr)
    library(dplyr)
    
    MIG %>% 
      select(-migrants) %>%
      spread(VAR, obsValue)
    
         CO2 obsTime   B11   B12
       (chr)   (chr) (dbl) (dbl)
    1    AFG    1999   434     0
    2    AFG    2000   398     0
    3    AFG    2001   225     2
    4    AFG    2002   345    NA
    5    AFG    2003   544    NA
    6    AFG    2004   726    24
    7    AFG    2005  1099    NA
    8    AFG    2006  1607    NA
    9    AFG    2007  1377    NA
    10   AFG    2008  1018    NA
    11   AFG    2009   946    NA
    12   AFG    2010   873    NA
    13   AFG    2011  1131    NA
    14   AFG    2012   903    NA
    15   AFG    2013  1230    NA
    16   AFG    2014  2939    NA
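
In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The following is a minimal sketch of the same reshaping with the newer function, run on a made-up five-row data frame rather than the real MIG data (`mig_small`, `wide`, and `wide_dup` are illustrative names). It also shows what happens when the duplicated `migrants` column is left in.

```r
library(tidyr)
library(dplyr)

# Hypothetical miniature of MIG: two variables, up to three years each
mig_small <- data.frame(
  CO2      = "AFG",
  VAR      = c("B11", "B11", "B11", "B12", "B12"),
  obsTime  = c("1999", "2000", "2001", "1999", "2001"),
  obsValue = c(434, 398, 225, 0, 2),
  stringsAsFactors = FALSE
)
# Duplicate of obsValue, as in the question
mig_small$migrants <- mig_small$obsValue

# One row per country and year, one column per variable
wide <- mig_small %>%
  select(-migrants) %>%
  pivot_wider(names_from = VAR, values_from = obsValue)

# Leaving `migrants` in makes it part of the row key,
# so B11 and B12 stay on separate rows
wide_dup <- mig_small %>%
  pivot_wider(names_from = VAR, values_from = obsValue)

print(wide)
print(nrow(wide_dup))
```

Years with no B12 observation (2000 in this sketch) are filled with `NA`, matching the behaviour of `spread()` in the output above.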
    

    【Comments】:

    • This is close, but doesn't the OP want only 1 row per year?