【Question title】: R - Reshape data set to include values for all conditions, even if they are zero [duplicate]
【Posted】: 2016-10-15 09:57:44
【Question】:

I have this short data frame:

population.served <- c(200:210)
area <- c("Cambridge", "Oxford","Cambridge", "Oxford", "Cambridge", "Oxford","London","Cambridge", "Oxford", "London","Edinburgh")
year <- c("Year.1", "Year.1","Year.2", "Year.2","Year.3", "Year.3","Year.3", "Year.4", "Year.4","Year.4","Year.4" )
data <- data.frame(population.served, area, year) 

How can I make every combination of area and year have a population.served value, even when that value is zero?

I would like the data to look like this:

population.served <- c(200, 201, 0, 0, 202, 203, 0, 0, 204, 205, 206, 0, 207, 208, 209, 210)
area <- c("Cambridge", "Oxford","London","Edinburgh", "Cambridge", "Oxford","London","Edinburgh","Cambridge", "Oxford","London","Edinburgh","Cambridge", "Oxford","London","Edinburgh")
year <- c("Year.1", "Year.1","Year.1", "Year.1","Year.2", "Year.2","Year.2", "Year.2","Year.3", "Year.3","Year.3", "Year.3","Year.4", "Year.4","Year.4","Year.4" )
data2 <- data.frame(population.served, area, year) 

    Tags: r dataframe


    【Solution 1】:

    You can use complete from the tidyr package:

    library("tidyr")
    data %>% complete(area, year, fill = list(population.served = 0))
    # # A tibble: 16 × 3
    #         area   year population.served
    #       <fctr> <fctr>             <dbl>
    # 1  Cambridge Year.1               200
    # 2  Cambridge Year.2               202
    # 3  Cambridge Year.3               204
    # 4  Cambridge Year.4               207
    # 5  Edinburgh Year.1                 0
    # 6  Edinburgh Year.2                 0
    # 7  Edinburgh Year.3                 0
    # 8  Edinburgh Year.4               210
    # .....
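
As a quick way to check the complete() call end to end, here is a minimal sketch that rebuilds the question's data frame and counts the rows that were filled in. The name `filled` and the use of stringsAsFactors = FALSE are my own choices, not part of the answer; with character columns the tibble header shows <chr> rather than <fctr>.

```r
library(tidyr)

# Rebuild the data frame from the question (it is called `data` there).
data <- data.frame(
  population.served = 200:210,
  area = c("Cambridge", "Oxford", "Cambridge", "Oxford", "Cambridge", "Oxford",
           "London", "Cambridge", "Oxford", "London", "Edinburgh"),
  year = c("Year.1", "Year.1", "Year.2", "Year.2", "Year.3", "Year.3",
           "Year.3", "Year.4", "Year.4", "Year.4", "Year.4"),
  stringsAsFactors = FALSE  # assumption: keep area and year as character
)

# complete() adds every missing area/year combination;
# `fill` supplies 0 instead of NA for the newly created rows.
filled <- complete(data, area, year, fill = list(population.served = 0))

nrow(filled)                        # 4 areas x 4 years = 16
sum(filled$population.served == 0)  # 5 combinations were absent from the input
```

Columns not named in `fill` would be left as NA in the added rows, so each value column that needs a default has to be listed there.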
    


      【Solution 2】:

      Here is one approach, using expand.grid from base R to fill in your table:

      # make a dummy table with all time steps for all units
      DF <- with(data, expand.grid(area = unique(area), year = unique(year)))
      
      # merge the data with that table, using all.x = TRUE to keep the larger set
      DF <- merge(DF, data, all.x = TRUE)
      
      # replace the NAs in population.served (the only column the merge can leave empty) with 0s
      DF$population.served[is.na(DF$population.served)] <- 0
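
One thing to be aware of is that merge() sorts its result by the key columns, so the rows come back ordered by area rather than in the year-by-year layout shown in the question. The sketch below runs the same expand.grid and merge steps on the question's data and then restores that layout. The helper names `areas`, `years` and `grid` are my own, and stringsAsFactors = FALSE is an assumption made so that match() works on plain character vectors.

```r
# Rebuild the data frame from the question.
data <- data.frame(
  population.served = 200:210,
  area = c("Cambridge", "Oxford", "Cambridge", "Oxford", "Cambridge", "Oxford",
           "London", "Cambridge", "Oxford", "London", "Edinburgh"),
  year = c("Year.1", "Year.1", "Year.2", "Year.2", "Year.3", "Year.3",
           "Year.3", "Year.4", "Year.4", "Year.4", "Year.4"),
  stringsAsFactors = FALSE
)

areas <- unique(data$area)  # order of first appearance
years <- unique(data$year)

# All area/year combinations, then a left join onto the observed rows.
grid <- expand.grid(area = areas, year = years, stringsAsFactors = FALSE)
DF <- merge(grid, data, all.x = TRUE)

# Only population.served can be NA after the merge.
DF$population.served[is.na(DF$population.served)] <- 0

# Reorder by year, then by area in order of first appearance,
# to match the layout asked for in the question.
DF <- DF[order(match(DF$year, years), match(DF$area, areas)), ]
rownames(DF) <- NULL

DF$population.served
# 200 201 0 0 202 203 0 0 204 205 206 0 207 208 209 210
```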
      


        【Solution 3】:

        An approach using the fast data.table package:

        library(data.table)
        setDT(data)[CJ(area = area, year = year, unique = TRUE), on = c('area', 'year')
                    ][is.na(population.served), population.served := 0][]
        

        The result is:

            population.served      area   year
         1:               200 Cambridge Year.1
         2:               202 Cambridge Year.2
         3:               204 Cambridge Year.3
         4:               207 Cambridge Year.4
         5:                 0 Edinburgh Year.1
         6:                 0 Edinburgh Year.2
         7:                 0 Edinburgh Year.3
         8:               210 Edinburgh Year.4
         9:                 0    London Year.1
        10:                 0    London Year.2
        11:               206    London Year.3
        12:               209    London Year.4
        13:               201    Oxford Year.1
        14:               203    Oxford Year.2
        15:               205    Oxford Year.3
        16:               208    Oxford Year.4
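
Note that setDT() converts `data` to a data.table in place. If the original data frame should stay untouched, a variant along the following lines may be preferable. It is only a sketch: the names `dt` and `filled` are my own, and it uses as.data.table() (which copies) and 0L (to keep the column integer) in place of the answer's setDT() and 0.

```r
library(data.table)

# Rebuild the data frame from the question.
data <- data.frame(
  population.served = 200:210,
  area = c("Cambridge", "Oxford", "Cambridge", "Oxford", "Cambridge", "Oxford",
           "London", "Cambridge", "Oxford", "London", "Edinburgh"),
  year = c("Year.1", "Year.1", "Year.2", "Year.2", "Year.3", "Year.3",
           "Year.3", "Year.4", "Year.4", "Year.4", "Year.4"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so `data` remains a plain data.frame.
dt <- as.data.table(data)

# CJ(..., unique = TRUE) builds the sorted cross join of the distinct areas
# and years; joining on it yields NA for combinations that were absent.
filled <- dt[CJ(area = area, year = year, unique = TRUE), on = c("area", "year")]

# Replace the NAs introduced by the join, by reference.
filled[is.na(population.served), population.served := 0L]

# The combinations that had no row in the input.
filled[population.served == 0L, .(area, year)]
```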
        
