Melt and reshape columns with similar column root word - Answers

【Question title】: melt and reshape columns with similar column root word
【Posted】: 2016-11-11 05:35:37
【Question description】:

I have a data frame like the following:

 id  gender group  Student_Math_1  Student_Math_2  Student_Read_1  Student_Read_2
 46  M      Red    23              45              37              56
 46  M      Red    34              36              33              78
 46  M      Red    56              63              58              NA
 62  F      Blue   59              NA              NA              68
 62  F      Blue   NA              68              87              73
 38  M      Red    78              57              NA              65
 38  M      Red    NA              75              54              NA
 17  F      Blue   74              NA              56              72
 17  F      Blue   75              61              NA              79
 17  F      Blue   NA              74              43              81

    df = structure(list(
        id = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17),
        gender = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L),
            .Label = c("F", "M"), class = "factor"),
        group = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L),
            .Label = c("Blue", "Red"), class = "factor"),
        Student_Math_1 = c(23, 34, 56, 59, NA, 78, NA, 74, 75, NA),
        Student_Math_2 = c(45, 36, 63, NA, 68, 57, 75, NA, 61, 74),
        Student_Read_1 = c(37, 33, 58, NA, 87, NA, 54, 56, NA, 43),
        Student_Read_2 = c(56, 78, NA, 68, 73, 65, NA, 72, 79, 81)),
        .Names = c("id", "gender", "group", "Student_Math_1", "Student_Math_2",
                   "Student_Read_1", "Student_Read_2"),
        row.names = c(NA, -10L), class = "data.frame")

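What I want is to reshape this dataset so that the Student_Math_1 and Student_Math_2 columns are stacked one below the other into a single column Math, and likewise Student_Read_1 and Student_Read_2 are stacked into a single column Reading, as shown below: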

 id  gender group  Math  Index1          Reading  Index2
 46  M      Red    23    Student_Math_1  37       Student_Read_1
 46  M      Red    34    Student_Math_1  33       Student_Read_1
 46  M      Red    56    Student_Math_1  58       Student_Read_1
 62  F      Blue   59    Student_Math_1  NA       Student_Read_1
 62  F      Blue   NA    Student_Math_1  87       Student_Read_1
 38  M      Red    78    Student_Math_1  NA       Student_Read_1
 38  M      Red    NA    Student_Math_1  54       Student_Read_1
 17  F      Blue   74    Student_Math_1  56       Student_Read_1
 17  F      Blue   75    Student_Math_1  NA       Student_Read_1
 17  F      Blue   NA    Student_Math_1  43       Student_Read_1
 46  M      Red    45    Student_Math_2  56       Student_Read_2
 46  M      Red    36    Student_Math_2  78       Student_Read_2
 46  M      Red    63    Student_Math_2  NA       Student_Read_2
 62  F      Blue   NA    Student_Math_2  68       Student_Read_2
 62  F      Blue   68    Student_Math_2  73       Student_Read_2
 38  M      Red    57    Student_Math_2  65       Student_Read_2
 38  M      Red    75    Student_Math_2  NA       Student_Read_2
 17  F      Blue   NA    Student_Math_2  72       Student_Read_2
 17  F      Blue   61    Student_Math_2  79       Student_Read_2
 17  F      Blue   74    Student_Math_2  81       Student_Read_2

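I only know that this can be done with reshape or melt, converting from wide to long format, but I don't know how to proceed. Any help with this transformation would be greatly appreciated.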
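Since the question mentions reshape: base R's stats::reshape() can stack both column pairs in one call when varying is a list holding one vector of column names per output column. A minimal sketch, assuming the column names from the dput() above; the temporary row column and the paste0() labels Index1/Index2 are illustrative choices, not taken from any answer below.

```r
# Same data as the dput() in the question.
df <- data.frame(
  id     = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17),
  gender = factor(c("M", "M", "M", "F", "F", "M", "M", "F", "F", "F")),
  group  = factor(c("Red", "Red", "Red", "Blue", "Blue", "Red", "Red", "Blue", "Blue", "Blue")),
  Student_Math_1 = c(23, 34, 56, 59, NA, 78, NA, 74, 75, NA),
  Student_Math_2 = c(45, 36, 63, NA, 68, 57, 75, NA, 61, 74),
  Student_Read_1 = c(37, 33, 58, NA, 87, NA, 54, 56, NA, 43),
  Student_Read_2 = c(56, 78, NA, 68, 73, 65, NA, 72, 79, 81)
)

# varying: one character vector per long-format column, in the order of v.names.
long <- reshape(
  df,
  direction = "long",
  varying   = list(c("Student_Math_1", "Student_Math_2"),
                   c("Student_Read_1", "Student_Read_2")),
  v.names   = c("Math", "Reading"),
  timevar   = "Index",  # 1 for the *_1 columns, 2 for the *_2 columns
  times     = 1:2,
  idvar     = "row"     # id repeats, so let reshape() number rows in a new column
)

# Rebuild the labels used in the desired output (hypothetical helper columns).
long$Index1 <- paste0("Student_Math_", long$Index)
long$Index2 <- paste0("Student_Read_", long$Index)
long <- long[, c("id", "gender", "group", "Math", "Index1", "Reading", "Index2")]
rownames(long) <- NULL
head(long)
```

The separate idvar matters: reshape() builds row names from the id variable, so idvar = "id" would stop with a duplicate row-names error because each id occurs on several rows.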

【Question comments】:

You can do this mostly in base R, with melt from reshape2 doing the stacking, i.e. pat <- c("Student_Math", "Student_Read");cbind(df[rep(1:nrow(df), 2), 1:3], do.call(cbind, lapply(pat, function(nm) melt(df[grep(nm, names(df))])))) and then rename the columns.
Another option is melt from data.table: melt(setDT(df), measure = patterns("Math", "Read"), value.name = c("Math", "Read"))[, Index1 := names(df)[4:5][variable]][, Index2 := names(df)[6:7][variable]][]

Tags: r dplyr reshape2 melt

【Solution 1】:

We can use melt from data.table:

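library(data.table)
# Stack the Math columns and the Read columns in parallel; `variable` is a factor
# ("1", "2") whose codes say which column of each pair a row came from.
melt(setDT(df), measure.vars = patterns("Math", "Read"),
     value.name = c("Math", "Read"))[, Index1 := names(df)[4:5][variable]   # Student_Math_1/2
            ][, Index2 := names(df)[6:7][variable]][]                      # Student_Read_1/2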
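Hard-coded positions such as names(df)[4:5] break silently if the columns are ever reordered. A sketch that instead passes the matched column names to melt() and reuses them as labels, assuming a data.table version that accepts a list in measure.vars (1.9.6 or later); math_cols and read_cols are illustrative names:

```r
library(data.table)

# Same data as the dput() in the question.
df <- data.frame(
  id     = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17),
  gender = factor(c("M", "M", "M", "F", "F", "M", "M", "F", "F", "F")),
  group  = factor(c("Red", "Red", "Red", "Blue", "Blue", "Red", "Red", "Blue", "Blue", "Blue")),
  Student_Math_1 = c(23, 34, 56, 59, NA, 78, NA, 74, 75, NA),
  Student_Math_2 = c(45, 36, 63, NA, 68, 57, 75, NA, 61, 74),
  Student_Read_1 = c(37, 33, 58, NA, 87, NA, 54, 56, NA, 43),
  Student_Read_2 = c(56, 78, NA, 68, 73, 65, NA, 72, 79, 81)
)

# Take the column names from the data instead of hard-coding positions.
math_cols <- grep("^Student_Math_", names(df), value = TRUE)
read_cols <- grep("^Student_Read_", names(df), value = TRUE)

# One output column per list element; `variable` is a factor whose integer
# codes (1, 2) give the position within each element of the list.
long <- melt(as.data.table(df),
             measure.vars = list(math_cols, read_cols),
             value.name   = c("Math", "Reading"))

# Indexing a character vector with a factor uses the factor's integer codes.
long[, `:=`(Index1 = math_cols[variable], Index2 = read_cols[variable])]
long[]
```

as.data.table() copies the data, so unlike setDT() it leaves df as a plain data.frame for any base-R code that follows.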

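Another option, which needs df to be a plain data.frame (setDT() above converts it in place) and uses melt from reshape2:

library(reshape2)
df <- as.data.frame(df)  # undo setDT(): df[cols] on a data.table would select rows, not columns
pat <- c("Student_Math", "Student_Read")
res <- cbind(df[rep(1:nrow(df), 2), 1:3], do.call(cbind, lapply(pat,
          function(nm) reshape2::melt(df[grep(nm, names(df))]))))
names(res)[4:7] <- c("Index1", "Math", "Index2", "Reading")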


【Solution 2】:

Use melt from reshape2, passing the appropriate inputs to id.vars and measure.vars:

library(reshape2)

MathDF = melt(data = df, id.vars = c("id","gender","group"), measure.vars = c("Student_Math_1","Student_Math_2"),
    value.name = "Math", variable.name = "Index1")

ReadDF = melt(data = df, id.vars = c("id","gender","group"), measure.vars = c("Student_Read_1","Student_Read_2"),
    value.name = "Read", variable.name = "Index2")

# merge() on id/gender/group would pair every Math row with every Read row of the
# same student (many-to-many); both melts keep the same row order, so bind instead.
mergeDF = cbind(MathDF, ReadDF[c("Index2", "Read")])

head(mergeDF)
#   id gender group         Index1 Math         Index2 Read
# 1 46      M   Red Student_Math_1   23 Student_Read_1   37
# 2 46      M   Red Student_Math_1   34 Student_Read_1   33
# 3 46      M   Red Student_Math_1   56 Student_Read_1   58
# 4 62      F  Blue Student_Math_1   59 Student_Read_1   NA
# 5 62      F  Blue Student_Math_1   NA Student_Read_1   87
# 6 38      M   Red Student_Math_1   78 Student_Read_1   NA
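To check the join behaviour directly, the sketch below builds the same two melted tables and compares merge() on the id columns with a plain column bind. The row counts in the comments follow from the data: ids 46 and 17 occur on three rows each (six melted rows), ids 62 and 38 on two (four melted rows).

```r
library(reshape2)

# Same data as the dput() in the question.
df <- data.frame(
  id     = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17),
  gender = factor(c("M", "M", "M", "F", "F", "M", "M", "F", "F", "F")),
  group  = factor(c("Red", "Red", "Red", "Blue", "Blue", "Red", "Red", "Blue", "Blue", "Blue")),
  Student_Math_1 = c(23, 34, 56, 59, NA, 78, NA, 74, 75, NA),
  Student_Math_2 = c(45, 36, 63, NA, 68, 57, 75, NA, 61, 74),
  Student_Read_1 = c(37, 33, 58, NA, 87, NA, 54, 56, NA, 43),
  Student_Read_2 = c(56, 78, NA, 68, 73, 65, NA, 72, 79, 81)
)

ids <- c("id", "gender", "group")
MathDF <- melt(df, id.vars = ids, measure.vars = c("Student_Math_1", "Student_Math_2"),
               variable.name = "Index1", value.name = "Math")
ReadDF <- melt(df, id.vars = ids, measure.vars = c("Student_Read_1", "Student_Read_2"),
               variable.name = "Index2", value.name = "Read")

# The id columns are not unique keys, so merge() returns every Math row of a
# student paired with every Read row of that student.
joined <- merge(MathDF, ReadDF, by = ids)
nrow(joined)   # 104 = 6*6 (id 46) + 4*4 (id 62) + 4*4 (id 38) + 6*6 (id 17)

# Both melts stack the *_1 column first and the *_2 column second, in the
# original row order, so binding side by side keeps the 20 intended rows.
stacked <- cbind(MathDF, ReadDF[c("Index2", "Read")])
nrow(stacked)  # 20
```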


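【Solution 3】:

Using the tidyverse, you can gather each set of columns and then filter out any rows whose index numbers don't match (assuming you don't want combinations of Student_*_1 with Student_*_2):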

library(tidyverse)

df %>% gather(Index1, Math, contains('Math')) %>% 
    gather(Index2, Reading, contains('Read')) %>% 
    filter(parse_number(Index1) == parse_number(Index2))

##    id gender group         Index1 Math         Index2 Reading
## 1  46      M   Red Student_Math_1   23 Student_Read_1      37
## 2  46      M   Red Student_Math_1   34 Student_Read_1      33
## 3  46      M   Red Student_Math_1   56 Student_Read_1      58
## 4  62      F  Blue Student_Math_1   59 Student_Read_1      NA
## 5  62      F  Blue Student_Math_1   NA Student_Read_1      87
## 6  38      M   Red Student_Math_1   78 Student_Read_1      NA
## 7  38      M   Red Student_Math_1   NA Student_Read_1      54
## 8  17      F  Blue Student_Math_1   74 Student_Read_1      56
## 9  17      F  Blue Student_Math_1   75 Student_Read_1      NA
## 10 17      F  Blue Student_Math_1   NA Student_Read_1      43
## 11 46      M   Red Student_Math_2   45 Student_Read_2      56
## 12 46      M   Red Student_Math_2   36 Student_Read_2      78
## 13 46      M   Red Student_Math_2   63 Student_Read_2      NA
## 14 62      F  Blue Student_Math_2   NA Student_Read_2      68
## 15 62      F  Blue Student_Math_2   68 Student_Read_2      73
## 16 38      M   Red Student_Math_2   57 Student_Read_2      65
## 17 38      M   Red Student_Math_2   75 Student_Read_2      NA
## 18 17      F  Blue Student_Math_2   NA Student_Read_2      72
## 19 17      F  Blue Student_Math_2   61 Student_Read_2      79
## 20 17      F  Blue Student_Math_2   74 Student_Read_2      81
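In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), whose special ".value" name splits both column families in a single call, so the intermediate 40-row result that the filter() step trims back to 20 never arises. A sketch, assuming dplyr and tidyr are installed; Index is the suffix extracted from the column names, and Index1/Index2 are rebuilt from it:

```r
library(dplyr)
library(tidyr)

# Same data as the dput() in the question.
df <- data.frame(
  id     = c(46, 46, 46, 62, 62, 38, 38, 17, 17, 17),
  gender = factor(c("M", "M", "M", "F", "F", "M", "M", "F", "F", "F")),
  group  = factor(c("Red", "Red", "Red", "Blue", "Blue", "Red", "Red", "Blue", "Blue", "Blue")),
  Student_Math_1 = c(23, 34, 56, 59, NA, 78, NA, 74, 75, NA),
  Student_Math_2 = c(45, 36, 63, NA, 68, 57, 75, NA, 61, 74),
  Student_Read_1 = c(37, 33, 58, NA, 87, NA, 54, 56, NA, 43),
  Student_Read_2 = c(56, 78, NA, 68, 73, 65, NA, 72, 79, 81)
)

long <- df %>%
  pivot_longer(
    cols          = starts_with("Student_"),
    # First regex group ("Math"/"Read") names the output column (.value);
    # the second group (the trailing digit) goes into Index.
    names_to      = c(".value", "Index"),
    names_pattern = "Student_(Math|Read)_(\\d)"
  ) %>%
  arrange(Index) %>%   # all *_1 rows first, then *_2, as in the desired output
  mutate(Index1 = paste0("Student_Math_", Index),
         Index2 = paste0("Student_Read_", Index)) %>%
  select(id, gender, group, Math, Index1, Reading = Read, Index2)

long
```

arrange() keeps ties in their existing order, so within each Index the rows stay in the original row order.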
