【Question Title】: R Melt reshape data
【Posted】: 2018-08-07 22:33:52
【Question】:

Here is my data:

Day        Morning_1_id     Var1        Morning_2_id     Var2      Afternoon_1_id     Var3      Afternoon_2_id     Var4
1     20180501-033-000001 3.156667 20180501-033-000002 2.866667 20180501-033-000008 2.946667 20180501-033-000009 3.133333
2     20180502-033-000001 2.986667 20180502-033-000002 2.930000 20180502-033-000020 3.076667 20180502-033-000021 3.013333
3     20180503-033-000001 3.073333 20180503-033-000002 3.070000 20180503-033-000011 3.106667 20180503-033-000012 2.900000
4     20180507-033-000001 3.236667 20180507-033-000002 2.990000 20180507-033-000015 3.043333 20180507-033-000016 3.116667
5     20180508-033-000001 3.030000 20180508-033-000002 3.150000 20180508-033-000015 3.156667 20180508-033-000017 3.343333
6     20180509-033-000001 3.010000 20180509-033-000002 3.020000 20180509-033-000007 3.000000 20180509-033-000008 3.156667
7     20180510-033-000001 2.916667 20180510-033-000002 3.103333 20180510-033-000007 3.336667 20180510-033-000008 3.066667
8     20180511-033-000001 3.293333 20180511-033-000002 3.163333 20180511-033-000013 2.980000 20180511-033-000014 2.940000
9     20180514-033-000001 3.136667 20180514-033-000002 3.186667 20180514-033-000007 2.766667 20180514-033-000008 3.100000
10    20180516-033-000001 3.116667 20180516-033-000002 3.283333 20180516-033-000008 3.133333 20180516-033-000009 3.040000
11    20180517-033-000003 2.843333 20180517-033-000004 3.120000 20180517-033-000008 3.060000 20180517-033-000009 3.033333
12    20180518-033-000001 3.033333 20180518-033-000002 3.290000 20180518-033-000007 3.006667 20180518-033-000008 2.973333
13    20180521-033-000002 3.173333 20180521-033-000003 2.993333 20180521-033-000008 2.983333 20180521-033-000009 3.020000
14    20180523-033-000001 3.336667 20180523-033-000002 3.026667 20180523-033-000007 3.300000 20180523-033-000008 3.210000

In reproducible form:

structure(list(Day = 1:14, Morning_1_id = structure(1:14, .Label = c("20180501-033-000001", 
"20180502-033-000001", "20180503-033-000001", "20180507-033-000001", 
"20180508-033-000001", "20180509-033-000001", "20180510-033-000001", 
"20180511-033-000001", "20180514-033-000001", "20180516-033-000001", 
"20180517-033-000003", "20180518-033-000001", "20180521-033-000002", 
"20180523-033-000001"), class = "factor"), Var1 = c(3.156666667, 
2.986666667, 3.073333333, 3.236666667, 3.03, 3.01, 2.916666667, 
3.293333333, 3.136666667, 3.116666667, 2.843333333, 3.033333333, 
3.173333333, 3.336666667), Morning_2_id = structure(1:14, .Label = c("20180501-033-000002", 
"20180502-033-000002", "20180503-033-000002", "20180507-033-000002", 
"20180508-033-000002", "20180509-033-000002", "20180510-033-000002", 
"20180511-033-000002", "20180514-033-000002", "20180516-033-000002", 
"20180517-033-000004", "20180518-033-000002", "20180521-033-000003", 
"20180523-033-000002"), class = "factor"), Var2 = c(2.866666667, 
2.93, 3.07, 2.99, 3.15, 3.02, 3.103333333, 3.163333333, 3.186666667, 
3.283333333, 3.12, 3.29, 2.993333333, 3.026666667), Afternoon_1_id = structure(1:14, .Label = c("20180501-033-000008", 
"20180502-033-000020", "20180503-033-000011", "20180507-033-000015", 
"20180508-033-000015", "20180509-033-000007", "20180510-033-000007", 
"20180511-033-000013", "20180514-033-000007", "20180516-033-000008", 
"20180517-033-000008", "20180518-033-000007", "20180521-033-000008", 
"20180523-033-000007"), class = "factor"), Var3 = c(2.946666667, 
3.076666667, 3.106666667, 3.043333333, 3.156666667, 3, 3.336666667, 
2.98, 2.766666667, 3.133333333, 3.06, 3.006666667, 2.983333333, 
3.3), Afternoon_2_id = structure(1:14, .Label = c("20180501-033-000009", 
"20180502-033-000021", "20180503-033-000012", "20180507-033-000016", 
"20180508-033-000017", "20180509-033-000008", "20180510-033-000008", 
"20180511-033-000014", "20180514-033-000008", "20180516-033-000009", 
"20180517-033-000009", "20180518-033-000008", "20180521-033-000009", 
"20180523-033-000008"), class = "factor"), Var4 = c(3.133333333, 
3.013333333, 2.9, 3.116666667, 3.343333333, 3.156666667, 3.066666667, 
2.94, 3.1, 3.04, 3.033333333, 2.973333333, 3.02, 3.21)), class = "data.frame", row.names = c(NA, 
-14L))

This is what I want:

Day Id                  Var         Time
1   20180501-033-000001 3.156666667 Morning1
2   20180502-033-000001 2.986666667 Morning1
3   20180503-033-000001 3.073333333 Morning1
4   20180507-033-000001 3.236666667 Morning1
5   20180508-033-000001 3.03        Morning1
6   20180509-033-000001 3.01        Morning1
7   20180510-033-000001 2.916666667 Morning1
8   20180511-033-000001 3.293333333 Morning1
9   20180514-033-000001 3.136666667 Morning1
10  20180516-033-000001 3.116666667 Morning1
11  20180517-033-000003 2.843333333 Morning1
12  20180518-033-000001 3.033333333 Morning1
13  20180521-033-000002 3.173333333 Morning1
14  20180523-033-000001 3.336666667 Morning1
1   20180501-033-000002 2.866666667 Morning2
2   20180502-033-000002 2.93        Morning2
3   20180503-033-000002 3.07        Morning2
4   20180507-033-000002 2.99        Morning2
5   20180508-033-000002 3.15        Morning2
6   20180509-033-000002 3.02        Morning2
7   20180510-033-000002 3.103333333 Morning2
8   20180511-033-000002 3.163333333 Morning2
9   20180514-033-000002 3.186666667 Morning2
10  20180516-033-000002 3.283333333 Morning2
11  20180517-033-000004 3.12        Morning2
12  20180518-033-000002 3.29        Morning2
13  20180521-033-000003 2.993333333 Morning2
14  20180523-033-000002 3.026666667 Morning2
1   20180501-033-000008 2.946666667 Afternoon1
2   20180502-033-000020 3.076666667 Afternoon1
3   20180503-033-000011 3.106666667 Afternoon1
4   20180507-033-000015 3.043333333 Afternoon1
5   20180508-033-000015 3.156666667 Afternoon1
6   20180509-033-000007 3           Afternoon1
7   20180510-033-000007 3.336666667 Afternoon1
8   20180511-033-000013 2.98        Afternoon1
9   20180514-033-000007 2.766666667 Afternoon1
10  20180516-033-000008 3.133333333 Afternoon1
11  20180517-033-000008 3.06        Afternoon1
12  20180518-033-000007 3.006666667 Afternoon1
13  20180521-033-000008 2.983333333 Afternoon1
14  20180523-033-000007 3.3         Afternoon1
1   20180501-033-000009 3.133333333 Afternoon2
2   20180502-033-000021 3.013333333 Afternoon2
3   20180503-033-000012 2.9         Afternoon2
4   20180507-033-000016 3.116666667 Afternoon2
5   20180508-033-000017 3.343333333 Afternoon2
6   20180509-033-000008 3.156666667 Afternoon2
7   20180510-033-000008 3.066666667 Afternoon2
8   20180511-033-000014 2.94        Afternoon2
9   20180514-033-000008 3.1         Afternoon2
10  20180516-033-000009 3.04        Afternoon2
11  20180517-033-000009 3.033333333 Afternoon2
12  20180518-033-000008 2.973333333 Afternoon2
13  20180521-033-000009 3.02        Afternoon2
14  20180523-033-000008 3.21        Afternoon2

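I want to do a wide-to-long conversion so that the Id and 'Var' values are stacked by day. I also want an additional column named "Time" that depends on the original id column, i.e. "Morning_1_id", "Morning_2_id", "Afternoon_1_id" and "Afternoon_2_id". How can I do this? I tried melt from reshape2 but could not get it to work.

【Discussion】:

    Tags: r reshape reshape2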


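    【Solution 1】:

    Here is a solution that converts the table to the requested format. The reshaping itself is done by base R's reshape(); dplyr is loaded but not actually required: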

    library(dplyr)
    
    mydata<- reshape(mydata, direction='long', 
                    varying=c('Morning_1_id', 'Var1', 'Morning_2_id', 'Var2', 'Afternoon_1_id', 'Var3', 'Afternoon_2_id', 'Var4'), 
                    timevar='Var',
                    times=c('Morning1', 'Morning2', 'Afternoon1', 'Afternoon2'),
                    v.names=c('Id', 'Var'),
                    idvar='Day')
    
    mydata<- tibble::rownames_to_column(mydata)
    mydata$rowname<- gsub("^.*\\.","", mydata$rowname)
    names(mydata)<- c("Time", "Day", "Var", "Id")
    mydata<- mydata[,c(2,4,3,1)]
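
The call above passes timevar='Var', which is then overwritten by the 'Var' value column, so the time label has to be recovered from the row names. A minimal sketch of a variant that names the time column directly, assuming the id columns and the Var columns appear in matching order; the toy data frame and its values are made up for illustration and only mimic the layout of the question's data:

```r
# Toy data with the same column layout as the question (values are invented).
toy <- data.frame(
  Day = 1:3,
  Morning_1_id   = c("a1", "a2", "a3"), Var1 = c(3.1, 3.0, 2.9),
  Morning_2_id   = c("b1", "b2", "b3"), Var2 = c(2.8, 2.9, 3.0),
  Afternoon_1_id = c("c1", "c2", "c3"), Var3 = c(3.2, 3.3, 3.4),
  Afternoon_2_id = c("d1", "d2", "d3"), Var4 = c(2.5, 2.6, 2.7),
  stringsAsFactors = FALSE
)

# Assumption: the i-th "_id" column belongs with the i-th "Var" column.
id_cols  <- grep("_id$", names(toy), value = TRUE)
var_cols <- grep("^Var[0-9]+$", names(toy), value = TRUE)

long <- reshape(
  toy,
  direction = "long",
  varying   = list(id_cols, var_cols),   # one vector per output column
  v.names   = c("Id", "Var"),
  timevar   = "Time",                    # does not clash with v.names
  times     = gsub("_", "", sub("_id$", "", id_cols)),  # "Morning1", ...
  idvar     = "Day"
)

row.names(long) <- NULL
long <- long[, c("Day", "Id", "Var", "Time")]
long
```

Passing varying as a list makes the pairing of columns explicit, so no renaming or row-name parsing is needed afterwards.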
    

    【Discussion】:

    • reshape is a base R function. You are not using reshape2, which uses melt to convert to long format. You can even drop the tibble call: mydata$Time <- gsub(".*\\.", "", row.names(mydata)); row.names(mydata) <- NULL
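
Since the question asked about melt from reshape2, here is a hedged sketch of how it could be used. reshape2's melt produces a single value column per call, so this sketch melts the id columns and the Var columns separately and combines them by row position. It assumes both melts return rows in the same order (column by column, rows in original order) and that the i-th id column pairs with the i-th Var column; the toy data is invented:

```r
library(reshape2)

# Toy data with the same column layout as the question (values are invented).
toy <- data.frame(
  Day = 1:3,
  Morning_1_id   = c("a1", "a2", "a3"), Var1 = c(3.1, 3.0, 2.9),
  Morning_2_id   = c("b1", "b2", "b3"), Var2 = c(2.8, 2.9, 3.0),
  Afternoon_1_id = c("c1", "c2", "c3"), Var3 = c(3.2, 3.3, 3.4),
  Afternoon_2_id = c("d1", "d2", "d3"), Var4 = c(2.5, 2.6, 2.7),
  stringsAsFactors = FALSE
)

id_cols  <- grep("_id$", names(toy), value = TRUE)
var_cols <- grep("^Var[0-9]+$", names(toy), value = TRUE)

# Melt the two groups of columns separately.
ids  <- melt(toy, id.vars = "Day", measure.vars = id_cols,
             variable.name = "Time", value.name = "Id")
vars <- melt(toy, id.vars = "Day", measure.vars = var_cols,
             value.name = "Var")

# Combine by position and derive the Time label from the id column name.
long <- data.frame(ids, Var = vars$Var, stringsAsFactors = FALSE)
long$Time <- gsub("_", "", sub("_id$", "", as.character(long$Time)))
long <- long[, c("Day", "Id", "Var", "Time")]
long
```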
    【Solution 2】:

    Consider base R: build a list of data frames by stepping through every second column, then row-bind all the elements:

    df_list <- lapply(seq(3, length(df), 2), function(i) {
      sub <- df[c(1, (i-1):i)]                                      # SUBSET BY COLS
      sub <- transform(sub, Time = sub("_id", "", names(df)[i-1]))  # ADD TIME VAR
      setNames(sub, c("Day", "Id", "Var", "Time"))                  # RENAME COLS  
    })
    
    long_df <- do.call(rbind, df_list)
    
    head(long_df, 20)    
    #    Day                  Id      Var      Time
    # 1    1 20180501-033-000001 3.156667 Morning_1
    # 2    2 20180502-033-000001 2.986667 Morning_1
    # 3    3 20180503-033-000001 3.073333 Morning_1
    # 4    4 20180507-033-000001 3.236667 Morning_1
    # 5    5 20180508-033-000001 3.030000 Morning_1
    # 6    6 20180509-033-000001 3.010000 Morning_1
    # 7    7 20180510-033-000001 2.916667 Morning_1
    # 8    8 20180511-033-000001 3.293333 Morning_1
    # 9    9 20180514-033-000001 3.136667 Morning_1
    # 10  10 20180516-033-000001 3.116667 Morning_1
    # 11  11 20180517-033-000003 2.843333 Morning_1
    # 12  12 20180518-033-000001 3.033333 Morning_1
    # 13  13 20180521-033-000002 3.173333 Morning_1
    # 14  14 20180523-033-000001 3.336667 Morning_1
    # 15   1 20180501-033-000002 2.866667 Morning_2
    # 16   2 20180502-033-000002 2.930000 Morning_2
    # 17   3 20180503-033-000002 3.070000 Morning_2
    # 18   4 20180507-033-000002 2.990000 Morning_2
    # 19   5 20180508-033-000002 3.150000 Morning_2
    # 20   6 20180509-033-000002 3.020000 Morning_2
    

    【Discussion】:

      【Solution 3】:

      Here is a tidyverse option.

      Corrected per the comments from @Calum You:

      library(tidyverse)

      df %>%
        gather(Time, Var, -Day, -c(Var1, Var2, Var3, Var4)) %>%
        mutate(Time = gsub('.{3}$', '',Time),
               start = substr(Time, 1, 1),
               end = substr(Time, nchar(Time), nchar(Time)),
               id = paste0(start,end),
               Val = case_when(id=='M1' ~ Var1,
                               id=='M2' ~ Var2,
                               id=='A1' ~ Var3,
                               id=='A2' ~ Var4)) %>% 
        dplyr::select(Day, Id=Var, Val, Time)
      

      Original (incorrect) code:

      df %>%
         gather(Time, Var, -Day, -c(Var1, Var2, Var3, Var4)) %>%
         gather( key, value, -Day, -Time, -Var) %>% 
         mutate(Time = gsub('.{3}$', '',Time)) %>% 
         dplyr::select(Day, Id=Var, Var=value, Time)
      

      【Discussion】:

      • I think the two gather calls here produce a lot of incorrect Id-Var combinations. The output should have only 56 rows.
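
The row-count problem raised in this comment can be illustrated on a small example. In this sketch the toy data frame is invented (3 days instead of 14): the first gather stacks the four id columns but keeps Var1 to Var4 on every row, so a second gather over those columns pairs every id with every Var value.

```r
library(tidyr)

# Toy data with the same column layout as the question (values are invented).
toy <- data.frame(
  Day = 1:3,
  Morning_1_id   = c("a1", "a2", "a3"), Var1 = c(3.1, 3.0, 2.9),
  Morning_2_id   = c("b1", "b2", "b3"), Var2 = c(2.8, 2.9, 3.0),
  Afternoon_1_id = c("c1", "c2", "c3"), Var3 = c(3.2, 3.3, 3.4),
  Afternoon_2_id = c("d1", "d2", "d3"), Var4 = c(2.5, 2.6, 2.7),
  stringsAsFactors = FALSE
)

# First gather: one row per (Day, id column); Var1..Var4 are still attached.
once <- gather(toy, Time, Id, -Day, -c(Var1, Var2, Var3, Var4))

# Second gather: every one of those rows is repeated for each Var column.
twice <- gather(once, key, value, -Day, -Time, -Id)

nrow(once)   # 3 days * 4 id columns
nrow(twice)  # 3 days * 4 id columns * 4 Var columns
```

With the question's 14 days the same arithmetic gives 14 * 4 * 4 = 224 rows instead of the expected 14 * 4 = 56.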
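      【Solution 4】:

      Here is another tidyverse approach. It is complicated because each Var column corresponds to a particular time, but the time is not encoded in its name the way it is in the id columns, so you need some way to match the two. Here I do that with a named list inside var_renamer. Once the columns are named consistently, you can use gather and separate to produce the right variables, then spread to get back to the desired format. Note that I mutate Time into an ordered factor so that arrange sorts it chronologically rather than alphabetically.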

      df <- structure(list(Day = 1:14, Morning_1_id = structure(1:14, .Label = c("20180501-033-000001", "20180502-033-000001", "20180503-033-000001", "20180507-033-000001", "20180508-033-000001", "20180509-033-000001", "20180510-033-000001", "20180511-033-000001", "20180514-033-000001", "20180516-033-000001", "20180517-033-000003", "20180518-033-000001", "20180521-033-000002", "20180523-033-000001"), class = "factor"), Var1 = c(3.156666667, 2.986666667, 3.073333333, 3.236666667, 3.03, 3.01, 2.916666667, 3.293333333, 3.136666667, 3.116666667, 2.843333333, 3.033333333, 3.173333333, 3.336666667), Morning_2_id = structure(1:14, .Label = c("20180501-033-000002", "20180502-033-000002", "20180503-033-000002", "20180507-033-000002", "20180508-033-000002", "20180509-033-000002", "20180510-033-000002", "20180511-033-000002", "20180514-033-000002", "20180516-033-000002", "20180517-033-000004", "20180518-033-000002", "20180521-033-000003", "20180523-033-000002"), class = "factor"), Var2 = c(2.866666667, 2.93, 3.07, 2.99, 3.15, 3.02, 3.103333333, 3.163333333, 3.186666667, 3.283333333, 3.12, 3.29, 2.993333333, 3.026666667), Afternoon_1_id = structure(1:14, .Label = c("20180501-033-000008", "20180502-033-000020", "20180503-033-000011", "20180507-033-000015", "20180508-033-000015", "20180509-033-000007", "20180510-033-000007", "20180511-033-000013", "20180514-033-000007", "20180516-033-000008", "20180517-033-000008", "20180518-033-000007", "20180521-033-000008", "20180523-033-000007"), class = "factor"), Var3 = c(2.946666667, 3.076666667, 3.106666667, 3.043333333, 3.156666667, 3, 3.336666667, 2.98, 2.766666667, 3.133333333, 3.06, 3.006666667, 2.983333333, 3.3), Afternoon_2_id = structure(1:14, .Label = c("20180501-033-000009", "20180502-033-000021", "20180503-033-000012", "20180507-033-000016", "20180508-033-000017", "20180509-033-000008", "20180510-033-000008", "20180511-033-000014", "20180514-033-000008", "20180516-033-000009", "20180517-033-000009", "20180518-033-000008", 
"20180521-033-000009", "20180523-033-000008"), class = "factor"), Var4 = c(3.133333333, 3.013333333, 2.9, 3.116666667, 3.343333333, 3.156666667, 3.066666667, 2.94, 3.1, 3.04, 3.033333333, 2.973333333, 3.02, 3.21)), class = "data.frame", row.names = c(NA, -14L))
      
      library(tidyverse)
      var_renamer <- function(name) {
        time_list <- list(
          "1" = "Morning_1", "2" = "Morning_2", "3" = "Afternoon_1", "4" = "Afternoon_2"
        )
        timenum = str_remove(name, "Var")
        timestr = map_chr(timenum, ~ time_list[[.x]])
        str_c(timestr, "-Var")
      }
      
      df %>%
        rename_at(vars(starts_with("Var")), var_renamer) %>%
        rename_all(~ str_replace(., "_id", "-Id")) %>%   # formula lambda; funs() is deprecated in newer dplyr
        gather(colname, val, -Day) %>%
        separate(colname, c("Time", "id_var"), sep = "-") %>%
        mutate(Time = factor(
          x = Time,
          levels = c("Morning_1", "Morning_2", "Afternoon_1", "Afternoon_2"),
          ordered = TRUE
        )) %>%
        spread(id_var, val) %>%
        arrange(Time, Day)
      #> Warning: attributes are not identical across measure variables;
      #> they will be dropped
      #>    Day        Time                  Id         Var
      #> 1    1   Morning_1 20180501-033-000001 3.156666667
      #> 2    2   Morning_1 20180502-033-000001 2.986666667
      #> 3    3   Morning_1 20180503-033-000001 3.073333333
      #> 4    4   Morning_1 20180507-033-000001 3.236666667
      #> 5    5   Morning_1 20180508-033-000001        3.03
      #> 6    6   Morning_1 20180509-033-000001        3.01
      #> 7    7   Morning_1 20180510-033-000001 2.916666667
      #> 8    8   Morning_1 20180511-033-000001 3.293333333
      #> 9    9   Morning_1 20180514-033-000001 3.136666667
      #> 10  10   Morning_1 20180516-033-000001 3.116666667
      #> 11  11   Morning_1 20180517-033-000003 2.843333333
      #> 12  12   Morning_1 20180518-033-000001 3.033333333
      #> 13  13   Morning_1 20180521-033-000002 3.173333333
      #> 14  14   Morning_1 20180523-033-000001 3.336666667
      #> 15   1   Morning_2 20180501-033-000002 2.866666667
      #> 16   2   Morning_2 20180502-033-000002        2.93
      #> 17   3   Morning_2 20180503-033-000002        3.07
      #> 18   4   Morning_2 20180507-033-000002        2.99
      #> 19   5   Morning_2 20180508-033-000002        3.15
      #> 20   6   Morning_2 20180509-033-000002        3.02
      #> 21   7   Morning_2 20180510-033-000002 3.103333333
      #> 22   8   Morning_2 20180511-033-000002 3.163333333
      #> 23   9   Morning_2 20180514-033-000002 3.186666667
      #> 24  10   Morning_2 20180516-033-000002 3.283333333
      #> 25  11   Morning_2 20180517-033-000004        3.12
      #> 26  12   Morning_2 20180518-033-000002        3.29
      #> 27  13   Morning_2 20180521-033-000003 2.993333333
      #> 28  14   Morning_2 20180523-033-000002 3.026666667
      #> 29   1 Afternoon_1 20180501-033-000008 2.946666667
      #> 30   2 Afternoon_1 20180502-033-000020 3.076666667
      #> 31   3 Afternoon_1 20180503-033-000011 3.106666667
      #> 32   4 Afternoon_1 20180507-033-000015 3.043333333
      #> 33   5 Afternoon_1 20180508-033-000015 3.156666667
      #> 34   6 Afternoon_1 20180509-033-000007           3
      #> 35   7 Afternoon_1 20180510-033-000007 3.336666667
      #> 36   8 Afternoon_1 20180511-033-000013        2.98
      #> 37   9 Afternoon_1 20180514-033-000007 2.766666667
      #> 38  10 Afternoon_1 20180516-033-000008 3.133333333
      #> 39  11 Afternoon_1 20180517-033-000008        3.06
      #> 40  12 Afternoon_1 20180518-033-000007 3.006666667
      #> 41  13 Afternoon_1 20180521-033-000008 2.983333333
      #> 42  14 Afternoon_1 20180523-033-000007         3.3
      #> 43   1 Afternoon_2 20180501-033-000009 3.133333333
      #> 44   2 Afternoon_2 20180502-033-000021 3.013333333
      #> 45   3 Afternoon_2 20180503-033-000012         2.9
      #> 46   4 Afternoon_2 20180507-033-000016 3.116666667
      #> 47   5 Afternoon_2 20180508-033-000017 3.343333333
      #> 48   6 Afternoon_2 20180509-033-000008 3.156666667
      #> 49   7 Afternoon_2 20180510-033-000008 3.066666667
      #> 50   8 Afternoon_2 20180511-033-000014        2.94
      #> 51   9 Afternoon_2 20180514-033-000008         3.1
      #> 52  10 Afternoon_2 20180516-033-000009        3.04
      #> 53  11 Afternoon_2 20180517-033-000009 3.033333333
      #> 54  12 Afternoon_2 20180518-033-000008 2.973333333
      #> 55  13 Afternoon_2 20180521-033-000009        3.02
      #> 56  14 Afternoon_2 20180523-033-000008        3.21
      

      Created on 2018-08-07 by the reprex package (v0.2.0).
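
One detail visible in the printed output above is that Var values appear as "3.03" or "3" rather than with a uniform number of decimals. That is because gather stacks the Id strings and the numeric values into a single column, so Var comes back from spread as character. A minimal sketch of converting it back, using an invented two-row toy data frame whose columns are already named in the "Time-Id" / "Time-Var" style:

```r
library(tidyr)

# Toy data, already renamed to the "<Time>-Id" / "<Time>-Var" pattern
# (values are invented).
toy <- data.frame(
  Day = 1:2,
  "Morning_1-Id"  = c("a1", "a2"),
  "Morning_1-Var" = c(3.1, 3.0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

long <- gather(toy, colname, val, -Day)
long <- separate(long, colname, c("Time", "id_var"), sep = "-")
wide <- spread(long, id_var, val)

# Mixing strings and numbers in one gathered column yields character values.
was_character <- is.character(wide$Var)

# Convert the measurement column back to numeric before doing arithmetic.
wide$Var <- as.numeric(wide$Var)
str(wide)
```

spread also has a convert argument that runs type.convert on the new columns, which may be an alternative to the explicit as.numeric call.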

      【Discussion】:
