重塑与 tidyr 用于具有多个因变量的重复测量答案

【问题标题】：Reshape vs. tidyr for repeated measures with multiple dependent variables重塑与 tidyr 用于具有多个因变量的重复测量
【发布时间】：2015-07-14 13:32:25
【问题描述】：

我有以下 10 个案例的样本数据，对两个因变量“Rapport”和“STRS”进行了三个重复测量：

structure(list(SubID = structure(1:10, .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", 
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", 
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", 
"82", "83", "84"), class = "factor"), Gender = structure(c(3L, 
2L, 3L, 2L, 3L, 2L, 3L, 2L, 3L, 2L), .Label = c("#NULL!", "1", 
"2"), class = "factor"), Age = structure(c(5L, 3L, 2L, 2L, 3L, 
5L, 5L, 2L, 2L, 3L), .Label = c("#NULL!", "10", "11", "8", "9"
), class = "factor"), Rapport.1 = structure(c(22L, 25L, 19L, 
10L, 18L, 19L, 20L, 20L, 21L, 16L), .Label = c("#NULL!", "1.1", 
"1.85", "2.45", "2.5", "2.55", "2.6", "2.75", "2.8", "2.85", 
"2.9", "2.95", "3.2", "3.25", "3.3", "3.35", "3.4", "3.45", "3.5", 
"3.55", "3.6", "3.65", "3.7", "3.75", "3.8", "3.85", "3.9", "3.95"
), class = "factor"), Rapport.2 = structure(c(29L, 31L, 27L, 
17L, 9L, 26L, 24L, 21L, 30L, 32L), .Label = c("#NULL!", "1.25", 
"1.4", "1.6", "1.95", "2.05", "2.3", "2.35", "2.45", "2.5", "2.65", 
"2.7", "2.75", "2.8", "2.85", "3", "3.05", "3.1", "3.15", "3.2", 
"3.35", "3.4", "3.45", "3.5", "3.55", "3.6", "3.65", "3.7", "3.75", 
"3.8", "3.85", "3.9", "3.95", "4"), class = "factor"), Rapport.3 =     structure(c(32L, 
35L, 22L, 22L, 5L, 25L, 30L, 21L, 25L, 34L), .Label = c("#NULL!", 
"1.35", "1.45", "1.6", "1.75", "1.85", "1.9", "1.95", "2.05", 
"2.1", "2.25", "2.3", "2.35", "2.4", "2.45", "2.6", "2.75", "2.8", 
"2.9", "2.95", "3", "3.05", "3.1", "3.2", "3.25", "3.3", "3.35", 
"3.4", "3.45", "3.5", "3.55", "3.6", "3.7", "3.75", "3.8", "3.85"
), class = "factor"), STRS.1 = structure(c(33L, 10L, 8L, 18L, 
29L, 22L, 7L, 28L, 37L, 26L), .Label = c("#NULL!", "100", "102", 
"103", "104", "106", "107", "108", "109", "110", "111", "112", 
"113", "114", "115", "116", "117", "118", "119", "120", "122", 
"123", "124", "125", "126", "127", "128", "129", "132", "133", 
"69", "71", "73", "85", "88", "89", "92", "97", "99"), class = "factor"), 
STRS.2 = structure(c(37L, 19L, 9L, 22L, 21L, 22L, 16L, 16L, 
42L, 31L), .Label = c("#NULL!", "100", "101", "103", "104", 
"105", "106", "107", "108", "110", "111", "113", "114", "115", 
"116", "117", "118", "119", "120", "121", "122", "123", "124", 
"125", "126", "127", "128", "129", "131", "132", "136", "137", 
"138", "139", "158", "63", "76", "80", "91", "94", "95", 
"98", "99"), class = "factor"), STRS.3 = structure(c(31L, 
11L, 19L, 23L, 22L, 13L, 17L, 17L, 34L, 29L), .Label = c("#NULL!", 
"102", "104", "105", "106", "107", "108", "109", "110", "111", 
"112", "114", "117", "118", "119", "120", "122", "123", "124", 
"125", "126", "127", "128", "129", "130", "131", "132", "133", 
"134", "135", "66", "70", "75", "81", "85", "87", "88", "94", 
"98"), class = "factor")), .Names = c("SubID", "Gender", 
"Age", "Rapport.1", "Rapport.2", "Rapport.3", "STRS.1", "STRS.2", 
"STRS.3"), row.names = c(NA, 10L), class = "data.frame")

我尝试在 reshape 中使用“melt”函数，在 tidyr 中使用“gather”函数，但两者都生成一列，其中变量名称为“Rapport”和“STRS”堆叠，另一列包含它们的值。我无法弄清楚如何为“Rapport”值生成单列，为“STRS”值生成另一列，以便我可以使用随机效应模型（注意：我省略了其他人口统计变量和协变量）。对于这两个功能的任何帮助将不胜感激。

teachermelt <- melt(TeacherW,
id.vars=c("SubID", "Gender","Age"), 
measure.vars=c("Rapport.1", "Rapport.2", "Rapport.3", "STRS.1","STRS.2","STRS.3" ),
variable.name="Rapport","STRS",
value.name="Rapport","STRS)

teachertidy <- gather(TeacherW, Rapport, STRS, Rapport.1:STRS.3)

我终于能够使用这个“重塑”功能获得长格式，这看起来很简单，但我不确定这样做时是否需要注意什么：

Teacherl<-reshape(TeacherW, varying = 4:9, sep = ".", idvar="SubID", direction = 'long')
View(Teacherl)

【问题讨论】：

标签： r tidyr melt

【解决方案1】：

我很难确定这是否是您想要的，但这是一个

以df开头：

  SubID Gender Age Rapport.1 Rapport.2 Rapport.3 STRS.1 STRS.2 STRS.3
1     1      2   9      3.65      3.75       3.6     73     76     66
2     2      1  11       3.8      3.85       3.8    110    120    112
3     3      2  10       3.5      3.65      3.05    108    108    124
4     4      1  10      2.85      3.05      3.05    118    123    128
5     5      2  11      3.45      2.45      1.75    132    122    127

`Tidyr`sol'n：

library(dplyr)
library(tidyr)

df %>%
unite(one,contains("1")) %>%  # unite all columns that contain '1' with default sep = "_" into single new column named "one"
unite(two, contains("2")) %>% 
unite(three, contains("3")) %>% 
gather(replicate,values,one:three) %>%   # gather all columns between that named "one" and that named "three" (inclusive) into two new columns: a key column (named "replicate") and a value column (named "values")
separate(values,c("Rapport","STRS"),sep = "_") # separate the column named "values" into two new columns named "Rapport" and "STRS" according to the separator "_".

给出：

   SubID Gender Age replicate Rapport STRS
1      1      2   9       one    3.65   73
2      2      1  11       one     3.8  110
3      3      2  10       one     3.5  108
4      4      1  10       one    2.85  118
5      5      2  11       one    3.45  132
6      1      2   9       two    3.75   76
7      2      1  11       two    3.85  120
8      3      2  10       two    3.65  108
9      4      1  10       two    3.05  123
10     5      2  11       two    2.45  122
11     1      2   9     three     3.6   66
12     2      1  11     three     3.8  112
13     3      2  10     three    3.05  124
14     4      1  10     three    3.05  128
15     5      2  11     three    1.75  127

解释：

您要求的（我认为）是收集 Rapport 和 STRS 列，但根据他们的提名链接（.1,.2,.3） .整理一下：

unite() 将链接的变量合并为一列（形成变量one、two、three）。之后你可以
gather() 这些列根据键值对（此处为 replicate 和 values）。最后，
separate() 将 values 变量返回到其组成变量 Rapport 和 STRS。

注意

我认为这里适当的“整洁”数据结构是：（为了安全起见）

df %>%
  gather(key, value, -SubID,-Gender,-Age) %>% 
  separate(key, into = c("var","idx"), sep="\\.")

【讨论】：

谢谢，但这是我所能得到的（也许我自己的例子没有那么远，因为我已经对它们进行了太多修改，这些可能会有额外的错误）。我希望能够为“Rapport”和“STRS”变量获得两列值，这样单个主题将位于三行，其中“Rapport”的度量和“STRS”的相应度量同一行。感谢您的回复。
@user3594490 你能在你的问题中发布预期的结果吗
我正在重新发布这个问题，其中包含所需输出的样本和较小的案例数，以便更容易理解。与“2”相同的标题。不知道这样行不行。
转发了“Reshape vs. tidyr for repeatmeasures with multiple因变量（2）”，但由于大家的反应，没有删除这个。
@user3594490 我有机会重新阅读了这些问题并重新发布了我认为应该有帮助的答案。

Tidyrsol'n：

解释：

注意

`Tidyr`sol'n：