Transforming long-form panel data to wide format in R: answers

【Question title】: Transforming long form panel data to wide form in R
【Posted】: 2021-11-16 01:13:46
【Question】:

I have the following long-format data, which I would like to convert to wide format using R:

structure(list(survey_unique_id = c(2816790L, 2816790L, 2816790L, 
2585861L, 2585861L, 214733L, 214733L, 214733L, 224481L, 224481L, 
224481L), user_id = c(623333L, 623333L, 623333L, 623333L, 623333L, 
700200L, 700200L, 700200L, 700200L, 700200L, 700200L), 
survey_completion_date = c("3/3/2021 16:39", "3/3/2021 16:39", 
"3/3/2021 16:39", "1/29/2021 22:14", "1/29/2021 22:14", 
"11/27/2017 19:02", "11/27/2017 19:02", "11/27/2017 19:02", "12/19/2017 21:02", 
"12/19/2017 21:02", "12/19/2017 21:02"), survey_id = c(1L, 1L, 
1L, 4L, 4L, 1L, 1L, 1L, 9L, 9L, 9L), question_id = c(1L, 2L, 
3L, 6L, 7L, 1L, 2L, 3L, 19L, 20L, 21L), question_score = c(7L, 
7L, 9L, 13L, 5L, 18L, 12L, 15L, 11L, 12L, 12L)), class = 
"data.frame", row.names = c(NA, -11L))

Original long format mock data

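Currently, each row is one participant's answer to one question, and a survey may contain several questions. Ideally, I would like each row to be one participant, like this: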

structure(list(user_id = c(623333L, 700200L), survey_1_question_1_score = c(7L, 
18L), survey_1_question_2_score = c(7L, 12L), survey_1_question_3_score = c(9L, 
15L), survey_4_question_6_score = c(13L, NA), survey_4_question_7_score = c(5L, 
NA), survey_9_question_19_score = c(NA, 11L), survey_9_question_20_score = c(NA, 
12L), survey_9_question_21_score = c(NA, 12L)), class = "data.frame", row.names = c(NA, 
-2L))

Ideal wide format mock data

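One issue is that the original data has only the survey completion date and does not indicate how many times each participant has taken a given survey, so I think I would have to create a column for that in the data before transposing. I'm not sure how to create this new column in R (if that is the right next step), or how to convert the data to wide format. I can't use Excel because the file is too large. What is the simplest approach here?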

Edit: Thanks, everyone, for the tip about using dput(), and apologies for not doing a better job the first time around. This is my first question on Stack Overflow!

【Comments】:

Welcome. Please share your data using dput() rather than as an image. Thanks.
Please create a reproducible example, as explained here. Do not paste images of data.

Tags: r time-series transpose panel-data

【Solution 1】:

library(tidyverse)
data <-
  tribble(
  ~participant, ~ question, ~ score,
  1, 1, 1,
  1, 2, 0,
  1, 3, 0,
  1, 1, 1,
  1, 2, 0,
  1, 3, 1,
  2, 1, 0,
  2, 2, 0,
  2, 3, 0,
  2, 1, 0,
  2, 2, 0,
  2, 3, 1,
)
data
#> # A tibble: 12 x 3
#>    participant question score
#>          <dbl>    <dbl> <dbl>
#>  1           1        1     1
#>  2           1        2     0
#>  3           1        3     0
#>  4           1        1     1
#>  5           1        2     0
#>  6           1        3     1
#>  7           2        1     0
#>  8           2        2     0
#>  9           2        3     0
#> 10           2        1     0
#> 11           2        2     0
#> 12           2        3     1

data %>%
  # add survey column assuming there is one combo of participant and question for each survey
  group_by(participant, question) %>%
  mutate(survey = row_number()) %>%
  
  # create grouping column
  unite(group, c(survey, question)) %>%
  pivot_wider(names_from = group, values_from = score)
#> # A tibble: 2 x 7
#> # Groups:   participant [2]
#>   participant `1_1` `1_2` `1_3` `2_1` `2_2` `2_3`
#>         <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1           1     1     0     0     1     0     1
#> 2           2     0     0     0     0     0     1
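
The same idea can be tried on the column names from the question. The following is only a sketch, not part of the original answer: it uses a trimmed copy of the question's data (dates omitted), and it assumes each participant takes each survey at most once, so no attempt counter is needed. The `names_glue` pattern is my own choice to reproduce the column names the question asks for.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of the question's long-format data (date column omitted)
long <- data.frame(
  user_id        = c(rep(623333L, 5), rep(700200L, 6)),
  survey_id      = c(1L, 1L, 1L, 4L, 4L, 1L, 1L, 1L, 9L, 9L, 9L),
  question_id    = c(1L, 2L, 3L, 6L, 7L, 1L, 2L, 3L, 19L, 20L, 21L),
  question_score = c(7L, 7L, 9L, 13L, 5L, 18L, 12L, 15L, 11L, 12L, 12L)
)

wide <- long %>%
  pivot_wider(
    id_cols     = user_id,                     # one row per participant
    names_from  = c(survey_id, question_id),   # one column per survey/question pair
    values_from = question_score,
    # Assumed naming pattern, chosen to match the question's desired output
    names_glue  = "survey_{survey_id}_question_{question_id}_score"
  )

wide
```

Survey/question pairs that a participant never answered are filled with `NA`. If a participant can take the same survey more than once, the pairs are no longer unique per participant and an extra counter column, such as the `survey` column created with `row_number()` above, has to be part of `names_from`.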

Created on 2021-09-22 by the reprex package (v2.0.1)
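
The question also asks how to record which sitting of a survey each row belongs to. `row_number()` relies on the rows already being in chronological order; a possible alternative is to rank the completion dates within each participant and survey. The sketch below is an assumption-laden illustration, not the answerer's method: the data are made up (user 1 takes survey 1 twice, with rows deliberately out of order), the `completed` and `attempt` columns are hypothetical names, and the date format string is inferred from the question's sample values.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: user 1 sits survey 1 twice; the later sitting is listed first
long <- data.frame(
  user_id                = c(1L, 1L, 1L, 1L, 2L, 2L),
  survey_completion_date = c("3/3/2021 16:39", "3/3/2021 16:39",
                             "11/27/2017 19:02", "11/27/2017 19:02",
                             "1/29/2021 22:14", "1/29/2021 22:14"),
  survey_id              = 1L,
  question_id            = c(1L, 2L, 1L, 2L, 1L, 2L),
  question_score         = c(8L, 6L, 7L, 9L, 5L, 4L)
)

wide <- long %>%
  # Parse the month/day/year timestamps so they sort chronologically
  mutate(completed = as.POSIXct(survey_completion_date,
                                format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  group_by(user_id, survey_id) %>%
  # 1 = earliest sitting of this survey by this user, 2 = next, and so on
  mutate(attempt = dense_rank(completed)) %>%
  ungroup() %>%
  pivot_wider(
    id_cols     = user_id,
    names_from  = c(survey_id, attempt, question_id),
    values_from = question_score,
    names_glue  = "survey_{survey_id}_attempt_{attempt}_question_{question_id}_score"
  )

wide
```

Two sittings with identical timestamps would receive the same rank here. If `survey_unique_id` identifies a single sitting, as the question's data suggest, ranking on it may be a safer tie-breaker.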

【Comments】: