【Question title】: How to use dplyr to gather multiple instances of an event and create a tidy tibble [duplicate]
【Posted】: 2017-08-09 08:27:23
【Question】:

I have a dataset that looks something like this:

library(tidyverse)

df <- tibble(
  subjid = 1:5,
  event_1 = c("Watery eyes",         # Event number 1 
          "Sore throat",
          "Vomiting",
          "Gastroenteritis viral",
          "Dry Mouth"),
  start_date_1 = as.Date("2017-01-02") + 0:4,
  stop_date_1 = as.Date("2017-01-03") + 0:4,
  severity_1 = 1,
  related_to_drug_1 = 0,
  event_2 = c("Nausea",             # Event number 2
          "Dizziness",
          "Cough",
          "Disorientation",
          "Diarrhea"),
  start_date_2 = as.Date("2017-02-02") + 0:4,
  stop_date_2 = as.Date("2017-02-03") + 0:4,
  severity_2 = 2,
  related_to_drug_2 = 1,
  event_3 = c("Eczema",             # Event number 3
          "Sinusitis",
          "Abdominal discomfort",
          "Muscle spasms",
          "Nasopharyngitis"),
  start_date_3 = as.Date("2017-03-02") + 0:4,
  stop_date_3 = as.Date("2017-03-03") + 0:4,
  severity_3 = 2,
  related_to_drug_3 = 1
)
df

# A tibble: 5 × 16
  subjid               event_1 start_date_1 stop_date_1 severity_1 related_to_drug_1        event_2 start_date_2 stop_date_2 severity_2 related_to_drug_2              event_3
   <int>                 <chr>       <date>      <date>      <dbl>             <dbl>          <chr>       <date>      <date>      <dbl>             <dbl>                <chr>
1      1           Watery eyes   2017-01-02  2017-01-03          1                 0         Nausea   2017-02-02  2017-02-03          2                 1               Eczema
2      2           Sore throat   2017-01-03  2017-01-04          1                 0      Dizziness   2017-02-03  2017-02-04          2                 1            Sinusitis
3      3              Vomiting   2017-01-04  2017-01-05          1                 0          Cough   2017-02-04  2017-02-05          2                 1 Abdominal discomfort
4      4 Gastroenteritis viral   2017-01-05  2017-01-06          1                 0 Disorientation   2017-02-05  2017-02-06          2                 1        Muscle spasms
5      5             Dry Mouth   2017-01-06  2017-01-07          1                 0       Diarrhea   2017-02-06  2017-02-07          2                 1      Nasopharyngitis
# ... with 4 more variables: start_date_3 <date>, stop_date_3 <date>, severity_3 <dbl>, related_to_drug_3 <dbl>

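The real data, however, has many more rows and more than 100 "event" column series. The data frame has one row per subject; each row holds that subject's adverse events and their attributes, in columns whose names end in an underscore and a number indicating which event they belong to. I would like to use tidyr to gather these events into a tibble like this: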

# A tibble: 15 × 7
   subjid event_number                 event start_date  stop_date severity related_to_drug
    <int>        <int>                 <chr>     <date>     <date>    <int>                <int>
1       1            1           Watery eyes 2017-01-02 2017-01-03        1                    0
2       2            1           Sore throat 2017-01-03 2017-01-04        1                    0
3       3            1              Vomiting 2017-01-04 2017-01-05        1                    0
4       4            1 Gastroenteritis viral 2017-01-05 2017-01-06        1                    0
5       5            1             Dry Mouth 2017-01-06 2017-01-07        1                    0
6       1            2                Nausea 2017-02-02 2017-02-03        2                    1
7       2            2             Dizziness 2017-02-03 2017-02-04        2                    1
8       3            2                 Cough 2017-02-04 2017-02-05        2                    1
9       4            2        Disorientation 2017-02-05 2017-02-06        2                    1
10      5            2              Diarrhea 2017-02-06 2017-02-07        2                    1
11      1            3                Eczema 2017-03-02 2017-03-03        2                    1
12      2            3             Sinusitis 2017-03-03 2017-03-04        2                    1
13      3            3  Abdominal discomfort 2017-03-04 2017-03-05        2                    1
14      4            3         Muscle spasms 2017-03-05 2017-03-06        2                    1
15      5            3       Nasopharyngitis 2017-03-06 2017-03-07        2                    1

That is, one row per adverse event, with columns identifying the attributes of that particular event.

【Comments】:

    Tags: r dplyr tidyr tidyverse


    【Solution 1】:

    You can do this with the following code:

    df %>%
      gather(Var,Val,-1) %>%
      mutate(Var = gsub('_(\\d+)','!!\\1',Var)) %>% 
      separate(Var,c('Var','Event'),sep = '!!') %>%
      spread(Var,Val)
    

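    Unfortunately, this destroys the column classes: gather stacks every value into a single character column, so the dates and numbers come back as character. You can restore them afterwards with a call to mutate.

    (Also note that the mutate line after gather is only there because your column names themselves contain "_", and I wanted to split off just the event number.)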
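The class repair mentioned above might look like the following minimal sketch. It assumes a cut-down stand-in for the question's data (`df_small`, a hypothetical name, with two subjects and two events) and restores each type in a single mutate call. The dates come back as character strings holding day counts since 1970-01-01, so they are converted through as.numeric first.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down version of the question's data
df_small <- tibble(
  subjid = 1:2,
  event_1 = c("Watery eyes", "Sore throat"),
  start_date_1 = as.Date("2017-01-02") + 0:1,
  severity_1 = 1,
  event_2 = c("Nausea", "Dizziness"),
  start_date_2 = as.Date("2017-02-02") + 0:1,
  severity_2 = 2
)

# Same pipeline as in the answer; every value column ends up as character
# (gather warns that attributes are dropped, which is expected here)
long <- df_small %>%
  gather(Var, Val, -1) %>%
  mutate(Var = gsub("_(\\d+)", "!!\\1", Var)) %>%
  separate(Var, c("Var", "Event"), sep = "!!") %>%
  spread(Var, Val)

# Restore the original classes in one mutate call
restored <- long %>%
  mutate(
    Event = as.integer(Event),
    start_date = as.Date(as.numeric(start_date), origin = "1970-01-01"),
    severity = as.numeric(severity)
  )
```

With 100+ event series the same mutate still applies, because by this point each attribute is a single column.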

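    【Comments】:

    • Thanks; this is exactly what I needed! I added: %>% mutate(start_date = as_date(as.numeric(start_date))) %>% mutate(stop_date = as_date(as.numeric(stop_date))) and it works just the way I need it to! Thanks again!
    • @jsly dplyr tip: you can make several changes in a single call to mutate. For example: mutate(A = as_date(A), B = as_date(B)). With proper indentation this can be less cluttered. (Or worse.)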
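    【Solution 2】:

    A more involved approach that, importantly, preserves the column classes.
    Start from the column names, split them by event number, build one data frame per event, and finally stack the data frames vertically: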

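    names(df) %>% 
      setdiff("subjid") %>%                               # every event column
      split(sub(".*_(\\d+)$", "\\1", x = .)) %>%          # group the names by event number
      map(~ select(df, all_of(c("subjid", .x)))) %>%      # one data frame per event (select_() is deprecated)
      map(~ setNames(.x, nm = sub("(.*)_\\d+$", "\\1", x = names(.x)))) %>%  # drop the numeric suffix
      map2(names(.), ~ mutate(.x, event_number = .y)) %>% # tag each data frame with its event number
      bind_rows() %>% 
      select(subjid, event_number, everything())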
    # # A tibble: 15 × 7
    #    subjid event_number                 event start_date  stop_date severity related_to_drug
    #     <int>        <chr>                 <chr>     <date>     <date>    <dbl>           <dbl>
    # 1       1            1           Watery eyes 2017-01-02 2017-01-03        1               0
    # 2       2            1           Sore throat 2017-01-03 2017-01-04        1               0
    # 3       3            1              Vomiting 2017-01-04 2017-01-05        1               0
    # 4       4            1 Gastroenteritis viral 2017-01-05 2017-01-06        1               0
    # 5       5            1             Dry Mouth 2017-01-06 2017-01-07        1               0
    # 6       1            2                Nausea 2017-02-02 2017-02-03        2               1
    # 7       2            2             Dizziness 2017-02-03 2017-02-04        2               1
    # 8       3            2                 Cough 2017-02-04 2017-02-05        2               1
    # 9       4            2        Disorientation 2017-02-05 2017-02-06        2               1
    # 10      5            2              Diarrhea 2017-02-06 2017-02-07        2               1
    # 11      1            3                Eczema 2017-03-02 2017-03-03        2               1
    # 12      2            3             Sinusitis 2017-03-03 2017-03-04        2               1
    # 13      3            3  Abdominal discomfort 2017-03-04 2017-03-05        2               1
    # 14      4            3         Muscle spasms 2017-03-05 2017-03-06        2               1
    # 15      5            3       Nasopharyngitis 2017-03-06 2017-03-07        2               1
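For comparison, newer tidyr releases offer another way to keep the column classes. This is not the method used in either answer; it is a minimal sketch that assumes tidyr 1.1.0 or later (for `pivot_longer()` with `names_transform`) and uses a hypothetical cut-down data frame, `df_small`. The special `".value"` name tells `pivot_longer()` to treat the first captured part of each column name as an output column, so each attribute stays in a column of its own type.

```r
library(dplyr)
library(tidyr)

# Hypothetical cut-down version of the question's data
df_small <- tibble(
  subjid = 1:2,
  event_1 = c("Watery eyes", "Sore throat"),
  start_date_1 = as.Date("2017-01-02") + 0:1,
  severity_1 = 1,
  event_2 = c("Nausea", "Dizziness"),
  start_date_2 = as.Date("2017-02-02") + 0:1,
  severity_2 = 2
)

tidy <- df_small %>%
  pivot_longer(
    cols = -subjid,
    # first capture group becomes the output column name,
    # second capture group becomes the event number
    names_to = c(".value", "event_number"),
    names_pattern = "(.*)_(\\d+)$",
    names_transform = list(event_number = as.integer)
  ) %>%
  arrange(event_number, subjid)
```

Because columns of different types are never stacked together, `start_date` stays a Date and `severity` stays numeric, and `event_number` comes out as an integer rather than the character column produced above.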
    

    【Comments】:
