【Question title】: Reshaping wide to tall data over multiple variables [duplicate]
【Posted】: 2020-01-02 05:14:59
【Question】:

My data currently looks like this:

wide.df <- read.table(header = T, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18 , 4, 29, 30, 40, 0
101, 19,  7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")

I need to convert it to long format, like this:

ID  group  left.or.right  mid.or.lat    brain     score
100   0          0             0           29        40   # 0 = left, 0=lat 
100   0          1             0           30        40   # 1 = right, 0=lat
100   0          0             1           18        40   # 0 = left, 1 = mid
100   0          1             1            4        40   # 1 = right, 1 = mid
101   0          0             0           33        29   # 0 = left, 0 = lat
.
.
.
.
.
233   1           1            1            33        55   # 1= right, 1= mid

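Here left.mid.brain, right.mid.brain, left.lat.brain and right.lat.brain are turned into factors while their values are kept in the brain column, so each participant ends up with four rows.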

【Comments on the question】:

    Tags: r reshape


    【Solution 1】:

    The tidyverse (specifically the dplyr and tidyr packages) is well suited to this kind of operation:

    library(tidyverse)
    
    long.df <- wide.df %>% 
      gather(variable, brain, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain) %>% 
      mutate(
        left.or.right = ifelse(grepl('left', variable), 0, 1),
        mid.or.lat = ifelse(grepl('lat', variable), 0, 1)
      ) %>% 
      select(ID, group, left.or.right, mid.or.lat, brain, score) %>% 
      arrange(ID)
    
        ID group left.or.right mid.or.lat brain score
    1  100     0             0          1    18    40
    2  100     0             1          1     4    40
    3  100     0             0          0    29    40
    4  100     0             1          0    30    40
    5  101     0             0          1    19    29
    6  101     0             1          1     7    29
    7  101     0             0          0    33    29
    8  101     0             1          0    40    29
    9  103     0             0          1    19    33
    10 103     0             1          1    19    33
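
The question asks for the indicator columns to be factors, while the output above keeps them as numeric 0/1 codes. A minimal sketch of one way to add that step, assuming the labels "left"/"right" and "lat"/"mid" taken from the 0/1 key in the question (the name `long.factor` is only illustrative):

```r
library(dplyr)
library(tidyr)

wide.df <- read.table(header = TRUE, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18, 4, 29, 30, 40, 0
101, 19, 7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")

# Same reshaping as in the answer above
long.df <- wide.df %>%
  gather(variable, brain, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain) %>%
  mutate(
    left.or.right = ifelse(grepl("left", variable), 0, 1),
    mid.or.lat = ifelse(grepl("lat", variable), 0, 1)
  ) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)

# Turn the 0/1 codes into labelled factors (labels are an assumption based on the question's key)
long.factor <- long.df %>%
  mutate(
    left.or.right = factor(left.or.right, levels = c(0, 1), labels = c("left", "right")),
    mid.or.lat = factor(mid.or.lat, levels = c(0, 1), labels = c("lat", "mid"))
  )

# Each participant should contribute exactly four rows
table(long.factor$ID)
```

Calling `as.integer()` on either factor returns 1/2 rather than 0/1, so keep the numeric columns if the original coding matters downstream.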
    

    【Comments】:

    • Since grepl returns logical values, you can coerce them to numeric instead of assigning the codes by hand in ifelse: left.or.right = as.numeric(grepl("left", variable))
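
One thing to watch with that suggestion: coercing `grepl("left", ...)` directly gives 1 for left, which is the reverse of the coding used in the answer (0 = left). A small sketch of the comparison, using only the four column names from the question (the names `via_ifelse` and `via_coercion` are illustrative):

```r
variable <- c("left.mid.brain", "right.mid.brain", "left.lat.brain", "right.lat.brain")

# Coding used in the answer: 0 where the name contains "left", 1 otherwise
via_ifelse <- ifelse(grepl("left", variable), 0, 1)

# Direct coercion of the same test flips the codes (left becomes 1)
flipped <- as.numeric(grepl("left", variable))

# Matching on "right" instead keeps 0 = left, 1 = right
via_coercion <- as.numeric(grepl("right", variable))

identical(via_ifelse, via_coercion)
```

The same applies to the second column: `as.numeric(grepl("mid", variable))` reproduces the answer's 0 = lat, 1 = mid coding.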
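    【Solution 2】:

    Here is another dplyr/tidyr-based approach that should scale well. After reshaping to long form, you have a column holding values like "right.mid.brain", which you want to split into "right" and "mid". tidyr::separate does this easily by splitting on "\\.", which avoids most of the hard-coding. It leaves you with a dummy column, which is dropped later.

    At that point, you have this: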

    library(dplyr)
    library(tidyr)
    
    # 0 = left, 0 = lat 
    wide.df %>%
      gather(key, value = brain, -ID, -score, -group) %>%
      separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
      head()
    #>    ID score group left.or.right mid.or.lat dummy brain
    #> 1 100    40     0          left        mid brain    18
    #> 2 101    29     0          left        mid brain    19
    #> 3 103    33     0          left        mid brain    19
    #> 4 200    11     1          left        mid brain    29
    #> 5 233    55     1          left        mid brain   100
    #> 6 100    40     0         right        mid brain     4
    

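    If you need more complex recoding, you can use some of the forcats functions to recode factor levels. In this case it is simple enough to convert each column with a condition such as left.or.right == "right", which gives 1 if true and 0 if false (that is, if it is left). Finally, select the columns in the order you want.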

    long <- wide.df %>%
      gather(key, value = brain, -ID, -score, -group) %>%
      separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
      mutate(left.or.right = as.numeric(left.or.right == "right"),
             mid.or.lat = as.numeric(mid.or.lat == "mid")) %>%
      select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
      arrange(ID)
    
    head(long)
    #>    ID group left.or.right mid.or.lat brain score
    #> 1 100     0             0          1    18    40
    #> 2 100     0             1          1     4    40
    #> 3 100     0             0          0    29    40
    #> 4 100     0             1          0    30    40
    #> 5 101     0             0          1    19    29
    #> 6 101     0             1          1     7    29
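
In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(), which can do the reshape and the name split in one call. The following is a sketch of an equivalent pipeline rather than part of the original answer; it assumes tidyr >= 1.0.0, and `long.pivot` is an illustrative name:

```r
library(dplyr)
library(tidyr)

wide.df <- read.table(header = TRUE, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18, 4, 29, 30, 40, 0
101, 19, 7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")

long.pivot <- wide.df %>%
  pivot_longer(
    cols = ends_with(".brain"),
    # Split each column name on "."; the NA entry discards the trailing "brain" piece
    names_to = c("left.or.right", "mid.or.lat", NA),
    names_sep = "\\.",
    values_to = "brain"
  ) %>%
  # Same 0/1 coding as above: 1 = right, 1 = mid
  mutate(
    left.or.right = as.numeric(left.or.right == "right"),
    mid.or.lat = as.numeric(mid.or.lat == "mid")
  ) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)

head(long.pivot)
```

Using NA in `names_to` means no dummy column is created, so nothing has to be dropped afterwards.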
    
