【Question title】: Convert from long to wide format counting frequency of eliminated factor level (Prepping dataframe for input into iNEXT Online)
【Posted】: 2017-07-06 23:47:30
【Question description】:

I have a data frame that looks like this:

df<- data.frame(region= c("1","1","1","1","1","2","2","2","2","2","2"),loc=c("104","104","104","105","106","107","108", "109", "110", "110", "111"), interact= c("A_B", "B_C", "A_B", "B_C", "B_C", "A_B", "G_H", "I_J", "J_K", "L_M", "M_O"))

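I would like to change it from long to wide format so that each region number becomes a variable (column header) and the rows hold the counts of each interact level occurring in that region. The caveat is that I also want the first row to be the count of unique loc levels in that region. I will first illustrate an intermediate df: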

df2<- data.frame(interact= c("", "A_B", "B_C", "G_H", "I_J", "J_K", 
 "L_M", "M_O"), region1= c("3", "2", "3", "0","0","0","0","0"), 
 region2= c("5", "1", "0", "1","1","1","1","1"))

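You will notice that loc has 3 unique levels in region 1 and 5 unique levels in region 2, so the first row of numbers is the count of unique loc values in each region. Every following row is the frequency of one interact type across all loc values in that region. However, I do not want the interact column in the final data frame, so the final output should look like this: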

output<- data.frame(region1= c("3", "2", "3", "0","0","0","0","0"), 
region2= c("5", "1", "0", "1","1","1","1","1"))

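I tried the following, but I could not add a row holding the count of unique loc values in each region, and I know my current steps are not the most efficient approach: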

library(dplyr)  # provides %>%, group_by(), summarise(), n()
library(tidyr)  # provides spread()
df<- df %>% 
group_by(region, interact) %>% 
summarise(freq = n()) 
data_wide <- spread(df, region, freq)
data_wide<- data_wide[,-1]

【Comments】:

    Tags: r


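    【Solution 1】:

    We can do this in two steps with data.table: first cast the per-region count of unique loc values, then bind it on top of the per-region interact frequencies.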

    library(data.table)
    d1 <- dcast(setDT(df)[, .(interact = "", uniqueN(loc)), region], 
             interact ~ paste0('region', region))
    rbind(d1, dcast(df, interact ~ paste0('region', region), length))
    #   interact region1 region2
    #1:                3       5
    #2:      A_B       2       1
    #3:      B_C       3       0
    #4:      G_H       0       1
    #5:      I_J       0       1
    #6:      J_K       0       1
    #7:      L_M       0       1
    #8:      M_O       0       1
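
The question asks for a final data frame without the interact column, which the result above still carries. A minimal sketch of that last step follows, assuming the data.table package is installed. The names dt, loc_counts, freq, res and output are illustrative only, and the region label is built as a column up front rather than inside the formula.

```r
library(data.table)

# Sample data from the question, kept as character columns
df <- data.frame(
  region   = c("1","1","1","1","1","2","2","2","2","2","2"),
  loc      = c("104","104","104","105","106","107","108","109","110","110","111"),
  interact = c("A_B","B_C","A_B","B_C","B_C","A_B","G_H","I_J","J_K","L_M","M_O"),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)
dt[, region := paste0("region", region)]  # labels become the wide column names

# One row: number of unique loc values per region
loc_counts <- dcast(dt[, .(interact = "", n = uniqueN(loc)), by = region],
                    interact ~ region, value.var = "n")

# One row per interact level: frequency per region (absent pairs count as 0)
freq <- dcast(dt, interact ~ region, fun.aggregate = length, value.var = "loc")

res <- rbind(loc_counts, freq)

# Keep only the region columns, as requested in the question
output <- res[, .(region1, region2)]
print(output)
```

Passing value.var explicitly avoids the message dcast prints when it has to guess the value column.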
    

    Or using tidyverse

    library(tidyverse)
    bind_rows(df %>%
                group_by(region = paste0('region', region)) %>% 
                summarise(interact = "", V1 = n_distinct(loc)) %>% 
                spread(region, V1),
              df %>% 
                group_by(region = paste0('region', region),
                        interact = as.character(interact)) %>%
                summarise(V1 = n()) %>% 
                spread(region, V1, fill = 0))
    # A tibble: 8 x 3
    #  interact region1 region2
    #     <chr>   <dbl>   <dbl>
    #1                3       5
    #2      A_B       2       1
    #3      B_C       3       0
    #4      G_H       0       1
    #5      I_J       0       1
    #6      J_K       0       1
    #7      L_M       0       1
    #8      M_O       0       1
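
In current tidyr releases spread() is superseded by pivot_wider(). The sketch below redoes the same two steps with pivot_wider() and then drops the interact column to match the output requested in the question. It assumes tidyr 1.0.0 or later and dplyr are installed. The names loc_row, freq_rows, res and output are illustrative, and this is one possible rewrite rather than the answer's original code.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, kept as character columns
df <- data.frame(
  region   = c("1","1","1","1","1","2","2","2","2","2","2"),
  loc      = c("104","104","104","105","106","107","108","109","110","110","111"),
  interact = c("A_B","B_C","A_B","B_C","B_C","A_B","G_H","I_J","J_K","L_M","M_O"),
  stringsAsFactors = FALSE
)

# One row: number of unique loc values per region
loc_row <- df %>%
  group_by(region) %>%
  summarise(n = n_distinct(loc)) %>%
  mutate(interact = "") %>%
  pivot_wider(names_from = region, values_from = n, names_prefix = "region")

# One row per interact level: frequency per region, missing pairs filled with 0
freq_rows <- df %>%
  count(region, interact) %>%
  pivot_wider(names_from = region, values_from = n,
              names_prefix = "region", values_fill = list(n = 0L))

res <- bind_rows(loc_row, freq_rows)

# Drop the label column to get the final shape the question asks for
output <- select(res, -interact)
print(output)
```

The named list passed to values_fill is used because it is accepted by older and newer tidyr versions alike.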
    

    【Discussion】:
