【Posted】: 2017-07-06 23:47:30
【Question】:
I have a data frame that looks like this:
df<- data.frame(region= c("1","1","1","1","1","2","2","2","2","2","2"),loc=c("104","104","104","105","106","107","108", "109", "110", "110", "111"), interact= c("A_B", "B_C", "A_B", "B_C", "B_C", "A_B", "G_H", "I_J", "J_K", "L_M", "M_O"))
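I want to reshape it from long to wide format, so that the region numbers become the variables (column headers) and the rows hold the counts of each interact level occurring in that region. The catch is that I also want the first row to be the count of unique loc levels in that region. I will first show an intermediate df: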
df2<- data.frame(interact= c("", "A_B", "B_C", "G_H", "I_J", "J_K",
"L_M", "M_O"), region1= c("3", "2", "3", "0","0","0","0","0"),
region2= c("5", "1", "0", "1","1","1","1","1"))
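Note that region 1 has 3 unique loc levels and region 2 has 5; the first row of numbers therefore gives the count of unique loc values in each region. Every following row gives the frequency of one interaction type across all loc values in that region. However, I don't want this interact column in the final data frame, so the final output should look like this: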
output<- data.frame(region1= c("3", "2", "3", "0","0","0","0","0"),
region2= c("5", "1", "0", "1","1","1","1","1"))
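I tried the following, but I can't work out how to add a row holding the unique loc count for each region, and I know my current steps are not the most efficient approach: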
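library(dplyr)  # provides %>%, group_by() and summarise()
library(tidyr)  # provides spread()
# Count how often each interact level occurs within each region
df <- df %>%
  group_by(region, interact) %>%
  summarise(freq = n())
# One column per region; region/interact pairs that never occur become 0
data_wide <- spread(df, region, freq, fill = 0)
# Drop the interact column
data_wide <- data_wide[, -1]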
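For reference, the target layout described above can be reproduced in base R. This is only a minimal sketch of one possible approach, not a definitive solution: the names `n_loc`, `freq` and `result` are made up for illustration, and the columns come out as integers rather than the character strings used in `output`.

```r
df <- data.frame(
  region   = c("1","1","1","1","1","2","2","2","2","2","2"),
  loc      = c("104","104","104","105","106","107","108","109","110","110","111"),
  interact = c("A_B","B_C","A_B","B_C","B_C","A_B","G_H","I_J","J_K","L_M","M_O")
)

# Number of distinct loc values in each region (named by region)
n_loc <- sapply(split(df$loc, df$region), function(x) length(unique(x)))

# Frequency of each interact level per region
# (rows: interact levels, columns: regions; missing pairs are 0)
freq <- table(df$interact, df$region)

# Put the loc counts on top of the interaction frequencies
result <- as.data.frame(rbind(n_loc, unclass(freq)))
rownames(result) <- NULL                          # drop the interact labels
names(result) <- paste0("region", names(n_loc))   # region1, region2
```

Because `table()` already fills absent region/interact combinations with 0, no separate NA handling is needed in this sketch.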
【Comments】:
Tags: r