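【Posted】:2018-03-30 20:29:53
【Question】:
I am trying to reshape my data into a "readable" format. The data frame has several columns that share the same name. I tried the melt() function, but it did not solve the problem, which seems to be related to the measure variables holding different kinds of values.
A small example of the data: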
obs m ti td date class code dis group status grade freq date dis group status grade freq date dis group status grade freq date
obs_1 A grad 05/01/2016 00:00 55060 DDE0300 2016101 A 5.7 97 05/01/2016 15:20 MS0230 2016101 A 8.19 100 05/01/2016 15:20 A0301 2016101 A 5.8 100 27/01/2016 13:12
obs_2 A grad 05/01/2016 00:00 55070 SSE332 0 D 03/06/2016 14:08 A0804 0 D 03/06/2016 14:18 SE089 0 D 26/08/2016 19:31
Now I want to split this data frame by observation:
melt(df[1,],id.vars=c("obs","m","ti","td","date","class","code"),
measure.vars=c("dis","group","status","grade","freq","date"))
This is what I get:
obs m ti td date class code variable value
1 obs_1 A grad NA 05/01/2016 15:20 NA 55060 dis DDE0300
2 obs_1 A grad NA 05/01/2016 15:20 NA 55060 group 2016101
3 obs_1 A grad NA 05/01/2016 15:20 NA 55060 status A
4 obs_1 A grad NA 05/01/2016 15:20 NA 55060 grade 5.7
5 obs_1 A grad NA 05/01/2016 15:20 NA 55060 freq 97
6 obs_1 A grad NA 05/01/2016 15:20 NA 55060 date 05/01/2016 15:20
Warning message:
attributes are not identical across measure variables; they will be dropped
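Now I am "missing" two sets of columns, namely the ones for MS0230 and A0301 together with their group, status, and so on. How can I solve this?
Keep in mind that the solution does not have to use the melt() function.
Code to reproduce the data: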
df<-structure(list(obs = structure(1:2, .Label = c("obs_1", "obs_2"
), class = "factor"), m = structure(c(1L, 1L), .Label = "A ", class = "factor"),
ti = structure(c(1L, 1L), .Label = "grad", class = "factor"),
td = c(NA, NA), datei = structure(c(1L, 1L), .Label = "05/01/2016 00:00", class = "factor"),
class = c(NA, NA), code = c(55060L, 55070L), dis = structure(1:2, .Label = c("DDE0300",
"SSE332"), class = "factor"), group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ",
"D "), class = "factor"), grade = c(5.7, NA), freq = c(97L,
NA), date = structure(c(2L, 1L), .Label = c("03/06/2016 14:08",
"05/01/2016 15:20"), class = "factor"), dis = structure(c(2L,
1L), .Label = c("A0804", "MS0230"), class = "factor"), group = c(2016101L,
0L), status = structure(1:2, .Label = c("A ", "D "), class = "factor"),
grade = c(8.19, NA), freq = c(100L, NA), date = structure(c(2L,
1L), .Label = c("03/06/2016 14:18", "05/01/2016 15:20"), class = "factor"),
dis = structure(1:2, .Label = c("A0301", "SE089"), class = "factor"),
group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ",
"D "), class = "factor"), grade = c(5.8, NA), freq = c(100L,
NA), date = structure(c(2L, 1L), .Label = c("26/08/2016 19:31",
"27/01/2016 13:12"), class = "factor")), .Names = c("obs",
"m", "ti", "td", "datei", "class", "code", "dis", "group", "status",
"grade", "freq", "date", "dis", "group", "status", "grade", "freq",
"date", "dis", "group", "status", "grade", "freq", "date"), class = "data.frame", row.names = c(NA,
-2L))
【Comments】:
- This looks like a duplicate of "Reshaping multiple sets of measurement columns (wide format) into single columns (long format)". Try, for example:
  reshape(df, idvar = "obs", direction = "long", varying = list(dis = c(8, 14, 20), group = c(9, 15, 21), status = c(10, 16, 22), grade = c(11, 17, 23), freq = c(12, 18, 24), date = c(13, 19, 25)))
- Please show the desired result. It seems you expect MS0230 and A0301 to be columns after melt.
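One possible way to stack repeated column sets, sketched in base R on a toy table rather than on the full df from the question. The names wide, id_cols, block_size, starts, pieces, long and the set column are illustrative assumptions, not part of the original post. Columns are selected by position, so the duplicated names never have to be disambiguated.

```r
# Toy wide table with repeated column names; values borrowed from the question.
wide <- data.frame(
  obs = c("obs_1", "obs_2"),
  dis = c("DDE0300", "SSE332"), grade = c(5.7, NA),
  dis = c("MS0230", "A0804"),   grade = c(8.19, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

id_cols    <- 1   # positions of the identifier columns (assumption)
block_size <- 2   # number of columns in each repeated set (assumption)
starts     <- seq(from = max(id_cols) + 1, to = ncol(wide), by = block_size)

# Build one long piece per repeated set, selecting columns by position
# so that the duplicated names are never looked up by name.
pieces <- lapply(seq_along(starts), function(i) {
  cols <- starts[i]:(starts[i] + block_size - 1)
  cbind(wide[, id_cols, drop = FALSE], set = i, wide[, cols, drop = FALSE])
})

# Stack the pieces: one row per observation and repeated set.
long <- do.call(rbind, pieces)
rownames(long) <- NULL
print(long)
```

For the df in the question, the same idea would presumably use id_cols <- 1:7 and block_size <- 6, since the repeated sets start at columns 8, 14 and 20. That has not been checked against the full data.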
Tags: r dataframe data-manipulation