【Question title】: How to transform panel data into specific wide format?
【Posted】: 2018-12-20 22:16:14
【Question description】:

I have a data frame in long format like this:

> df2
   id t treat    value
1   1 1     0 5.718226
2   1 2     1 5.954669
3   1 3     0 4.485165
4   2 1     1 6.616181
5   2 2     0 4.521301
6   2 3     1 6.955451
7   3 1     0 3.851682
8   3 2     1 6.907178
9   3 3     1 6.274501
10  4 1     1 6.092860
11  4 2     0 4.327431
12  4 3     0 6.019627

I want the output in wide format:

  id treat.1  value.1 treat.2  value.2 treat.3  value.3
1  1       0 5.718226       1 5.954669       0 4.485165
2  2       1 6.616181       0 4.521301       1 6.955451
3  3       0 3.851682       1 6.907178       1 6.274501
4  4       1 6.092860       0 4.327431       0 6.019627

The only way I have managed to do it is this:

l2.2 <- with(df2, split(df2, t))
l2.3 <- lapply(seq_along(l2.2), function(x) {
  names(l2.2[[x]])[3:4] <- paste0(names(l2.2[[x]])[3:4], ".", names(l2.2)[x])
  l2.2[[x]][-2]
})

Reduce(function(x, y) 
  merge(x, y, all=TRUE, by=intersect(names(x), names(y))), l2.3)

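This seems overly complicated to me, but I can't simplify it. I'd like to understand how to do this with reshape(), aggregate(), or data.table::dcast(). I got close, but I can't figure it out:

reshape: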

df2.1 <- reshape(df2, timevar="t", idvar=c("id", "treat"), direction="wide")
df2.2 <- reshape(df2.1, timevar="treat", idvar="id", direction="wide")
> df2.2[, c(1, 2, 5, 3, 6, 4, 7)]
   id value.1.0 value.1.1 value.2.0 value.2.1 value.3.0 value.3.1
1   1  5.718226        NA        NA  5.954669  4.485165        NA
4   2        NA  6.616181  4.521301        NA        NA  6.955451
7   3  3.851682        NA        NA  6.907178        NA  6.274501
10  4        NA  6.092860  4.327431        NA  6.019627        NA

data.table:

> data.table::dcast(df2, id + treat ~ t)
  id treat        1        2        3
1  1     0 5.718226       NA 4.485165
2  1     1       NA 5.954669       NA
3  2     0       NA 4.521301       NA
4  2     1 6.616181       NA 6.955451
5  3     0 3.851682       NA       NA
6  3     1       NA 6.907178 6.274501
7  4     0       NA 4.327431 6.019627
8  4     1 6.092860       NA       NA

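Some of my attempts with aggregate() failed as well.

Can anyone tell me how to do this in base R and with data.table::dcast()?

Edit:

Compared with this simple example, my data have some additional id and time-varying variables; see df3 in the data below.

While @markus's reshape() solution works, dcast() throws an error: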

> reshape(df3, timevar="t", idvar=c("id.1", "id.2"), direction="wide")
   id.1 id.2 treat.1 value.1.1  value.2.1 treat.2 value.1.2  value.2.2 treat.3 value.1.3  value.2.3
1     1    1       0  5.718226  0.4297986       1  5.954669 -1.4007124       0  4.485165 -1.1741134
4     2    1       1  6.616181 -1.0516253       0  4.521301  1.5686463       1  6.955451 -0.4306961
7     3    2       0  3.851682 -0.3046341       1  6.907178 -0.6669521       1  6.274501 -0.2582921
10    4    2       1  6.092860  1.3231503       0  4.327431  1.6899552       0  6.019627 -0.4263450

> dcast(df3, id ~ t, value.var = c("treat", "value"))
Error in .subset2(x, i, exact = exact) : subscript out of bounds
In addition: Warning message:
In if (!(value.var %in% names(data))) { :
  the condition has length > 1 and only the first element will be used

Data:

df2 <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L),
                      t = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3),
                      treat = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0),
                      value = c(5.7182263351077, 5.95466888906233,
                                4.48516458093838, 6.61618146498587,
                                4.52130082895974, 6.95545080306353,
                                3.85168235272874, 6.90717809069993,
                                6.27450118041287, 6.09285968998526,
                                4.32743136605772, 6.01962658742754)),
                 row.names = c(NA, -12L), class = "data.frame")

df3 <- structure(list(id.1 = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), 
                      id.2 = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), 
                      t = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3), 
                      treat = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0), 
                      value.1 = c(5.7182263351077, 5.95466888906233, 4.48516458093838, 
                                  6.61618146498587, 4.52130082895974, 6.95545080306353, 
                                  3.85168235272874, 6.90717809069993, 6.27450118041287, 
                                  6.09285968998526, 4.32743136605772, 6.01962658742754), 
                      value.2 = c(0.429798618225556, -1.40071244216677, -1.17411337999421, 
                                  -1.05162530600877, 1.56864628901013, -0.430696055983823, 
                                  -0.304634116536374, -0.666952117966313, -0.258292124936937, 
                                  1.32315028276158, 1.68995518578212, -0.426345031174389)), 
                 class = "data.frame", row.names = c(NA, -12L))

【Question comments】:

    Tags: r data.table reshape


    【Solution 1】:

    With dcast from data.table, you need to specify both columns, treat and value, as value.var.

    library(data.table)
    setDT(df2)
    dcast(df2, id ~ t, value.var = c("treat", "value"))
    #   id treat_1 treat_2 treat_3  value_1  value_2  value_3
    #1:  1       0       1       0 5.718226 5.954669 4.485165
    #2:  2       1       0       1 6.616181 4.521301 6.955451
    #3:  3       0       1       1 3.851682 6.907178 6.274501
    #4:  4       1       0       0 6.092860 4.327431 6.019627
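
The dcast() result above groups the columns by variable (all treat_* first, then all value_*), whereas the question asks for them interleaved by time point. A minimal sketch of one way to reorder them, assuming the same df2 (values rounded here for brevity) and using data.table's setcolorder(); the names wide and new_order are illustrative only:

```r
library(data.table)

# Same layout as df2 in the question; values rounded for brevity
df2 <- data.frame(
  id    = rep(1:4, each = 3),
  t     = rep(1:3, times = 4),
  treat = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0),
  value = c(5.718226, 5.954669, 4.485165, 6.616181, 4.521301, 6.955451,
            3.851682, 6.907178, 6.274501, 6.092860, 4.327431, 6.019627)
)

# Cast two value columns at once; as.data.table() leaves df2 itself untouched
wide <- dcast(as.data.table(df2), id ~ t, value.var = c("treat", "value"))

# Build the interleaved order treat_1, value_1, treat_2, value_2, ...
times     <- sort(unique(df2$t))
new_order <- c("id", paste(c("treat", "value"), rep(times, each = 2), sep = "_"))

# Reorder the columns by reference
setcolorder(wide, new_order)
names(wide)
```

The separator is still "_" rather than "."; if the exact names from the question matter, they could be changed afterwards, for example with setnames().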
    

    reshape would work like this:

    reshape(df2, idvar = "id", timevar = "t", direction = "wide")
    #   id treat.1  value.1 treat.2  value.2 treat.3  value.3
    #1   1       0 5.718226       1 5.954669       0 4.485165
    #4   2       1 6.616181       0 4.521301       1 6.955451
    #7   3       0 3.851682       1 6.907178       1 6.274501
    #10  4       1 6.092860       0 4.327431       0 6.019627
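
The reshape() output above keeps the original row names (1, 4, 7, 10) and joins variable and time with ".". A small sketch, again assuming df2 with rounded values, that resets the row names and uses the sep argument of reshape() to pick a different separator; the name wide is illustrative only:

```r
# Same layout as df2 in the question; values rounded for brevity
df2 <- data.frame(
  id    = rep(1:4, each = 3),
  t     = rep(1:3, times = 4),
  treat = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0),
  value = c(5.718226, 5.954669, 4.485165, 6.616181, 4.521301, 6.955451,
            3.851682, 6.907178, 6.274501, 6.092860, 4.327431, 6.019627)
)

# Every column that is neither idvar nor timevar is treated as time-varying
wide <- reshape(df2, idvar = "id", timevar = "t", direction = "wide", sep = "_")

# Replace the leftover long-format row names with 1..n
rownames(wide) <- NULL
wide
```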
    

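    【Comments】:

    • I simplified my example a bit; my actual data have additional value and id variables. In any case, your reshape() solution extends to multiple id variables with c(.) and works like a charm! The data.table solution, however, doesn't work yet.
    • @jay.sf Can you update your question with another data sample? reshape is so powerful, I wish I knew how to use all of its arguments properly..
    • Sure! Thanks, please see my edit with the additional data.
    • Does this give you the desired output: setDT(df3); dcast(df3, id.1 + id.2 ~ t, value.var = c("treat", "value.1", "value.2")) ?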
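
The suggestion in the last comment can be tried as a self-contained sketch. It assumes df3 from the question (values rounded here for brevity). Note that the failing call in the question uses `id` on the left-hand side of the formula, which is not a column of df3, and passes a plain data.frame. The warning text there looks like it comes from code that expects a single value.var, so converting to a data.table first is probably what makes multiple value.var work, though that reading is an assumption. The names dt3 and wide3 are illustrative only:

```r
library(data.table)

# Same layout as df3 in the question; values rounded for brevity
df3 <- data.frame(
  id.1    = rep(1:4, each = 3),
  id.2    = rep(1:2, each = 6),
  t       = rep(1:3, times = 4),
  treat   = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0),
  value.1 = c(5.718, 5.955, 4.485, 6.616, 4.521, 6.955,
              3.852, 6.907, 6.275, 6.093, 4.327, 6.020),
  value.2 = c(0.430, -1.401, -1.174, -1.052, 1.569, -0.431,
              -0.305, -0.667, -0.258, 1.323, 1.690, -0.426)
)

# Convert to data.table so the data.table method of dcast() is used
dt3 <- as.data.table(df3)

# Both id columns go on the left-hand side; all three time-varying
# columns are passed together as value.var
wide3 <- dcast(dt3, id.1 + id.2 ~ t,
               value.var = c("treat", "value.1", "value.2"))
wide3
```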