【Question title】: Reshape data in R: change a long table into a wide table
【Posted】: 2015-06-26 02:37:10
【Question description】:

I want to use the reshape2 package in R to turn my long table into a wide table.

I have a dataset from a database that looks like this (example):

id1   |  id2 |  info  | action_time |
 1    | a    |  info1 |    time1    |
 1    | a    |  info1 |    time2    |  
 1    | a    |  info1 |    time3    |  
 2    | b    |  info2 |    time4    |
 2    | b    |  info2 |    time5    |

Now I want it to look like this:

id1   |  id2 |  info  |action_time 1|action_time 2|action_time 3|
 1    | a    |  info1 |    time1    |    time2    |    time3    |
 2    | b    |  info2 |    time4    |    time5    |             | 

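I have tried several times and looked up examples on various websites that use reshape() and dcast(), but could not find one like this. The number of action_time values differs for each id; some ids may have more than 10 action_times, in which case the reshaped dataset would have more than 10 action_time columns.

Can anyone think of a convenient way to do this? If there is a way to do it in Excel (pivot table?), that would be great too. Thanks, everyone.

【Question comments】:

Tags: r pivot-table reshape2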


【Solution 1】:

Try:

library(dplyr)
library(tidyr)

df %>% 
  group_by(id1) %>%                 # number the actions within each id1
  mutate(action_no = paste("action_time", row_number())) %>%
  spread(action_no, action_time)    # one column per action number

This gives:

#Source: local data frame [2 x 6]
#
#  id1 id2  info action_time 1 action_time 2 action_time 3
#1   1   a info1         time1         time2         time3
#2   2   b info2         time4         time5            NA
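
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same reshape, assuming recent versions of dplyr and tidyr are installed. The data frame is rebuilt here with plain character columns instead of factors, and `wide` is only an illustrative name:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question, using character columns
df <- data.frame(
  id1 = c(1, 1, 1, 2, 2),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id1) %>%
  mutate(action_no = row_number()) %>%   # 1, 2, 3, ... within each id1
  ungroup() %>%
  pivot_wider(
    names_from = action_no,
    values_from = action_time,
    names_prefix = "action_time "        # gives "action_time 1", "action_time 2", ...
  )

print(wide)
```

Ids with fewer actions get NA in the trailing columns, as in the output above.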

Data

df <- structure(list(id1 = c(1, 1, 1, 2, 2), id2 = structure(c(1L, 
1L, 1L, 2L, 2L), .Label = c("a", "b"), class = "factor"), info = structure(c(1L, 
1L, 1L, 2L, 2L), .Label = c("info1", "info2"), class = "factor"), 
    action_time = structure(1:5, .Label = c("time1", "time2", 
    "time3", "time4", "time5"), class = "factor")), .Names = c("id1", 
"id2", "info", "action_time"), class = "data.frame", row.names = c(NA, -5L))

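【Comments】:

  • Hi Steven, I was just wondering whether spread() can handle two value columns in its value argument, so that the output would be action_time 1 | action comment 1 | action_time 2 | action comment 2 |. In that case the original data (the long table) would have another column at the end called action comment. Do you think this is possible? I hope my question is clear. Thanks
  • @Lambo I'm not sure I fully understand. Would you mind posting another question with a reproducible example and the desired output?
  • Thanks for your reply. Here is the link to my updated question (stackoverflow.com/questions/31125693/…)
【Solution 2】: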
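
For the two-value-column case raised in the comments, one possible approach (a sketch using base R's reshape() rather than spread(), not necessarily what the linked follow-up settled on) is to pass both columns as v.names. The action_comment column and its values are hypothetical, invented here for illustration:

```r
# Example data plus a hypothetical action_comment column
df2 <- data.frame(
  id1 = c(1, 1, 1, 2, 2),
  id2 = c("a", "a", "a", "b", "b"),
  info = c("info1", "info1", "info1", "info2", "info2"),
  action_time = c("time1", "time2", "time3", "time4", "time5"),
  action_comment = c("c1", "c2", "c3", "c4", "c5"),  # hypothetical values
  stringsAsFactors = FALSE
)

# Running action number within each id1
df2$action_no <- ave(seq_along(df2$id1), df2$id1, FUN = seq_along)

# Spread both value columns at once; sep = " " yields names like "action_time 1"
wide2 <- reshape(
  df2,
  idvar = c("id1", "id2", "info"),
  timevar = "action_no",
  v.names = c("action_time", "action_comment"),
  direction = "wide",
  sep = " "
)

print(names(wide2))
```

With reshape(), the new columns come out grouped by action number (time and comment for action 1, then for action 2, and so on), which matches the layout asked for in the comment.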

Using tidyr

library(tidyr)
# replicate data
df <- structure(list(
  id1 = c(1, 1, 1, 2, 2),
  id2 = structure(c(1L, 1L, 1L, 2L, 2L),
                  .Label = c(" a    ", " b    "), class = "factor"),
  info = structure(c(1L, 1L, 1L, 2L, 2L),
                   .Label = c("  info1 ", "  info2 "), class = "factor"),
  action_time = structure(1:5,
                          .Label = c("    time1    ", "    time2    ", "    time3    ",
                                     "    time4    ", "    time5    "),
                          class = "factor")),
  .Names = c("id1", "id2", "info", "action_time"),
  class = "data.frame", row.names = c(NA, -5L))


# create additional column on action_time sequence
# (rle() counts runs, so rows for each id1 must be contiguous)
action_no <- paste("action_time",
                   unlist(sapply(rle(df$id1)$lengths, function(x) seq(1, x))))
y <- cbind(df, action_no)

# spread into final dataframe
z <- spread(y, action_no, action_time)

Final output

> z
  id1    id2     info action_time 1 action_time 2 action_time 3
1   1  a       info1      time1         time2         time3    
2   2  b       info2      time4         time5              <NA>
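
The rle()-based counter above relies on the rows for each id1 being contiguous. A small sketch of an order-independent alternative using base R's ave(); the unsorted id vector below is made up for illustration:

```r
# Hypothetical ids that are NOT sorted by id
id1 <- c(1, 2, 1, 2, 1)

# rle() sees five runs of length 1, so every row is numbered 1
counter_rle <- unlist(lapply(rle(id1)$lengths, seq_len))

# ave() numbers rows within each id regardless of row order
counter_ave <- ave(seq_along(id1), id1, FUN = seq_along)

action_no <- paste("action_time", counter_ave)
print(action_no)
```

If the data always arrive sorted by id, both counters agree; sorting first (for example with order()) is another way to make the rle() version safe.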

【Comments】:
