Answers to: pivot table in R

【Question Title】: pivot table in R
【Posted】: 2018-03-26 16:36:34
【Question Description】:

I have a data frame with dates like this:

id     weight  beginning_date   end_date     age  categ_car
22     2       1960-06-02       1960-06-02   17     A
17     4       2001-07-02                    19     B

I would like to get the following data frame:

id     weight  beginning_date   end_date     age  categ_car
22     2       1960-06-02       1960-06-02   17     A
22     2       1961-06-02       1961-06-02   18     A
17     4       2001-07-02                    19     B
17     4       2002-07-02                    20     B
17     4       2003-07-02                    21     B
17     4       2004-07-02                    22     B

I know I can use the melt function from the reshape2 package to pivot the data, but I don't know how to increment the dates and the age.

Thanks,

Naï

【Question Discussion】:

Why is weight 1 in the first two rows? Judging by the next 4 rows, it should be 2.
Oh yes, sorry, I will fix that.

Tags: r pivot reshape2

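【Solution 1】:

Here is something to get you going. You need to extract the year from the date columns, apply the same function to the date columns, and then bind everything together: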

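library(data.table)
setDT(df)

# Return the ages for the extra rows only: a + 1, ..., a + (x - 1).
# The original row (offset 0) is not included.
AddWeightage <- function(a, x) {
  offsets <- seq_len(x - 1)
  return(a + offsets)
}
cols <- c("age")
df[, lapply(.SD, AddWeightage, x = weight), by = .(categ_car), .SDcols = cols]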

And here is the function that generates the date columns:

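# Return the dates for the extra rows only, shifted by 1, ..., x - 1 years.
AddWeightDate <- function(a, x) {
  offsets <- seq_len(x - 1)
  # Keep a missing date as NA instead of pasting the string "NANA"
  if (is.na(a)) return(rep(NA_character_, length(offsets)))
  a1 <- offsets + year(a)
  b <- substr(as.character(a), 5, 10)  # the "-MM-DD" part
  return(sprintf("%s%s", a1, b))
}

cols <- c("beginning_date", "end_date")
df3 <- df[, lapply(.SD, AddWeightDate, x = weight), by = .(categ_car), .SDcols = cols]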
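
To see how the pieces could fit together, here is a minimal data.table sketch that produces the layout the question asks for, original rows included. It is not the answerer's code: shift_year and expanded are hypothetical names, the sample data is retyped from the question with NA for the blank end_date, and the dates are assumed to be ISO "YYYY-MM-DD" strings. Shifting the year as text would turn 29 February into an invalid date, so treat this as illustrative only.

```r
library(data.table)

# Sample data retyped from the question; NA stands in for the blank end_date
dt <- data.table(
  id = c(22L, 17L),
  weight = c(2L, 4L),
  beginning_date = c("1960-06-02", "2001-07-02"),
  end_date = c("1960-06-02", NA),
  age = c(17L, 19L),
  categ_car = c("A", "B")
)

# Hypothetical helper: shift an ISO date string by k whole years, keeping NA as NA
shift_year <- function(d, k) {
  shifted <- sprintf("%d%s", as.integer(substr(d, 1, 4)) + k, substr(d, 5, 10))
  ifelse(is.na(d), NA_character_, shifted)
}

# Repeat each row `weight` times, then number the copies 0, 1, ... within each id
expanded <- dt[rep(seq_len(nrow(dt)), dt$weight)]
expanded[, k := seq_len(.N) - 1L, by = id]

# Advance the dates and the age by k years, then drop the helper column
expanded[, `:=`(
  beginning_date = shift_year(beginning_date, k),
  end_date = shift_year(end_date, k),
  age = age + k
)]
expanded[, k := NULL]
print(expanded)
```

The numbering by id assumes each id occurs on a single row of the input, as in the question's sample.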

【Discussion】:

【Solution 2】:

We can use complete and fill from the tidyr package. The key point is to use the %m+% operator from the lubridate package to build the date sequence in 1-year steps.

library(dplyr)
library(tidyr)
library(lubridate)

df %>%
  mutate(beginning_date = ymd(beginning_date), end_date = ymd(end_date)) %>%
  group_by(id) %>%
  complete(beginning_date = seq(beginning_date, beginning_date %m+% years(weight-1), 
             by="1 year")) %>%
  fill(weight, end_date, age, categ_car) %>% 
  arrange(desc(id)) %>%
  select(id, weight, beginning_date, end_date, age, categ_car)

# # A tibble: 6 x 6
# # Groups: id [2]
#      id  weight beginning_date end_date   age  categ_car
#    <int>  <int> <date>         <date>     <int> <chr>    
# 1    22      2 1960-06-02     1960-06-02    17   A        
# 2    22      2 1961-06-02     1960-06-02    17   A        
# 3    17      4 2001-07-02     NA            19   B        
# 4    17      4 2002-07-02     NA            19   B        
# 5    17      4 2003-07-02     NA            19   B        
# 6    17      4 2004-07-02     NA            19   B
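
A small illustration of why %m+% is the safer operator for this: plain + with a lubridate period returns NA when the resulting day does not exist, while %m+% rolls back to the last valid day of that month. The leap-day date below is only an example value, not taken from the question, and plain and rolled are hypothetical names.

```r
library(lubridate)

leap <- ymd("2012-02-29")

# Plain period arithmetic: 2013-02-29 does not exist, so the result is NA
plain <- leap + years(1)

# %m+% rolls back to the last real day of the month instead
rolled <- leap %m+% years(1)

print(plain)
print(rolled)
```

With the dates in the question (June 2 and July 2) both operators agree, so this only matters for month-end and leap-day dates.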

Update: based on the OP's feedback about handling multiple beginning_date values for the same id:

df %>%
  mutate(beginning_date = ymd(beginning_date), end_date = ymd(end_date)) %>%
  group_by(id) %>%
  # first(weight) assumes weight is constant within an id, so that `to` has length 1
  complete(beginning_date = seq(as.Date(min(beginning_date), origin="1970-01-01"), 
                  as.Date(min(beginning_date), origin="1970-01-01") %m+% years(first(weight)-1),
                                by="1 year")) %>%
  fill(weight, end_date, age, categ_car) %>% 
  arrange(desc(id)) %>%
  select(id, weight, beginning_date, end_date, age, categ_car)
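
Note that with complete and fill, age and end_date are simply filled down, whereas the question wants them to advance by one year per row. One possible way to get that is sketched here with tidyr::uncount instead of complete (my substitution, not part of the answer). The column k and the name result are hypothetical, and the sample data is retyped from the question with NA for the blank end_date.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data retyped from the question, with NA for the blank end_date
df <- data.frame(
  id = c(22L, 17L),
  weight = c(2L, 4L),
  beginning_date = as.Date(c("1960-06-02", "2001-07-02")),
  end_date = as.Date(c("1960-06-02", NA)),
  age = c(17L, 19L),
  categ_car = c("A", "B"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Repeat each row `weight` times; k numbers the copies 1, 2, ... per original row
  uncount(weight, .remove = FALSE, .id = "k") %>%
  mutate(
    k = k - 1L,                                     # year offset: 0 for the original row
    beginning_date = beginning_date %m+% years(k),
    end_date = end_date %m+% years(k),              # a missing end_date stays NA
    age = age + k
  ) %>%
  select(-k)

print(result)
```

Because the offset is counted per original row rather than per id, this variant does not depend on each id having a single beginning_date.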

Data

df <- read.table(text = 
      "id     weight  beginning_date   end_date     age  categ_car
       22     2       1960-06-02       1960-06-02   17     A
       17     4       2001-07-02         NA         19     B", 
       header = TRUE, stringsAsFactors = FALSE)

Note: NA is used in place of the blank end_date value.

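【Discussion】:

Thank you very much, but I get the following error: 'from' must be of length 1. I don't know what to do to get seq to work.
@Nai Did you try it with the df from my answer? If not, please try that first. My guess is that your data has multiple beginning_date values for the same id.
Yes, I tried it and it works, but when I try it on my own df it fails. Maybe it's because of the type of beginning_date: mine is a Date, maybe it needs to be character?
@Naï The Date format should be fine, since lubridate::ymd returns a Date. You could try converting it to character and then back to Date with lubridate::ymd. Are there any NA values in beginning_date?
No, I don't have any NA in beginning_date.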
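
The guess made in the comments, that some id has several beginning_date values, can be checked directly. Below is a minimal base R sketch on made-up data (the third row is hypothetical and was added only to trigger the problem; dup_ids, res, first_start and ok are illustrative names). It lists the repeated ids, shows that seq fails when it receives more than one start date, and shows that reducing to a single start date per id avoids the failure.

```r
# Hypothetical data in which id 17 appears twice with different start dates
df <- data.frame(
  id = c(22L, 17L, 17L),
  beginning_date = as.Date(c("1960-06-02", "2001-07-02", "2003-01-15"))
)

# ids that occur on more than one row
dup_ids <- unique(df$id[duplicated(df$id)])
print(dup_ids)

# Passing both dates of id 17 as `from` makes seq() raise an error
res <- tryCatch(
  seq(df$beginning_date[df$id == 17L], by = "1 year", length.out = 2),
  error = function(e) e
)
print(inherits(res, "error"))

# Reducing to one start date per id, here the earliest, avoids the error
first_start <- min(df$beginning_date[df$id == 17L])
ok <- seq(first_start, by = "1 year", length.out = 2)
print(ok)
```

If dup_ids comes back non-empty on the real data, the updated pipeline that uses min(beginning_date) is the one to try.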