【Question title】: How to convert wide format data to long format (working with dates)?
【Posted】: 2018-07-19 08:32:00
【Question】:

I have a data frame of precipitation measurements in which the days of the month run along the column headers.

Observations: 1,195
Variables: 33
$ Year  <int> 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 1901, 190...
$ Month <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, ...
$ X1    <dbl> 9.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.6, 0.0, 0.0, 0.0, 0....
$ X2    <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.4, 0.0, 0.0, 0.0, 0...
$ X3    <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
$ X4    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ X5    <dbl> 0.0, 0.5, 0.0, 0.0, 1.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
$ X6    <dbl> 0.0, 0.0, 0.0, 0.0, 4.3, 0.0, 0.0, 11.7, 0.0, 0.0, 0.0, 0...

I want to convert it to long format, with the day of the month in a column of its own. I used:

library(tidyr)
long <- gather(dataframe, Day, PCP, -Month,-Year)

The output is:

head(long)
  Year Month Day PCP
1 1901     1  X1 9.1
2 1901     2  X1 0.0
3 1901     3  X1 0.0
4 1901     4  X1 0.0
5 1901     5  X1 0.0
6 1901     6  X1 0.0

I would like the output to look like this, with each month followed by its days in order:

  Year Month Day PCP
1 1901     01  01 9.1
2 1901     01  02 0.0
3 1901     01  03 0.0
4 1901     01  04 0.0
5 1901     01  05 0.0
6 1901     01  06 0.0

So how can I achieve this? Any help would be much appreciated. Regards

【Comments】:

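Sort it however you like, e.g. library(dplyr); long %>% arrange(Year, Month, Day). You may also want to strip the Xs first, e.g. mutate(Day = readr::parse_number(Day)), so that the days sort numerically rather than as text.
Providing sample data in a reproducible format is generally appreciated.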

Tags: r tidyr lubridate

【Solution 1】:

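Here is a solution. You want to do two things:

Remove the leading X from the day column (using mutate with str_replace).
Sort so that the table is in month and day order (using arrange).

This is implemented as follows: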

library(tidyverse)
tbl <- tibble(
  year = rep(1901, 6),
  month = 1:6,
  X1 = c(9.1, 0, 0, 0, 0, 0),
  X2 = rep(0, 6),
  X3 = rep(0, 6),
  X4 = rep(0, 6),
  X5 = c(0, 0.5, 0, 0, 1.8, 0),
  X6 = c(0, 0, 0, 0, 4.3, 0)
)

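tbl %>%
  # stack the day columns X1..X6 into key/value pairs
  gather(key = "day", value = "precip", X1:X6) %>%
  # strip the leading "X" and convert the day to a number
  mutate(day = as.numeric(str_replace(day, "X", ""))) %>%
  # order rows by year, then month, then day
  arrange(year, month, day)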
# A tibble: 36 x 4
    year month   day precip
   <dbl> <int> <dbl>  <dbl>
 1  1901     1  1.00   9.10
 2  1901     1  2.00   0   
 3  1901     1  3.00   0   
 4  1901     1  4.00   0   
 5  1901     1  5.00   0   
 6  1901     1  6.00   0   
 7  1901     2  1.00   0   
 8  1901     2  2.00   0   
 9  1901     2  3.00   0   
10  1901     2  4.00   0   
# ... with 26 more rows
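
As a side note, gather() has been superseded by pivot_longer() in tidyr 1.0.0 and later. The following is only a sketch of the same reshaping with the newer function, assuming tidyr 1.1.0 or later (for names_transform) and a small made-up sample with three day columns. pivot_longer() expands each input row in place, so the result already comes out ordered by row and then by day column, and no separate arrange() is needed here.

```r
library(tidyr)

# Small illustrative sample (made-up values, same shape as the question's data)
tbl <- data.frame(
  year  = c(1901, 1901),
  month = 1:2,
  X1 = c(9.1, 0),
  X2 = c(0, 0),
  X3 = c(0, 0.5)
)

long <- pivot_longer(
  tbl,
  cols = X1:X3,                              # the day columns to stack
  names_to = "day",                          # column that receives the old headers
  names_prefix = "X",                        # strip the leading "X" from each header
  names_transform = list(day = as.integer),  # turn "1", "2", ... into integers
  values_to = "precip"                       # column that receives the measurements
)

print(long)
```

With the full data the column range would be X1:X31 instead of X1:X3.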

【Discussion】:

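It works, but the problem is that it shows NA for the invalid days of each month. For example, there is an NA for April 31 and NAs for February 29, 30 and 31 of 1901 (which do not exist), and the same happens for the other short months. I don't want those invalid days to appear at all. I used: gather(key = "day", value = "precip", X1:X31). How can I correct this? I have 100 years of data, and removing the NAs by hand is tedious.

1901 2 28 0.0
1901 2 29 NA
1901 2 30 NA
1901 2 31 NA
1901 3 1 0.0

You can just do filter(!is.na(precip)), which removes every row where precip is NA. If you have already run the code, this is probably the easiest option. You could also change the gather call to include na.rm = TRUE, since the NAs in your original data probably stand for days that do not exist. Note that either approach also drops any genuinely missing measurement recorded for a valid day.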
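
If genuine missing readings on real days need to be kept, one alternative is to filter on whether the date exists rather than on whether precip is NA. This is only a sketch, not part of the original answer: it assumes the lubridate package (listed in the question's tags) and relies on make_date() returning NA for impossible dates such as February 30. The sample values and the date column are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up sample: February and April 1901, day columns X28..X31 only
tbl <- tibble(
  year  = c(1901, 1901),
  month = c(2L, 4L),
  X28 = c(0, 1.2),
  X29 = c(NA, 0),
  X30 = c(NA, NA),   # April 30 is a real day with a missing reading
  X31 = c(NA, NA)
)

valid <- tbl %>%
  gather(key = "day", value = "precip", X28:X31) %>%
  mutate(
    day  = as.integer(sub("X", "", day)),   # strip the leading "X"
    date = make_date(year, month, day)      # NA when the date does not exist
  ) %>%
  filter(!is.na(date)) %>%                  # drop impossible dates only
  arrange(date)

print(valid)
```

In this sample the rows for February 29 to 31 and April 31 are dropped, while April 30 is kept with precip still NA.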