How to use date as filter: answers

【Question title】: How to use date as filter
【Posted】: 2018-11-19 17:21:58
【Question description】:

My knowledge of R and of scripting in general is almost nonexistent, so I hope you can be patient with this basic question.

library(lubridate)
date.depature <- c("2016.06.16", "2016.11.16", "2017.01.05", "2017.01.12", "2017.02.25")
airport.departure <- c("CDG", "QNY", "QXO", "CDG", "QNY")
airport.arrival <- c("SYD", "CDG", "QNY", "SYD", "QXO")
amount <- c("1", "3", "1", "10", "5")
date.depature <- as_date(date.depature)
df <- data.frame(date.depature, airport.departure, airport.arrival, amount)

xtabs(as.integer(amount) ~ airport.arrival + airport.departure, df)

With this code I get the sums of amount as a matrix, with the airports as rows and columns. Now I need the same result only for

2017
2017.01
up to 2017.01

【Question discussion】:

Tags: r date dataframe matrix

【Solution 1】:

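Since you are already using lubridate, I'll show you an approach with dplyr (which, like lubridate, is part of the tidyverse).

All three versions below work the same way: filter() uses the lubridate functions year(), month() and as_date() to build the conditions that subset the data, and the pipe %>% then passes the filtered data on to xtabs(), where the . placeholder marks the position of the piped data.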

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

date.depature <- c("2016.06.16", "2016.11.16", "2017.01.05", "2017.01.12", "2017.02.25")
airport.departure <- c("CDG", "QNY", "QXO", "CDG", "QNY")
airport.arrival <- c("SYD", "CDG", "QNY", "SYD", "QXO")
amount <- c("1", "3", "1", "10", "5")
date.depature <- as_date(date.depature)
df <- data.frame(date.depature, airport.departure, airport.arrival, amount)
# amount holds strings (a factor before R 4.0.0), and as.integer() on a factor
# returns level codes, not the numbers, so convert via as.character() first.
df$amount <- as.numeric(as.character(df$amount))

# The tables below assume R < 4.0.0, where data.frame() also turned the airport
# columns into factors, so airports missing from the filtered rows appear as zeros.

# For 2017
df %>% 
  filter(year(date.depature) == 2017) %>% 
  xtabs(amount ~ airport.arrival + airport.departure, .)
#>                airport.departure
#> airport.arrival CDG QNY QXO
#>             CDG   0   0   0
#>             QNY   0   0   1
#>             QXO   0   5   0
#>             SYD  10   0   0

# 2017.01
df %>% 
  filter(year(date.depature) == 2017, month(date.depature) == 1) %>% 
  xtabs(amount ~ airport.arrival + airport.departure, .)
#>                airport.departure
#> airport.arrival CDG QNY QXO
#>             CDG   0   0   0
#>             QNY   0   0   1
#>             QXO   0   0   0
#>             SYD  10   0   0

# Until 2017.01 (departures on or before 2017-01-01)
df %>% 
  filter(date.depature <= as_date("2017.01.01")) %>% 
  xtabs(amount ~ airport.arrival + airport.departure, .)
#>                airport.departure
#> airport.arrival CDG QNY QXO
#>             CDG   0   3   0
#>             QNY   0   0   0
#>             QXO   0   0   0
#>             SYD   1   0   0
Created on 2018-11-19 by the reprex package (v0.2.1)
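If dplyr and lubridate are not available, the same three filters can be sketched in base R. This is an alternative under stated assumptions, not part of the original answer: the dates are parsed with as.Date() and an explicit format string because of the "." separators, amount is entered as numbers, and the year and month are read from as.POSIXlt(), whose year field counts from 1900 and whose mon field counts from 0. The names tab_2017, tab_2017_01 and tab_until are hypothetical.

```r
# Base-R sketch: same data as above, but no packages required.
df <- data.frame(
  date.depature     = as.Date(c("2016.06.16", "2016.11.16", "2017.01.05",
                                "2017.01.12", "2017.02.25"),
                              format = "%Y.%m.%d"),  # "." separators need an explicit format
  airport.departure = c("CDG", "QNY", "QXO", "CDG", "QNY"),
  airport.arrival   = c("SYD", "CDG", "QNY", "SYD", "QXO"),
  amount            = c(1, 3, 1, 10, 5)              # numeric from the start
)

# as.POSIXlt() splits a Date into parts: $year counts from 1900, $mon from 0.
lt <- as.POSIXlt(df$date.depature)
yr <- lt$year + 1900
mo <- lt$mon + 1

# For 2017
tab_2017 <- xtabs(amount ~ airport.arrival + airport.departure, df[yr == 2017, ])

# 2017.01
tab_2017_01 <- xtabs(amount ~ airport.arrival + airport.departure,
                     df[yr == 2017 & mo == 1, ])

# Until 2017.01 (on or before 2017-01-01, as in the dplyr version)
tab_until <- xtabs(amount ~ airport.arrival + airport.departure,
                   df[df$date.depature <= as.Date("2017-01-01"), ])

print(tab_2017)
```

The conditions are plain logical vectors, so they can be combined with & and | exactly like the arguments to filter().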

【Discussion】:

【Solution 2】:

Why not make amount numeric when you create df? Just remove the double quotes in

amount <- c("1", "3", "1", "10", "5")

or convert it explicitly:

amount <- as.integer(c("1", "3", "1", "10", "5"))

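This matters because as.integer(df$amount) does not return

c(1, 3, 1, 10, 5)

When you create the data frame df, the character vector is converted to class "factor" (the default before R 4.0.0, when stringsAsFactors = TRUE), so as.integer() returns the factor's internal level codes instead: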

as.integer(df$amount)
#[1] 1 3 1 2 4

The correct way is:

as.integer(as.character(df$amount))
#[1]  1  3  1 10  5

Or, more simply, enter amount as numbers from the start:

date.depature <- c("2016.06.16", "2016.11.16", "2017.01.05", "2017.01.12", "2017.02.25")
airport.departure <- c("CDG", "QNY", "QXO", "CDG", "QNY")
airport.arrival <- c("SYD", "CDG", "QNY", "SYD", "QXO")
amount <- c(1, 3, 1, 10, 5)
date.depature <- as_date(date.depature)
df <- data.frame(date.depature, airport.departure, airport.arrival, amount)
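The factor pitfall can be reproduced without building the whole data frame. This base-R sketch calls factor() directly, which is what data.frame() did to the character amount column before R 4.0.0; the variable names amount_f, codes, values and values2 are hypothetical.

```r
# A factor made from number-like strings, as data.frame() did with amount
# before R 4.0.0.
amount_f <- factor(c("1", "3", "1", "10", "5"))

levels(amount_f)  # "1" "10" "3" "5": levels are sorted as text

codes  <- as.integer(amount_f)                # level positions: 1 3 1 2 4
values <- as.numeric(as.character(amount_f))  # the actual numbers: 1 3 1 10 5

# Equivalent idiom (R FAQ 7.10): convert each level once, then index by the codes.
values2 <- as.numeric(levels(amount_f))[amount_f]
```

The levels() idiom converts only the distinct levels, which can be cheaper than as.character() on a long factor with few levels.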

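Now for the actual question.

This is basically a subsetting problem: subset the data to the year and month you need, then run the same xtabs command on the result.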

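df1 <- df[year(df$date.depature) == 2017, ]             # all of 2017
df2 <- df1[month(df1$date.depature) == 1, ]             # January 2017 only
df3 <- rbind(df[year(df$date.depature) < 2017, ], df2)  # up to and including January 2017 (rbind stacks rows; cbind would paste columns side by side)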

Now run xtabs on the subsetted data frames:

xtabs(amount ~ airport.arrival + airport.departure, df1)
xtabs(amount ~ airport.arrival + airport.departure, df2)
xtabs(amount ~ airport.arrival + airport.departure, df3)
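The df3 step (everything up to and including January 2017) can also be written as a single comparison instead of combining two subsets. This base-R sketch assumes amount is numeric and the dates are parsed with as.Date() rather than lubridate; it uses format() to build "YYYY-MM" strings, which sort correctly as text. The names ym and df3_alt are hypothetical.

```r
df <- data.frame(
  date.depature     = as.Date(c("2016.06.16", "2016.11.16", "2017.01.05",
                                "2017.01.12", "2017.02.25"), format = "%Y.%m.%d"),
  airport.departure = c("CDG", "QNY", "QXO", "CDG", "QNY"),
  airport.arrival   = c("SYD", "CDG", "QNY", "SYD", "QXO"),
  amount            = c(1, 3, 1, 10, 5)
)

# "YYYY-MM" strings compare correctly as text, so == and <= both work on them.
ym <- format(df$date.depature, "%Y-%m")

df1 <- df[format(df$date.depature, "%Y") == "2017", ]  # all of 2017
df2 <- df[ym == "2017-01", ]                            # January 2017 only
df3 <- df[df$date.depature < as.Date("2017-02-01"), ]   # up to and including January 2017

# Same rows as df3, expressed as a year-month comparison
df3_alt <- df[ym <= "2017-01", ]

xtabs(amount ~ airport.arrival + airport.departure, df3)
```

Using "before the first day of the next month" as the cutoff avoids having to know how many days the last included month has.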

【Discussion】:

Thank you very much for explaining the integer problem to me. I wasn't aware of it.

【Solution 3】:

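You can subset df by date.depature directly inside the xtabs call. Because amount was created from strings, convert it with as.numeric(as.character(amount)); as.integer() alone returns level codes when amount is a factor. For year == 2017:

xtabs(as.numeric(as.character(amount)) ~ airport.arrival + airport.departure, df[year(df$date.depature) == 2017, ])

For year == 2017 and month == 1:

xtabs(as.numeric(as.character(amount)) ~ airport.arrival + airport.departure, df[year(df$date.depature) == 2017 & month(df$date.depature) == 1, ])

For everything before January 2017:

xtabs(as.numeric(as.character(amount)) ~ airport.arrival + airport.departure, df[df$date.depature < as_date("2017-01-01"), ])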
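Instead of indexing df, stats::xtabs() also accepts a subset argument, which is evaluated inside data (as with the subset argument of lm()), so the column names can be used without df$. A sketch under the same assumptions as the base-R examples (dates parsed with as.Date(), amount numeric); the result names are hypothetical.

```r
df <- data.frame(
  date.depature     = as.Date(c("2016.06.16", "2016.11.16", "2017.01.05",
                                "2017.01.12", "2017.02.25"), format = "%Y.%m.%d"),
  airport.departure = c("CDG", "QNY", "QXO", "CDG", "QNY"),
  airport.arrival   = c("SYD", "CDG", "QNY", "SYD", "QXO"),
  amount            = c(1, 3, 1, 10, 5)
)

# 'subset' is evaluated in the context of 'data', so columns are referenced by name.
t_2017 <- xtabs(amount ~ airport.arrival + airport.departure, data = df,
                subset = format(date.depature, "%Y") == "2017")

t_2017_01 <- xtabs(amount ~ airport.arrival + airport.departure, data = df,
                   subset = format(date.depature, "%Y-%m") == "2017-01")

t_before <- xtabs(amount ~ airport.arrival + airport.departure, data = df,
                  subset = date.depature < as.Date("2017-01-01"))

t_2017
```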

【Discussion】: