Sorting and ranking a data frame by date and time in R: answers

【Question title】: Sorting and ranking a dataframe by date and time in r
【Posted】: 2014-05-21 20:12:02
【Question】:

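I have a data frame like the one below. Originally it had just two columns/variables: "Timestamp" (containing the date and time) and "Actor". I split the "Timestamp" variable into "date" and "time", and then split "time" further into "hours" and "mins". That gives the following structure: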

dataf<-structure(list(hours = structure(c(3L, 4L, 4L, 3L, 3L, 3L, 6L, 
6L, 6L, 6L, 6L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 1L, 1L, 2L, 2L), .Label = c("9", 
"12", "14", "15", "16", "17"), class = "factor"), mins = structure(c(17L, 
1L, 2L, 14L, 15L, 16L, 3L, 4L, 6L, 6L, 7L, 9L, 9L, 13L, 13L, 
10L, 11L, 12L, 2L, 5L, 8L, 8L), .Label = c("00", "04", "08", 
"09", "10", "12", "13", "18", "19", "20", "21", "22", "27", "39", 
"51", "52", "59"), class = "factor"), date = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 4L, 4L, 
4L, 1L, 1L, 1L, 1L), .Label = c("4/28/2014", "5/18/2014", "5/2/2014", 
"5/6/2014"), class = "factor"), time = structure(c(7L, 8L, 9L, 
4L, 5L, 6L, 13L, 14L, 15L, 15L, 16L, 2L, 2L, 3L, 3L, 10L, 11L, 
12L, 17L, 18L, 1L, 1L), .Label = c("12:18", "12:19", "12:27", 
"14:39", "14:51", "14:52", "14:59", "15:00", "15:04", "16:20", 
"16:21", "16:22", "17:08", "17:09", "17:12", "17:13", "9:04", 
"9:10"), class = "factor"), Timestamp = structure(c(13L, 14L, 
15L, 10L, 11L, 12L, 6L, 7L, 8L, 8L, 9L, 2L, 2L, 3L, 3L, 16L, 
17L, 18L, 4L, 5L, 1L, 1L), .Label = c("4/28/2014 12:18", "4/28/2014 12:19", 
"4/28/2014 12:27", "4/28/2014 9:04", "4/28/2014 9:10", "5/18/2014 17:08", 
"5/18/2014 17:09", "5/18/2014 17:12", "5/18/2014 17:13", "5/2/2014 14:39", 
"5/2/2014 14:51", "5/2/2014 14:52", "5/2/2014 14:59", "5/2/2014 15:00", 
"5/2/2014 15:04", "5/6/2014 16:20", "5/6/2014 16:21", "5/6/2014 16:22"
), class = "factor"), Actor = c(7L, 7L, 7L, 7L, 7L, 7L, 5L, 5L, 
2L, 12L, 2L, 7L, 7L, 7L, 7L, 10L, 10L, 10L, 7L, 10L, 7L, 7L)), .Names = c("hours", 
"mins", "date", "time", "Timestamp", "Actor"), row.names = c(NA, 
-22L), class = "data.frame")

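The reason I split the Timestamp and time variables into separate variables is that, in my real data, I ran into a lot of problems when sorting by date and/or time. Breaking these variables into smaller pieces made sorting much easier.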

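What I want to do now is create a new variable called "Rank" that returns "1" for the earliest event in the data frame (which would be the observation at 9 am on 28 April 2014), then "2" for the next observation in date/time order, and so on.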

Sorting the data frame seems fairly straightforward:

dataf<-dataf[order(as.Date(dataf$date, format="%m/%d/%Y"), dataf$hours, dataf$mins),]

This does the job. What I am struggling with now is assigning the ranks.
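A side note on the sort above: it orders by the factor columns hours and mins, which works here because the posted data happens to have their levels in numeric order ("9", "12", "14", ...). A minimal sketch, using a made-up three-element vector rather than the question's data, of what can happen when a factor is built from text with default levels:

```r
# Hypothetical sample: hours stored as text and converted with default levels
hours <- factor(c("14", "9", "12"))
levels(hours)                            # levels are sorted as text, not as numbers

# order() on a factor follows the level positions
ord_factor <- order(hours)

# Converting through character to integer orders by the numeric value
ord_numeric <- order(as.integer(as.character(hours)))

ord_factor
ord_numeric
```

If the levels are not guaranteed to be in numeric order, converting with as.integer(as.character(x)) before sorting is the safer choice.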

I tried the following, since I have used 'ave' with FUN=rank before to rank integers, but the result it produces is laughably wrong:

dataf$rank <- ave((dataf[order(as.Date(dataf$date, format="%m/%d/%Y"), dataf$hours, dataf$mins),]),FUN=rank )

Any help is appreciated.

【Comments】:

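Isn't dataf$rank <- rank(dataf$Timestamp) enough?
@sgibb For some reason, the OP has made life rather difficult by storing everything as factors instead of using Date and date-time objects. And there are duplicate timestamps, so we can't even sidestep the problem and say to just use seq_len(nrow(dataf)).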
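To illustrate the point about factors, here is a small sketch using four timestamps taken from the question's data. The variable names ts_chr, ts_fac and ts are made up for the example, and tz = "UTC" is an assumption, since the question does not state a time zone.

```r
# Four timestamps from the question, stored as text
ts_chr <- c("5/2/2014 14:59", "5/18/2014 17:08",
            "4/28/2014 12:18", "4/28/2014 9:04")

# As a factor, the levels are sorted as text, so "4/28/2014 12:18" comes
# before "4/28/2014 9:04" and "5/18/2014 ..." before "5/2/2014 ..."
ts_fac <- factor(ts_chr)
rank_factor <- rank(ts_fac)

# Parsing to POSIXct first lets rank() compare actual dates and times
ts <- as.POSIXct(ts_chr, format = "%m/%d/%Y %H:%M", tz = "UTC")
rank_time <- rank(ts)

rank_factor
rank_time
```

The two rankings differ, which is why ranking the Timestamp factor directly is not enough for this data.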

Tags: r date time rank

【Solution 1】:

I don't share your aversion to date-time objects, which make all of this much simpler:

dataf$ts <- strptime(as.character(dataf$Timestamp),'%m/%d/%Y %H:%M')
dataf <- dataf[order(dataf$ts),]
dataf$ts_rank <- rank(dataf$ts,ties.method = "min")
dataf
##    hours mins      date  time       Timestamp Actor                  ts ts_rank
## 19     9   04 4/28/2014  9:04  4/28/2014 9:04     7 2014-04-28 09:04:00       1
## 20     9   10 4/28/2014  9:10  4/28/2014 9:10    10 2014-04-28 09:10:00       2
## 21    12   18 4/28/2014 12:18 4/28/2014 12:18     7 2014-04-28 12:18:00       3
## 22    12   18 4/28/2014 12:18 4/28/2014 12:18     7 2014-04-28 12:18:00       3
## 12    12   19 4/28/2014 12:19 4/28/2014 12:19     7 2014-04-28 12:19:00       5
## 13    12   19 4/28/2014 12:19 4/28/2014 12:19     7 2014-04-28 12:19:00       5
## 14    12   27 4/28/2014 12:27 4/28/2014 12:27     7 2014-04-28 12:27:00       7
## 15    12   27 4/28/2014 12:27 4/28/2014 12:27     7 2014-04-28 12:27:00       7
## 4     14   39  5/2/2014 14:39  5/2/2014 14:39     7 2014-05-02 14:39:00       9
## 5     14   51  5/2/2014 14:51  5/2/2014 14:51     7 2014-05-02 14:51:00      10
## 6     14   52  5/2/2014 14:52  5/2/2014 14:52     7 2014-05-02 14:52:00      11
## 1     14   59  5/2/2014 14:59  5/2/2014 14:59     7 2014-05-02 14:59:00      12
## 2     15   00  5/2/2014 15:00  5/2/2014 15:00     7 2014-05-02 15:00:00      13
## 3     15   04  5/2/2014 15:04  5/2/2014 15:04     7 2014-05-02 15:04:00      14
## 16    16   20  5/6/2014 16:20  5/6/2014 16:20    10 2014-05-06 16:20:00      15
## 17    16   21  5/6/2014 16:21  5/6/2014 16:21    10 2014-05-06 16:21:00      16
## 18    16   22  5/6/2014 16:22  5/6/2014 16:22    10 2014-05-06 16:22:00      17
## 7     17   08 5/18/2014 17:08 5/18/2014 17:08     5 2014-05-18 17:08:00      18
## 8     17   09 5/18/2014 17:09 5/18/2014 17:09     5 2014-05-18 17:09:00      19
## 9     17   12 5/18/2014 17:12 5/18/2014 17:12     2 2014-05-18 17:12:00      20
## 10    17   12 5/18/2014 17:12 5/18/2014 17:12    12 2014-05-18 17:12:00      20
## 11    17   13 5/18/2014 17:13 5/18/2014 17:13     2 2014-05-18 17:13:00      22

【Comments】:

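Many thanks, I no longer have an aversion to timestamps. To make sure the ts_rank column has unique ranks (i.e. ranks cannot be shared, which is what my data needs), I used: dataf$ts_rank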
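The code in the comment above is cut off, so what the commenter actually ran is not known. As one possible approach (an assumption, not the commenter's code), base R's rank() accepts ties.method = "first", which breaks ties by position and therefore gives every row its own rank. The sample vector ts below is made up from timestamps in the question, stored as POSIXct with an assumed tz = "UTC".

```r
# Sample with one duplicated timestamp (values taken from the question)
ts <- as.POSIXct(c("4/28/2014 12:18", "4/28/2014 9:04",
                   "4/28/2014 12:18", "5/2/2014 14:39"),
                 format = "%m/%d/%Y %H:%M", tz = "UTC")

# ties.method = "min": tied timestamps share the lowest rank of the group
rank_shared <- rank(ts, ties.method = "min")

# ties.method = "first": ties are broken by position, so ranks are unique
rank_unique <- rank(ts, ties.method = "first")

rank_shared
rank_unique
```

On a data frame that is already sorted by timestamp, seq_len(nrow(dataf)) would give the same unique numbering.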