【Question title】: Adding a seasons column to data table based on month dates
【Posted】: 2016-08-22 13:27:32
【Question】:

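I am using data.table and trying to create a new column called "Season" that holds the corresponding season (e.g. Summer, Winter, ...) based on a column called "MonthName".

I am wondering whether there is a more efficient way to add a season column to a data table based on month values.

Here are the first 6 of 300,000 observations; assume the table is called "dt".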

    rrp         Year   Month Finyear hourminute AvgPriceByTOD MonthName
1: 35.27500     1999     1    1999      00:00      33.09037       Jan
2: 21.01167     1999     1    1999      00:00      33.09037       Jan
3: 25.28667     1999     2    1999      00:00      33.09037       Feb
4: 18.42334     1999     2    1999      00:00      33.09037       Feb
5: 16.67499     1999     2    1999      00:00      33.09037       Feb
6: 18.90001     1999     2    1999      00:00      33.09037       Feb

I tried the following code:

dt[, Season :=  ifelse(MonthName == c("Jun", "Jul", "Aug"),"Winter", ifelse(MonthName == c("Dec", "Jan", "Feb"), "Summer", ifelse(MonthName == c("Sep", "Oct", "Nov"), "Spring" , ifelse(MonthName == c("Mar", "Apr", "May"), "Autumn", NA))))]

This returns:

 rrp totaldemand   Year Month Finyear hourminute AvgPriceByTOD MonthName Season
1: 35.27500     1999     1    1999      00:00      33.09037       Jan     NA
2: 21.01167     1999     1    1999      00:00      33.09037       Jan Summer
3: 25.28667     1999     2    1999      00:00      33.09037       Feb Summer
4: 18.42334     1999     2    1999      00:00      33.09037       Feb     NA
5: 16.67499     1999     2    1999      00:00      33.09037       Feb     NA
6: 18.90001     1999     2    1999      00:00      33.09037       Feb Summer

I get the error:

Warning messages:
1: In MonthName == c("Jun", "Jul", "Aug") :
  longer object length is not a multiple of shorter object length
2: In MonthName == c("Dec", "Jan", "Feb") :
  longer object length is not a multiple of shorter object length
3: In MonthName == c("Sep", "Oct", "Nov") :
  longer object length is not a multiple of shorter object length
4: In MonthName == c("Mar", "Apr", "May") :
  longer object length is not a multiple of shorter object length 

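On top of that, for reasons I don't understand, some summer months are correctly assigned "Summer" while others are assigned NA. For example, rows 1 and 2 should both be Summer, but they return different values.

Thanks in advance!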

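【Comments】:

  • Use MonthName %in% c("Jun",...) instead of ==
  • That's not an error, it's a warning
  • This isn't ideal, because it creates and then drops duplicate levels, but I usually use cut on the numeric month: droplevels(cut(dt$Month, breaks = c(0, 2, 5, 8, 11, 13), labels = c('Winter', 'Spring', 'Summer', 'Autumn', 'Winter')))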
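
The cut approach in the last comment uses northern-hemisphere labels and has to drop a duplicated level. As a hedged alternative sketch (not from the thread), integer arithmetic on the numeric month avoids both. The names `month` and `southern` and the sample values are made up for illustration; the mapping follows the question's southern-hemisphere seasons.

```r
# Hypothetical sample of numeric months (1 = Jan, ..., 12 = Dec)
month <- c(12, 1, 2, 3, 5, 6, 8, 9, 11)

# Season labels in the order produced by the index below,
# following the question's mapping (Dec-Feb = Summer, and so on)
southern <- c("Summer", "Autumn", "Winter", "Spring")

# month %% 12 sends December to 0, so Dec/Jan/Feb share block 0;
# integer division by 3 then yields blocks 0..3
season <- southern[(month %% 12) %/% 3 + 1]
season
```

On a data.table the same expression would be used as dt[, Season := southern[(Month %% 12) %/% 3 + 1]], assuming Month is the numeric month column shown in the question.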

Tags: r data.table


【Solution 1】:

A very simple approach is to use a lookup table that maps month names to seasons:

# create a named vector where names are the month names and elements are seasons
seasons <- rep(c("winter","spring","summer","fall"), each = 3)
names(seasons) <- month.abb[c(6:12,1:5)] # thanks thelatemail for pointing out month.abb
seasons
#     Jun      Jul      Aug      Sep      Oct      Nov      Dec      Jan 
#"winter" "winter" "winter" "spring" "spring" "spring" "summer" "summer" 
#     Feb      Mar      Apr      May 
#"summer"   "fall"   "fall"   "fall" 

Use it like this:

dt[, season := seasons[MonthName]]

Data:

dt <- setDT(read.table(text="    rrp         Year   Month Finyear hourminute AvgPriceByTOD MonthName
1: 35.27500     1999     1    1999      00:00      33.09037       Jan
2: 21.01167     1999     1    1999      00:00      33.09037       Jan
3: 25.28667     1999     2    1999      00:00      33.09037       Feb
4: 18.42334     1999     2    1999      00:00      33.09037       Feb
5: 16.67499     1999     2    1999      00:00      33.09037       Feb
6: 18.90001     1999     2    1999      00:00      33.09037       Feb",
   header = TRUE, stringsAsFactors = FALSE))
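
Putting the pieces together, here is a minimal end-to-end sketch, assuming MonthName is a character column (with a factor column, seasons[MonthName] would index by the integer codes rather than by name). The small table is made-up sample data, not the OP's, and "Foo" is included only to show what happens to an unmatched name.

```r
library(data.table)

# Named lookup vector: names are month abbreviations, values are seasons
seasons <- rep(c("winter", "spring", "summer", "fall"), each = 3)
names(seasons) <- month.abb[c(6:12, 1:5)]

# Made-up sample data; "Foo" is deliberately not a valid month name
dt <- data.table(MonthName = c("Jan", "Feb", "Jun", "Oct", "Foo"))

# Index the lookup vector by name; unname() drops the month names
# from the result, and unmatched names come back as NA
dt[, season := unname(seasons[MonthName])]
print(dt)
```

Unmatched names quietly become NA instead of raising an error, so a quick dt[is.na(season)] check after the assignment can catch typos in the month column.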

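【Comments】:

  • Ha, I'm guessing you're somewhere in the southern hemisphere.
  • @alistaire - I'd guess America, note: "fall" ;)
  • @allistaire, I mapped months to seasons based on the OP's mapping. The "fall" was my contribution, ha.
  • Oops, didn't notice... I suppose my earlier comment applies to the OP, then!
  • By the way, month.abb exists in base R, which will save typing - month.abb[c(6:12,1:5)]
【Solution 2】:

A bit more typing, but the code is quite efficient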

dt[MonthName %in% c("Jun","Jul","Aug"), Season := "Winter"]
dt[MonthName %in% c("Dec","Jan","Feb"), Season := "Summer"]
dt[MonthName %in% c("Sep","Oct","Nov"), Season := "Spring"]
dt[is.na(Season), Season := "Autumn"]  # rows not matched above (Mar, Apr, May) are still NA

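Here we are assigning by reference on a subset of the data.table

I prefer this to lots of nested ifelses


If you want to check whether a value is in a vector, you have to use %in%. Compare the different behaviour of the following: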

myVec <- c("a","b","c")

"a" == myVec
[1] TRUE FALSE FALSE

"a" %in% myVec
[1] TRUE
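
This also accounts for the pattern in the question's output. == compares element by element and recycles the shorter vector, so each row is tested against only one of the three month names. A small sketch using the six MonthName values shown in the question (the variable names are made up for illustration):

```r
month_name <- c("Jan", "Jan", "Feb", "Feb", "Feb", "Feb")
summer <- c("Dec", "Jan", "Feb")

# summer is recycled to Dec, Jan, Feb, Dec, Jan, Feb and compared pairwise
by_equals <- month_name == summer
by_equals
# [1] FALSE  TRUE  TRUE FALSE FALSE  TRUE

# %in% tests each element against the whole set
by_in <- month_name %in% summer
by_in
# [1] TRUE TRUE TRUE TRUE TRUE TRUE
```

The FALSE positions (rows 1, 4 and 5) are exactly the rows that came back NA in the question. R raises the "longer object length is not a multiple of shorter object length" warning whenever the recycling does not divide evenly.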

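【Comments】:

  • It might be more efficient to make a lookup table and join once, like ref <- data.table(MonthName=month.abb[c(12,1:11)], season=rep(c("Summer","Autumn","Winter","Spring"), each=3)); dt[ref, on="MonthName"]
  • @thelatemail - similar to Jota's answer, which I gave a +1 :)
  • Oops... the page hadn't refreshed when I wrote my comment.
  • Cheers for explaining the difference between %in% and ==, very helpful, that tripped me up too!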
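
The join suggested in the first comment returns a new table. A hedged variant, assuming you would rather add the column to dt by reference, is an update join; the i. prefix refers to columns of the table passed inside the brackets (ref here). The sample dt is made up for illustration.

```r
library(data.table)

# Lookup table using the question's (southern-hemisphere) mapping
ref <- data.table(
  MonthName = month.abb[c(12, 1:11)],
  season    = rep(c("Summer", "Autumn", "Winter", "Spring"), each = 3)
)

# Made-up sample data
dt <- data.table(MonthName = c("Jan", "Feb", "Jun", "Oct"))

# Update join: for rows of dt that match ref on MonthName,
# copy ref's season column into a new Season column of dt
dt[ref, on = "MonthName", Season := i.season]
print(dt)
```

Rows of dt with no match in ref keep NA in Season, and dt keeps its original row order.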