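[Posted]: 2016-08-22 13:27:32
[Question]:
I am using data.table and I am trying to create a new column called "Season" that holds the season (Summer, Winter, ...) corresponding to the month in an existing column called "MonthName".
I would like to know whether there is a more efficient way to add a season column to a data table based on the month values.
Here are the first 6 of roughly 300,000 observations; assume the table is called "dt".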
rrp Year Month Finyear hourminute AvgPriceByTOD MonthName
1: 35.27500 1999 1 1999 00:00 33.09037 Jan
2: 21.01167 1999 1 1999 00:00 33.09037 Jan
3: 25.28667 1999 2 1999 00:00 33.09037 Feb
4: 18.42334 1999 2 1999 00:00 33.09037 Feb
5: 16.67499 1999 2 1999 00:00 33.09037 Feb
6: 18.90001 1999 2 1999 00:00 33.09037 Feb
I tried the following code:
dt[, Season := ifelse(MonthName == c("Jun", "Jul", "Aug"), "Winter", ifelse(MonthName == c("Dec", "Jan", "Feb"), "Summer", ifelse(MonthName == c("Sep", "Oct", "Nov"), "Spring", ifelse(MonthName == c("Mar", "Apr", "May"), "Autumn", NA))))]
which returns:
rrp Year Month Finyear hourminute AvgPriceByTOD MonthName Season
1: 35.27500 1999 1 1999 00:00 33.09037 Jan NA
2: 21.01167 1999 1 1999 00:00 33.09037 Jan Summer
3: 25.28667 1999 2 1999 00:00 33.09037 Feb Summer
4: 18.42334 1999 2 1999 00:00 33.09037 Feb NA
5: 16.67499 1999 2 1999 00:00 33.09037 Feb NA
6: 18.90001 1999 2 1999 00:00 33.09037 Feb Summer
I get the error:
Warning messages:
1: In MonthName == c("Jun", "Jul", "Aug") :
longer object length is not a multiple of shorter object length
2: In MonthName == c("Dec", "Jan", "Feb") :
longer object length is not a multiple of shorter object length
3: In MonthName == c("Sep", "Oct", "Nov") :
longer object length is not a multiple of shorter object length
4: In MonthName == c("Mar", "Apr", "May") :
longer object length is not a multiple of shorter object length
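On top of that, for reasons I don't understand, some of the summer months are correctly labelled "Summer" while others are set to NA. For example, rows 1 and 2 should both be Summer, yet they come back with different values.
Thanks in advance!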
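The mixed Summer/NA pattern matches what R's vector recycling would produce. The sketch below is an illustration, not the asker's code: it uses only the six MonthName values shown above and compares `==` with `%in%`. With these six rows no warning is raised because 6 is a multiple of 3; the warning in the question only indicates that the full table's row count is not.

```r
# First six MonthName values from the sample above
MonthName <- c("Jan", "Jan", "Feb", "Feb", "Feb", "Feb")

# `==` compares element by element and recycles the shorter vector, so
# row 1 is tested against "Dec", row 2 against "Jan", row 3 against "Feb",
# row 4 against "Dec" again, and so on.
by_position <- MonthName == c("Dec", "Jan", "Feb")
print(by_position)

# `%in%` instead asks, for each row, whether its value occurs anywhere
# in the set on the right-hand side.
by_membership <- MonthName %in% c("Dec", "Jan", "Feb")
print(by_membership)
```

The positions where `by_position` is TRUE (rows 2, 3 and 6) are exactly the rows labelled Summer in the output above; the remaining rows fall through every nested ifelse and end up as NA.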
[Comments]:
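- Use MonthName %in% c("Jun",...) instead of ==
- That is not an error, it is a warning.
- This is not ideal, because it creates and then drops a duplicate level, but I usually use cut on the numeric month: droplevels(cut(dt$Month, breaks = c(0, 2, 5, 8, 11, 13), labels = c('Winter', 'Spring', 'Summer', 'Autumn', 'Winter')))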
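As a minimal sketch of the suggestions above, here are two ways the column could be built. The toy table and the names season_of and SeasonIfelse are assumptions made for illustration; the month-to-season mapping follows the question (Jun to Aug is Winter). The first variant is a named lookup vector, which avoids nested ifelse calls altogether; the second is the asker's code with == swapped for %in%.

```r
library(data.table)

# Toy table standing in for the asker's `dt` (only the relevant column)
dt <- data.table(MonthName = c("Jan", "Jan", "Feb", "Jun", "Oct", "Apr"))

# Hypothetical lookup vector: month abbreviation -> season,
# using the mapping given in the question.
season_of <- c(
  Dec = "Summer", Jan = "Summer", Feb = "Summer",
  Mar = "Autumn", Apr = "Autumn", May = "Autumn",
  Jun = "Winter", Jul = "Winter", Aug = "Winter",
  Sep = "Spring", Oct = "Spring", Nov = "Spring"
)

# Variant 1: index the lookup vector by name. as.character() guards against
# MonthName being a factor (which would index by integer code instead);
# unname() drops the month names from the result. Unknown months become NA.
dt[, Season := unname(season_of[as.character(MonthName)])]

# Variant 2: the original nested ifelse, with `%in%` in place of `==`.
dt[, SeasonIfelse := ifelse(MonthName %in% c("Jun", "Jul", "Aug"), "Winter",
                     ifelse(MonthName %in% c("Dec", "Jan", "Feb"), "Summer",
                     ifelse(MonthName %in% c("Sep", "Oct", "Nov"), "Spring",
                     ifelse(MonthName %in% c("Mar", "Apr", "May"), "Autumn",
                            NA_character_))))]

print(dt)
```

Both columns should agree row for row. Note that the labels in the cut comment above assign Winter to January and February, so they would need to be reordered to match the mapping in the question.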
Tags: r data.table