【Posted】: 2019-06-14 07:48:12
【Question】:
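Below is a sample of my data. I am trying to prepare data for a data table, and after I apply the dcast function the columns have to be in a very specific order. I also need to compute the difference between certain columns. The goal is to get the columns in the order State, Region, 1_2017, 1_2018, 1_diff, 2_2017, 2_2018, 2_diff, and so on.
I tried computing the differences and ordering the columns by naming each column explicitly, but that seems like a very poor approach, especially since my real data has more than 50 columns. Below is sample data along with the logic I am using.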
library(reshape2)
library(dplyr)
#Data
data<-data.frame("State"=c("AK","AK","AK","AK","AK","AK","AK","AK","AR","AR","AR","AR","AR","AR","AR","AR"),
"StoreRank" = c(1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2),
"Year" = c(2017,2018,2017,2018,2017,2018,2017,2018,2017,2018,2017,2018,2017,2018,2017,2018),
"Region" = c("East","East","West","West","East","East","West","West","East","East","West","West","East","East","West","West"),
"Store" = c("Ingles","Ingles","Ingles","Ingles","Safeway","Safeway","Safeway","Safeway","Albertsons","Albertsons","Albertsons","Albertsons","Safeway","Safeway","Safeway","Safeway"),
"Total" = c(500000,520000,480000,485000,600000,600000,500000,515000,500100,520100,480100,485100,601010,601000,501000,515100))
#Formatting data for Data table
data<-dcast(data, State+Region~StoreRank+Year, value.var = 'Total')
#Function to calculate difference between columns
diff_calculation <- function(data) {
  mutate(data,
         `1_diff` = `1_2018` - `1_2017`,
         `2_diff` = `2_2018` - `2_2017`)
}
#Applying difference calculation function
reform.data<-diff_calculation(data)
#Changes the column names from numbers to letters to try and order columns
#(patterns are anchored with ^ so only the leading rank prefix is replaced)
names(reform.data)<-gsub(x = colnames(reform.data), pattern="^1_", replacement = "a_")
names(reform.data)<-gsub(x = colnames(reform.data), pattern="^2_", replacement = "b_")
#Trying to order columns as State, Region, 1_2017, 1_2018, 1_diff, 2_2017, 2_2018, 2_diff, etc.
ordered.data<-reform.data[,order(names(reform.data))]
final.data<-ordered.data %>%
select('State', 'Region', 'a_2017', 'a_2018', 'a_diff', 'b_2017', 'b_2018', 'b_diff')
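I am hoping to find a better way to compute the differences between columns and to order the columns after applying dcast to data with a large number of columns.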
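For illustration, here is a minimal base R sketch of one way the difference and ordering steps might be generalized, so that no column has to be named by hand. It assumes the wide columns follow the `<rank>_<year>` naming that dcast produces above, and that the two years being compared are 2017 and 2018. The names `wide`, `id_cols`, `ranks`, and `final` are hypothetical, and `wide` is typed in by hand from the sample data rather than produced by dcast.

```r
# Hypothetical wide table shaped like the dcast() output for the sample data.
wide <- data.frame(
  State    = c("AK", "AK", "AR", "AR"),
  Region   = c("East", "West", "East", "West"),
  `1_2017` = c(500000, 480000, 500100, 480100),
  `1_2018` = c(520000, 485000, 520100, 485100),
  `2_2017` = c(600000, 500000, 601010, 501000),
  `2_2018` = c(600000, 515000, 601000, 515100),
  check.names = FALSE  # keep column names that start with a digit
)

id_cols    <- c("State", "Region")
value_cols <- setdiff(names(wide), id_cols)

# Recover the rank prefixes ("1", "2", ...) and sort them numerically,
# so that rank 10 comes after rank 9 rather than after rank 1.
ranks <- unique(sub("_.*$", "", value_cols))
ranks <- ranks[order(as.numeric(ranks))]

# Assumption: each rank has exactly a 2017 and a 2018 column.
for (r in ranks) {
  wide[[paste0(r, "_diff")]] <-
    wide[[paste0(r, "_2018")]] - wide[[paste0(r, "_2017")]]
}

# Build the target order: id columns, then 2017, 2018, diff for each rank.
ordered_cols <- c(
  id_cols,
  unlist(lapply(ranks, function(r) paste0(r, c("_2017", "_2018", "_diff"))))
)
final <- wide[, ordered_cols]
```

Because the column order is constructed from the rank prefixes instead of being sorted alphabetically, the rename to `a_`, `b_` would not be needed under these assumptions.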
【Discussion】: