【Posted】: 2020-05-29 20:15:07
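【Question】:
I wrote a function that sets the values in a variable's columns to NA when the value in the associated sample-size column falls below a threshold. The function works when it is applied to one variable at a time.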
# Create dataframe
DF <- data.frame(VehicleType = c("Car","Car","LuxeryCar","Car","Car","LuxeryCar","LuxeryCar"),
                 Brand = c("Honda","Audi","Bentley","Chevrolet","Hyundai","Maserati","Porsche"),
                 VarA_Low = c(15000, 30000, 50000, 40000, 15000, 100000, 100000),
                 VarA_Medium = c(40000, 70000, 100000, 90000, 25000, 200000, 180000),
                 VarA_High = c(20000, 150000, 500000, 190000, 80000, 1000000, 500000),
                 VarA_SampleSize = c(39,44,51,35,45,65,53),
                 VarB_Low = c(15000, 30000, 50000, 40000, 15000, 100000, 100000),
                 VarB_Medium = c(40000, 70000, 100000, 90000, 25000, 200000, 180000),
                 VarB_High = c(20000, 150000, 500000, 190000, 80000, 1000000, 500000),
                 VarB_SampleSize = c(2,40,92,47,51,39,40))
# Set values to NA if the associated SampleSize is below 40
NA_values <- function(m) {
  m <- deparse(substitute(m))
  Var_L <- paste0(as.character(m), "_Low")
  Var_M <- paste0(as.character(m), "_Medium")
  Var_H <- paste0(as.character(m), "_High")
  Count <- paste0(as.character(m), "_SampleSize")
  DF[,Var_L] <- ifelse(DF[,Count] < 40, NA, DF[,Var_L])
  DF[,Var_M] <- ifelse(DF[,Count] < 40, NA, DF[,Var_M])
  DF[,Var_H] <- ifelse(DF[,Count] < 40, NA, DF[,Var_H])
  return(DF)
}
# Apply function to one variable at a time
DF <- NA_values(VarA)
DF <- NA_values(VarB)
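This works, but the approach is impractical: I usually have hundreds of variables, and both the column names and the number of variables change. I would like to declare all variables as a character vector and apply the function to every one of them.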
# Declare variables as a string vector
Vars <- c("VarA", "VarB")
# Create dataframe to store results
DF_NA <- DF
# Loop over DF and store results in DF_NA
for (item in Vars)
{
  DF_NA[, c(item)] <- NA_values(item)
}
This results in the error message "undefined columns selected".
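One likely cause, sketched here rather than confirmed by the poster: `deparse(substitute(m))` captures the expression that was typed at the call site, so `NA_values(item)` builds column names from the literal text "item" (for example `item_SampleSize`), which do not exist in `DF`. The sketch below first shows that behaviour, then a possible rewrite that takes the prefix as a character string. The names `show_name`, `na_below_threshold`, `demo`, and `demo_NA` are hypothetical, and `demo` is a three-row stand-in that only mimics the question's column naming scheme.

```r
# deparse(substitute()) returns the text of the argument, not its value.
show_name <- function(m) deparse(substitute(m))
item <- "VarA"
captured <- show_name(item)  # yields "item", so "item_Low" etc. are looked up

# Small stand-in data frame using the same naming scheme as DF (assumption).
demo <- data.frame(
  VarA_Low = c(15000, 30000, 50000),
  VarA_Medium = c(40000, 70000, 100000),
  VarA_High = c(20000, 150000, 500000),
  VarA_SampleSize = c(39, 44, 51),
  VarB_Low = c(15000, 30000, 50000),
  VarB_Medium = c(40000, 70000, 100000),
  VarB_High = c(20000, 150000, 500000),
  VarB_SampleSize = c(2, 40, 92)
)

# Hypothetical variant of NA_values(): the data frame and the variable
# prefix (a character string) are both passed in explicitly.
na_below_threshold <- function(data, prefix, threshold = 40) {
  value_cols <- paste0(prefix, c("_Low", "_Medium", "_High"))
  count_col <- paste0(prefix, "_SampleSize")
  # which() drops NA comparisons, so rows with a missing count are left alone
  rows <- which(data[[count_col]] < threshold)
  data[rows, value_cols] <- NA
  data
}

Vars <- c("VarA", "VarB")
demo_NA <- demo
for (item in Vars) {
  # Reassign the whole data frame; the function returns all columns.
  demo_NA <- na_below_threshold(demo_NA, item)
}
```

With the question's data, `demo` would be replaced by `DF`. Note that the loop assigns the returned data frame as a whole; assigning it to `DF_NA[, c(item)]`, as in the original loop, would fail even with correct column names, because no column is named just "VarA" or "VarB".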
【Comments】:
- Please do not include lines like rm(list = ls(all.names = TRUE)) in your question; nobody wants to copy/paste/run your code and accidentally lose what they were working on.
Tags: r string loops if-statement vector