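【Posted】:2019-10-19 14:33:39
【Question】:
I want to create a new variable named new_p whose value is taken from other columns, with the column name built by paste0. My approach below creates the new variable but does not assign the values correctly: it fills in only one value and returns NA for the other values of lev. Thanks for your help.
My data: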
tempDF <- structure(list(d1 = c("A", "B", "C"), d2 = c(40L, 50L, 20L),
d3 = c(20L, 40L, 50L), d4 = c(60L, 30L, 30L), p_A = c(1L,
3L, 2L), p_B = c(3L, 4L, 3L), p_C = c(2L, 1L, 1L), p4 = c(5L,
5L, 4L)), class = "data.frame", row.names = c(NA, -3L))
lev<-levels(as.factor(tempDF$d1))
View(tempDF)
My approach:
library(dplyr)  # provides %>%, mutate() and case_when()

for (i in seq_along(lev)) {
  func <- function(tempDF, i, lev) {
    newDT <- tempDF %>%
      mutate(.,
             new_p = case_when(
               d1 == paste0(lev[i]) ~ .[, paste0("p", "_", lev[i])]
             )) %>%
      as.data.frame(.)
  }
  newDT <- func(tempDF, i, lev) %>%
    as.data.frame(.)
}
View(newDT)
newDT
d1 d2 d3 d4 p_A p_B p_C p4 new_p
A  40 20 60 1   3   2   5  NA
B  50 40 30 3   4   1   5  NA
C  20 50 30 2   3   1   4  1
Expected output:
newDT
d1 d2 d3 d4 p_A p_B p_C p4 new_p
A  40 20 60 1   3   2   5  1
B  50 40 30 3   4   1   5  4
C  20 50 30 2   3   1   4  1
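The loop above rebuilds newDT from the original tempDF on every pass, so only the last level of lev keeps its value and the earlier ones are lost. Below is a minimal base R sketch that accumulates the results instead. It assumes every level of d1 has a matching p_<level> column, and fill_new_p is a hypothetical helper name used only for illustration.

```r
tempDF <- data.frame(
  d1 = c("A", "B", "C"),
  d2 = c(40L, 50L, 20L), d3 = c(20L, 40L, 50L), d4 = c(60L, 30L, 30L),
  p_A = c(1L, 3L, 2L), p_B = c(3L, 4L, 3L), p_C = c(2L, 1L, 1L),
  p4 = c(5L, 5L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill new_p one level at a time, keeping earlier results.
# Assumes a column named p_<level> exists for every level in lev.
fill_new_p <- function(df, lev) {
  df$new_p <- NA_integer_                 # create the column once, outside the loop
  for (l in lev) {
    rows <- df$d1 == l                    # rows that belong to this level
    df$new_p[rows] <- df[[paste0("p_", l)]][rows]
  }
  df
}

lev <- levels(as.factor(tempDF$d1))
newDT <- fill_new_p(tempDF, lev)
newDT$new_p
```

Because new_p is created before the loop and only the rows of the current level are overwritten, each pass adds to the previous result rather than replacing it.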
Edit: Barradas's function applied to a larger data set:
tempDF <- structure(list(d1 = c("A", "B", "C", "A", "C"), d2 = c(40L, 50L, 20L, 50L, 20L),
d3 = c(20L, 40L, 50L, 40L, 50L), d4 = c(60L, 30L, 30L,60L, 30L), p_A = c(1L,
3L, 2L, 3L, 2L), p_B = c(3L, 4L, 3L, 3L, 4L), p_C = c(2L, 1L, 1L,2L, 1L), p4 = c(5L,
5L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA, -5L))
View(tempDF)
lev<-levels(as.factor(tempDF$d1))
func <- function(tempDF, lev) {
  i <- match(tempDF$d1, lev)
  j <- match(paste0("p", "_", lev), names(tempDF))
  tempDF$new_p <- tempDF[cbind(i, j)]
  tempDF
}
newDT <- func(tempDF, lev)
Warning message:
In cbind(i, j) :
number of rows of result is not a multiple of vector length (arg 2)
View(newDT)
newDT
d1 d2 d3 d4 p_A p_B p_C p4 new_p
A  40 20 60 1   3   2   5  1
B  50 40 30 3   4   1   5  4
C  20 50 30 2   3   1   4  1
A  50 40 60 3   3   2   5  1     # wrong: new_p should be 3, not 1
C  20 50 30 2   4   1   4  3     # wrong: new_p should be 1, not 3
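The warning and the wrong values in rows 4 and 5 appear to come from two details of func: i holds the position of each d1 value within lev rather than the row number, and j has one entry per level (3) rather than one per row (5), so cbind recycles it. Below is a hedged sketch of a per-row version. It assumes every value of d1 has a matching p_<value> column, and func_rows is a hypothetical name.

```r
tempDF <- data.frame(
  d1 = c("A", "B", "C", "A", "C"),
  d2 = c(40L, 50L, 20L, 50L, 20L), d3 = c(20L, 40L, 50L, 40L, 50L),
  d4 = c(60L, 30L, 30L, 60L, 30L),
  p_A = c(1L, 3L, 2L, 3L, 2L), p_B = c(3L, 4L, 3L, 3L, 4L),
  p_C = c(2L, 1L, 1L, 2L, 1L), p4 = c(5L, 5L, 4L, 5L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical per-row variant of func: one (row, column) pair for every row.
func_rows <- function(df) {
  p_cols <- paste0("p_", unique(df$d1))     # candidate columns: p_A, p_B, p_C
  m <- as.matrix(df[p_cols])                # numeric matrix of only the p_ columns
  i <- seq_len(nrow(df))                    # row index: 1, 2, ..., nrow(df)
  j <- match(paste0("p_", df$d1), p_cols)   # column index chosen separately per row
  df$new_p <- m[cbind(i, j)]                # matrix indexing picks one cell per row
  df
}

newDT <- func_rows(tempDF)
newDT$new_p
```

Indexing a matrix built from only the p_ columns also keeps new_p numeric. Indexing the whole data frame with cbind(i, j) goes through as.matrix on all columns, which yields character values when d1 is a character column.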
【Discussion】:
Tags: r regex function loops case-when