【Question title】:Generate labels for variables in R
【Posted】:2013-08-13 12:33:22
【Question】:

I am looking for a better/faster way than this to generate labels for a variable:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))
pick <- c(0,1,2,3,10)
df[sapply(df$a,function(x) !(x %in% pick)),"a"] <- "a"
df[sapply(df$a,function(x) x==0),"a"] <- "b"
df[sapply(df$a,function(x) x==1 | x==2 | x==3),"a"] <- "c"
df[sapply(df$a,function(x) x==10),"a"] <- "d"

df$a
[1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c" 

For simplicity this example has only one variable. My real data set has many more, but I only want to change one specific variable.

【Comments】:

Tags: r label


【Solution 1】:

You don't need sapply:

df$a[!df$a %in% pick] <- "a"
df$a[df$a==0] <- "b"
df$a[df$a %in% 1:3] <- "c"
df$a[df$a==10] <- "d"
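
One detail worth knowing about the four assignments above: the first one turns the numeric column into a character column, so the later comparisons only work through R's implicit coercion. A minimal sketch that makes this visible, using `x` as an illustrative stand-in for `df$a`:

```r
x <- c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2)
pick <- c(0, 1, 2, 3, 10)

class(x)                  # numeric before any label is assigned
x[!x %in% pick] <- "a"    # assigning a string coerces the whole vector
class(x)                  # now character

# The remaining comparisons still match because the numbers on the
# right-hand side are converted to strings before comparing.
x[x == 0] <- "b"
x[x %in% 1:3] <- "c"
x[x == 10] <- "d"
x
```

If you would rather not depend on that coercion, write the labels into a separate vector and keep the numeric column untouched.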

You can also use factors to produce the same result:

df <- data.frame(a=c(0,7,1,10,2,4,3,5,10,1,7,8,3,2))

# the above method
a <- df$a
a[!df$a %in% pick] <- "a"
a[df$a==0] <- "b"
a[df$a %in% 1:3] <- "c"
a[df$a==10] <- "d"

# one way; older R versions warn about duplicated levels here
b1 <- factor(df$a, levels=0:10, labels=c("b",rep("c",3),rep("a",6),"d"))

# another way that won't give a warning
b2 <- factor(df$a)
levels(b2) <- c("b",rep("c",3),rep("a",4),"d")
b2 <- as.character(b2)

# a third strategy using `library(car)`
b3 <- car::recode(df$a,"0='b';1:3='c';10='d';else='a'")

# check that all strategies are the same
all.equal(a,as.character(b1))
# [1] TRUE
all.equal(as.character(b1),as.character(b2))
# [1] TRUE
all.equal(as.character(b1),as.character(b3))
# [1] TRUE
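
Hand-counting the repeats in `rep("a",4)` is easy to get wrong once the data changes. A possible variation on the second factor strategy, sketched under the assumption that the factor levels are just the numeric values stored as strings, derives the new level names from the old ones. The names `f`, `old` and `new` are illustrative only:

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2))

f <- factor(df$a)
old <- as.numeric(levels(f))          # one entry per distinct value

# Decide the label once per level instead of once per row
new <- ifelse(old == 0, "b",
       ifelse(old %in% 1:3, "c",
       ifelse(old == 10, "d", "a")))

levels(f) <- new                      # repeated names merge into one level
as.character(f)
```

The mapping is evaluated only for the distinct values, which can matter when the column is long but has few unique entries.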

【Comments】:

    【Solution 2】:

    You might also consider mapvalues or revalue from plyr, especially if you are dealing with more labels:

    df$a <- plyr::mapvalues(df$a, c(0, 1, 2, 3, 10), c("b", "c", "c", "c", "d"))
    df$a[! df$a %in% c("b", "c", "d")] <- "a" # The !pick values
    

    【Comments】:

      【Solution 3】:

      Here is another fairly simple solution:

      names(pick) <- c("b", "c", "c", "c", "d")
      x <- names(pick[match(df$a, pick)])
      x[is.na(x)] <- "a"
      x
      # [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
      

      It gets even simpler if you include NA in the "pick" object.

      pick <- c(NA, 0, 1, 2, 3, 10)
      names(pick) <- c("a", "b", "c", "c", "c", "d")
      names(pick[match(df$a, pick, nomatch = 1)])
      # [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
      

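      If you go with the second option, note that nomatch takes an integer: the position that match should return for values it cannot find. Here nomatch points at NA, which sits in the first position of the "pick" vector. If NA were in the last position, you would pass nomatch = 6 instead.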
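
To illustrate the last point, here is a small sketch with NA moved to the end of the lookup vector. The vector `a` is a stand-in for `df$a`, and `length(pick)` is used instead of a hard-coded 6 so the call keeps working if the lookup grows:

```r
a <- c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2)

pick <- c(0, 1, 2, 3, 10, NA)
names(pick) <- c("b", "c", "c", "c", "d", "a")

# Unmatched values are sent to the last position, whose name is "a"
idx <- match(a, pick, nomatch = length(pick))
labels <- names(pick[idx])
labels
```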

      【Comments】:

        【Solution 4】:

        You can also use the ifelse function.

        with(df,ifelse(a==0,"b",ifelse(a %in% c(1,2,3),"c",ifelse(a==10,"d","a"))))
         [1] "b" "a" "c" "d" "c" "a" "c" "a" "d" "c" "a" "a" "c" "c"
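
Since the question mentions a data set with more variables where only one should change, the nested ifelse can be wrapped in a small function and applied to that one column. This is only a sketch: the function name `label_a` and the extra column `other` are hypothetical and not part of the original data.

```r
df <- data.frame(a = c(0, 7, 1, 10, 2, 4, 3, 5, 10, 1, 7, 8, 3, 2),
                 other = 1:14)

# Hypothetical helper: same nested ifelse as above, reusable per column
label_a <- function(a) {
  ifelse(a == 0, "b",
         ifelse(a %in% c(1, 2, 3), "c",
                ifelse(a == 10, "d", "a")))
}

# Store the labels in a new column; a and other are left untouched
df$label <- label_a(df$a)
df$label
```

Writing to a new column keeps the numeric values available; assign to `df$a` instead if the original values are no longer needed.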
        

        【Comments】:
