【Question title】: Converting character values to numeric values in R with a function
【Posted】: 2015-11-12 13:27:52
【Question】:

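I want to load and process a CSV file with seven variables: one is a grouping variable/factor (data$hashtag), and six are categories (data$support and the others), each marked with an "X" or "x" (or left blank).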

data <- read.csv("maet_coded_tweets.csv", stringsAsFactors = F)

names(data) <- c("hashtag", "support", "contributeConversation", "otherCommunities", "buildCommunity", "engageConversation", "unclear")

str(data)

'data.frame':   854 obs. of  7 variables:
 $ hashtag               : chr  "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
 $ support               : chr  "x" "x" "x" "x" ...
 $ contributeConversation: chr  "" "" "" "" ...
 $ otherCommunities      : chr  "" "" "" "" ...
 $ buildCommunity        : chr  "" "" "" "" ...
 $ engageConversation    : chr  "" "" "" "" ...
 $ unclear               : chr  "" "" "" "" ...

When I use a function to recode "X" or "x" to 1 and "" (blank) to 0, the columns come out as character type rather than numeric as expected.

recode <- function(x) {

  x[x=="x"] <- 1
  x[x=="X"] <- 1
  x[x==""] <- 0
  x
}

data[] <- lapply(data, recode)

str(data)

'data.frame':   854 obs. of  7 variables:
 $ hashtag               : chr  "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" "#capstoneisfun" ...
 $ support               : chr  "1" "1" "1" "1" ...
 $ contributeConversation: chr  "0" "0" "0" "0" ...
 $ otherCommunities      : chr  "0" "0" "0" "0" ...
 $ buildCommunity        : chr  "0" "0" "0" "0" ...
 $ engageConversation    : chr  "0" "0" "0" "0" ...
 $ unclear               : chr  "0" "0" "0" "0" ...

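When I try to coerce the values with as.numeric() inside the function, it still doesn't work. What gives: why are the variables treated as character, and how can I convert them to numeric?

【Comments】: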

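  • A vector can hold only one data type. So if you replace part of a character vector with a number, the number is converted to character. How exactly did you use as.numeric() in the function?
  • recode <- function(x) { x[x=="x"] <- as.numeric(1); x[x=="X"] <- as.numeric(1); x[x==""] <- as.numeric(0); x }
  • You could try return(as.numeric(x)). As I said in my previous comment, the way you are doing it still forces a conversion to character. Or you could do res <- ifelse(x %in% c("x","X"),1,0)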
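The coercion described in these comments can be seen on a small vector. The following is a minimal sketch, not the asker's actual data; `recode_numeric` is a hypothetical name for a version of the function that converts once at the end, as the last comment suggests.

```r
# A character vector stays character even after a number is assigned into it
x <- c("x", "X", "", "x")
x[x == "x"] <- 1        # the 1 is coerced to "1"
typeof(x)               # still "character"

# Recode as before, then convert the whole vector once at the end
recode_numeric <- function(x) {   # hypothetical name, for illustration only
  x[x %in% c("x", "X")] <- "1"
  x[x == ""] <- "0"
  as.numeric(x)
}

coded <- recode_numeric(c("x", "X", "", "x"))
typeof(coded)           # "double"
```

Note that as.numeric() returns NA (with a warning) for any string that is not a number, so a function like this is only suitable for the coded columns, not for data$hashtag.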

Tags: r


【Solution 1】:

How about:

recode <- function(x) {
  ifelse(x %in% c('X','x'), 1,0)
}

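Explanation: the steps in the function are evaluated sequentially, not simultaneously. So when you assign 1 into part of a character vector, it is converted to "1".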
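One thing to watch when applying this recode() to the whole data frame: with data[] <- lapply(data, recode), the hashtag column would also be turned into 0s, because none of its values are "x" or "X". A possible way around that, sketched here on a made-up stand-in for the CSV (the rows and the reduced column set are assumptions), is to recode only the coded columns:

```r
# Small stand-in for the CSV from the question (values are made up)
data <- data.frame(
  hashtag = c("#capstoneisfun", "#capstoneisfun", "#other"),
  support = c("x", "", "X"),
  unclear = c("", "x", ""),
  stringsAsFactors = FALSE
)

recode <- function(x) {
  ifelse(x %in% c("X", "x"), 1, 0)
}

# Recode every column except the grouping variable
coded_cols <- setdiff(names(data), "hashtag")
data[coded_cols] <- lapply(data[coded_cols], recode)

str(data)
```

After this, str(data) should report hashtag as chr and the coded columns as num.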

【Comments】:

  • For further practice, you can also use as.numeric() to convert a logical to 0/1. So you could do recode2 <- function(x) as.numeric(x %in% c('X','x')) or even recode3 <- function(x) as.numeric(grepl('^x$',x, ignore.case=T))
  • I would add as.integer(tolower(x) == "x")
  • @docendodiscimus Yes, just noting that the OP seems to need numeric. That said, a bool seems more logical than an int or a float :)
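The variants from these comments can be compared side by side. This is a small sketch on made-up input; `recode4` is a hypothetical name given here to the as.integer(tolower(x) == "x") suggestion.

```r
x <- c("x", "X", "", "x", "")

recode2 <- function(x) as.numeric(x %in% c("X", "x"))
recode3 <- function(x) as.numeric(grepl("^x$", x, ignore.case = TRUE))
recode4 <- function(x) as.integer(tolower(x) == "x")   # hypothetical name

recode2(x)
recode3(x)
recode4(x)        # integer rather than double

# Missing values are where the variants differ
recode2(NA_character_)   # %in% treats NA as "no match"
recode4(NA_character_)   # == propagates the NA
```

All three agree on "x", "X" and "". They differ in storage type (double versus integer) and in how a missing value is handled, which may matter if the CSV contains NA rather than empty strings.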
【Solution 2】:

How about something like this?

# sample data with support being a character vector
myDat <- data.frame(support = c("X", "X", "0", "x", "0"), a = 1:5, stringsAsFactors = FALSE)
# convert to a factor and check the order of the levels
myDat$support <- as.factor(myDat$support)
levels(myDat$support)
#"0" "x" "X"
# just to see that it worked make an additional variable
myDat$supportrecoded <- myDat$support
# change levels and convert
levels(myDat$supportrecoded) <- c("0","1","1")
myDat$supportrecoded <- as.integer(as.character(myDat$supportrecoded ))

【Comments】:

    【Solution 3】:

    Using mapvalues from plyr

    data$support <- as.numeric(mapvalues(data$support, c("X", "x", ""), c(1, 1, 0)))
    

    Using replace

    data$support <- replace(x <- data$support, x == "X", 1)
    data$support <- replace(x <- data$support, x == "x", 1)
    data$support <- replace(x <- data$support, x == "", 0)
    data$support <- as.numeric(data$support)
    

    【Comments】:
