【Question title】: generate a new factor variable depending on the values of other factors in each row
【Posted】: 2015-08-31 23:19:56
【Question】:

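I am trying to create a function that generates a new variable based on conditional values. I have a survey dataset with over 100 columns that will be collapsed in this way. I read this, but it didn't help.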

'data.frame':   117 obs. of  7 variables:
 $ fin_partner: Factor w/ 4 levels "","9","No","Yes": 2 2 4 3 2 2 2 2 4 4 ...
 $ fin_parent : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
 $ fin_kids   : Factor w/ 4 levels "","9","No","Yes": 4 2 2 2 2 2 2 2 2 2 ...
 $ fin_othkids: Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 3 2 2 2 ...
 $ fin_fam    : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
 $ fin_friend : Factor w/ 4 levels "","9","No","Yes": 2 2 3 3 2 2 2 2 4 2 ...
 $ fin_oth    : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 2 2 4 2 ...

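I want to be able to subset the dataset by columns and then pass the subset to the function. Currently the values are "Yes", "No", and "999" (for missing).

My goal is that, for each row, if any column contains "Yes", the new column is filled with "Yes". I'm sure there is a simpler way than the code below, so I'm open to suggestions.

My current code: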

trial <- df[, 23:29]
trial.test <- as.data.frame(trial)

composite_score <- function(x){
  # Convert to numeric values
  change_to_number <- function(j) {
    for (i in 1:length(j)){
      if(i == "Yes"){
        i <- 1
      }
      else{
        i <- 0
      }
    }
  }

  x <- change_to_number(x)  

  new_col_var <- function(k){
    if(rowSums(k) > 0){
      k$newvar <- 1
    }
    else {
      k$newvar <- 0
    }
  }

  x <- new_col_var(x)

}

composite_score(trial.test)

The code produces the following error:

Error in rowSums(k) : 'x' must be an array of at least two dimensions 

Data:

> dput(head(trial.test))
structure(list(fin_partner = structure(c(2L, 2L, 4L, 3L, 2L, 
2L), .Label = c("", "9", "No", "Yes"), class = "factor"), fin_parent = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_kids = structure(c(4L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor"), fin_othkids = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_fam = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor"), fin_friend = structure(c(2L, 
    2L, 3L, 3L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_oth = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor")), .Names = c("fin_partner", 
"fin_parent", "fin_kids", "fin_othkids", "fin_fam", "fin_friend", 
"fin_oth"), row.names = c(NA, 6L), class = "data.frame")

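【Comments】:

  • Try rowSums(1:5) versus rowSums(matrix(1:5)). Also, what do you expect rowSums(k) > 0 to do? You will get multiple TRUE/FALSE values rather than just one.
  • Could you add some sample data for people to work with?
  • @rawr I wanted rowSums to count the flags; if the sum is not 0, the new column would be 1.
  • Something like apply(df, MARGIN=1, FUN=function(row) ifelse(any("Yes" %in% row), "Yes", "No")) should work. If you want a working answer, please provide data! For example, post the output of dput(head(trial.test)).
  • This is great, @antoine-sac. Thanks for your help. @gung I added the dput in an edit.

Tags: r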


【Solution 1】:

Your change_to_number function is badly broken - it only changes i to 1 or 0 and has no effect on the input. You could change it to:

change_to_number <- function(j){
        # the factor levels in the data are "Yes"/"No", and the comparison is case-sensitive
        sapply(j, function(x) +(x=="Yes"))
}
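
A small sketch of why the original helper has no effect, and of what the vectorised version returns. The names `loop_only` and `toy` are made up for this illustration and are not part of the question's data; the only assumption is that the values are spelled "Yes" as in the posted `dput` output.

```r
# A function whose body is only a for loop returns NULL,
# and rebinding the loop index i never modifies the input j.
loop_only <- function(j) {
  for (i in seq_along(j)) {
    i <- 0
  }
}
res <- loop_only(c("Yes", "No"))
is.null(res)  # TRUE

# rowSums() needs something with two dimensions, so passing NULL fails;
# tryCatch() captures the error message instead of stopping the script.
err <- tryCatch(rowSums(res), error = function(e) conditionMessage(e))

# The vectorised helper returns a 0/1 matrix that rowSums() can handle.
toy <- data.frame(
  a = factor(c("Yes", "9", "No")),
  b = factor(c("9", "9", "Yes"))
)
change_to_number <- function(j) sapply(j, function(x) +(x == "Yes"))
m <- change_to_number(toy)
rowSums(m)  # 1 0 1
```

This is why the call in the question reaches rowSums() with nothing usable: the helper's result, not the data frame, is what gets passed along.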

Alternatively, change the whole function to:

composite_score <- function(x){
    # 1 if the row contains "Yes" anywhere, 0 otherwise
    +(apply(x, 1, function(z) ("Yes" %in% z)))
}

Then run your function:

dat$newcol <- composite_score(dat)
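
As a quick check, here is a minimal run on a made-up three-row data frame (`toy` and its values are assumptions for illustration, not the survey data). Note that the match is case-sensitive, so the string has to be "Yes" to agree with the factor levels shown in the question.

```r
composite_score <- function(x) {
  # apply() converts the data frame to a character matrix and
  # passes one row at a time to the function as z
  +(apply(x, 1, function(z) "Yes" %in% z))
}

toy <- data.frame(
  fin_partner = factor(c("9", "Yes", "No")),
  fin_friend  = factor(c("9", "No", "No"))
)

toy$newcol <- composite_score(toy)
toy$newcol  # 0 1 0
```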

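Explanation: you want to know whether each row contains "Yes". To check, you could run the following for each row (as.matrix() turns the factor columns into character strings first, which is also what apply does internally):

"Yes" %in% as.matrix(trial.test)[1, ]
"Yes" %in% as.matrix(trial.test)[2, ]....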

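To do this for every row at once, use apply as follows - the second argument, 1, means the function is applied across rows, and each row is passed to the function as z, where it is tested for "Yes":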

tempdata <- apply(trial.test, 1, function(z) ("Yes" %in% z))
tempdata

You should get a TRUE or FALSE for each row. Now we can use a small trick: R will convert TRUE to 1 and FALSE to 0:

as.numeric(tempdata)
+(tempdata) #same, less typing

Putting it all together gives you your new column:

+(apply(trial.test, 1, function(z) ("Yes" %in% z)))
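
Since the asker originally wanted rowSums to count the flags, the same result can also be sketched without apply, by comparing the whole table to "Yes" at once. This is a swapped-in alternative rather than the method of this answer, and `toy` is a made-up data frame used only for illustration.

```r
toy <- data.frame(
  fin_partner = factor(c("9", "Yes", "No")),
  fin_friend  = factor(c("9", "No", "Yes"))
)

# Logical matrix: TRUE wherever a cell equals "Yes"
flags <- as.matrix(toy) == "Yes"

# Count the TRUE cells per row, then turn "at least one" into 1/0
any_yes <- +(rowSums(flags) > 0)
any_yes  # 0 1 1
```

Here rowSums receives a two-dimensional logical matrix, which avoids the error from the question.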

【Comments】:

  • Thanks @jeremycg. I used the first suggestion to clean up that part.

【Solution 2】:

Thanks for posting the data; it makes it possible to actually check what I wrote!

# Loading your data
trial.test <- structure(list(fin_partner = [... redacted ...], class = "data.frame")

# computing the new variable
# MARGIN=1 specifies that we are working on the rows
# the applied function just looks for a "Yes" in the row
# and returns "Yes" if... yes, "No" otherwise.
myvar <- apply(trial.test, MARGIN=1, FUN=function(row) 
    ifelse("Yes" %in% row, "Yes", "No"))

# converting it to factor
myvar <- factor(myvar)

# putting it in trial.test just for illustration
cbind(trial.test, summary=myvar)

This gives:

  fin_partner fin_parent fin_kids fin_othkids fin_fam fin_friend fin_oth summary
1           9          9      Yes           9       9          9       9     Yes
2           9          9        9           9       9          9       9      No
3         Yes          9        9           9       9         No       9     Yes
4          No          9        9           9       9         No       9      No
5           9          9        9           9       9          9       9      No
6           9          9        9           9       9          9       9      No
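
Because the question mentions subsetting a wide survey by columns and passing the subset to a function, the approach above could be wrapped roughly as follows. The function name `any_yes_factor`, the `survey` data frame, and the `fin_any` column are hypothetical names for this sketch; selecting columns by the `fin_` prefix is an assumption based on the column names shown in the question.

```r
# Returns a factor with "Yes" where any of the chosen columns holds "Yes"
any_yes_factor <- function(df, cols) {
  hit <- apply(df[, cols, drop = FALSE], 1, function(row) "Yes" %in% row)
  factor(ifelse(hit, "Yes", "No"), levels = c("No", "Yes"))
}

survey <- data.frame(
  id          = 1:3,
  fin_partner = factor(c("9", "Yes", "No")),
  fin_friend  = factor(c("9", "No", "No"))
)

# Pick the columns by name instead of by position
fin_cols <- grep("^fin_", names(survey), value = TRUE)
survey$fin_any <- any_yes_factor(survey, fin_cols)
```

Fixing the levels to c("No", "Yes") keeps the factor consistent even when a subset happens to contain no "Yes" at all.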

【Comments】:

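【Solution 3】:

library(tidyr)
library(dplyr)
library(magrittr)

# add a row id so the reshaped rows can be joined back
trial.test %<>% mutate(row_number = 1:n())

answer = 
  trial.test %>%
  gather(variable, value, -row_number) %>%   # long format: one row per cell
  filter(value == "Yes") %>%                 # keep only the "Yes" cells
  select(-variable) %>%
  distinct %>%                               # one row per row_number that has a "Yes"
  right_join(trial.test)                     # value is "Yes" or NA for each original row

【Comments】: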
