Generate a new factor variable depending on the values of other factors in each row (answers)

[Question title]: Generate a new factor variable depending on the values of other factors in each row
[Posted]: 2015-08-31 23:19:56
[Question]:

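I'm trying to create a function that generates a new variable based on the values of other variables. I have a survey dataset with more than 100 columns, which will be collapsed into new variables this way. I read this, but it didn't help.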

'data.frame':   117 obs. of  7 variables:
 $ fin_partner: Factor w/ 4 levels "","9","No","Yes": 2 2 4 3 2 2 2 2 4 4 ...
 $ fin_parent : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
 $ fin_kids   : Factor w/ 4 levels "","9","No","Yes": 4 2 2 2 2 2 2 2 2 2 ...
 $ fin_othkids: Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 3 2 2 2 ...
 $ fin_fam    : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 4 3 2 2 ...
 $ fin_friend : Factor w/ 4 levels "","9","No","Yes": 2 2 3 3 2 2 2 2 4 2 ...
 $ fin_oth    : Factor w/ 4 levels "","9","No","Yes": 2 2 2 2 2 2 2 2 4 2 ...

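I'd like to be able to subset the dataset by columns and then pass the subset to the function. Right now, the values are "Yes", "No", or "999" (meaning missing).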

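My goal: for each row, if any of the columns contains "Yes", the new column should be filled with "Yes". I'm sure there is a simpler way than the code below, so I'm open to suggestions.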
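A compact way to express this rule, offered only as a sketch: compare every selected column with "Yes" at once, which yields a logical matrix, and count the matches per row. The helper name `any_yes_column`, the `survey` data frame (a three-column excerpt of the data posted below), and selecting columns by the `fin_` prefix instead of by position are all assumptions for illustration, not part of the original post.

```r
# Three-column excerpt of the posted data, with the same factor levels
lv <- c("", "9", "No", "Yes")
survey <- data.frame(
  fin_partner = factor(c("9", "9", "Yes", "No", "9", "9"), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9", "9", "9", "9"), levels = lv),
  fin_friend  = factor(c("9", "9", "No", "No", "9", "9"), levels = lv)
)

# Hypothetical helper: "Yes" if any of the chosen columns is "Yes" in that row
any_yes_column <- function(df, cols) {
  hits <- df[cols] == "Yes"   # logical matrix: one row per respondent, one column per variable
  factor(ifelse(rowSums(hits) > 0, "Yes", "No"), levels = c("No", "Yes"))
}

# Pick the columns by name prefix rather than by position (e.g. 23:29)
fin_cols <- startsWith(names(survey), "fin_")
survey$fin_any <- any_yes_column(survey, fin_cols)
print(survey)
```

Because the comparison and `rowSums()` work on all rows at once, no explicit loop or `if` is needed; the same helper can be called once per group of columns to be collapsed.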

My current code:

trial <- df[, 23:29]
trial.test <- as.data.frame(trial)

composite_score <- function(x){
  # Convert to numeric values
  change_to_number <- function(j) {
    for (i in 1:length(j)){
      if(i == "Yes"){
        i <- 1
      }
      else{
        i <- 0
      }
    }
  }

  x <- change_to_number(x)  

  new_col_var <- function(k){
    if(rowSums(k) > 0){
      k$newvar <- 1
    }
    else {
      k$newvar <- 0
    }
  }

  x <- new_col_var(x)

}

composite_score(trial.test)

The code produces the following error:

Error in rowSums(k) : 'x' must be an array of at least two dimensions

Data:

> dput(head(trial.test))
structure(list(fin_partner = structure(c(2L, 2L, 4L, 3L, 2L, 
2L), .Label = c("", "9", "No", "Yes"), class = "factor"), fin_parent = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_kids = structure(c(4L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor"), fin_othkids = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_fam = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor"), fin_friend = structure(c(2L, 
    2L, 3L, 3L, 2L, 2L), .Label = c("", "9", "No", "Yes"), class = "factor"), 
    fin_oth = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
    "9", "No", "Yes"), class = "factor")), .Names = c("fin_partner", 
"fin_parent", "fin_kids", "fin_othkids", "fin_fam", "fin_friend", 
"fin_oth"), row.names = c(NA, 6L), class = "data.frame")

[Comments]:

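Try rowSums(1:5) versus rowSums(matrix(1:5)). Also, what do you expect rowSums(k) > 0 to do? You will get several TRUE/FALSE values, not just one.
Could you add some sample data for people to work with?
@rawr I want rowSums to count the flags: if the sum is not 0, the new column should be 1.
Something like apply(df, MARGIN=1, FUN=function(row) ifelse(any("Yes" %in% row), "Yes", "No")) should work. If you want a working answer, please provide data! For example, post the output of dput(head(trial.test)).
This is great, @antoine-sac. Thanks for your help. @gung I've added the dput output in an edit.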
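Both points in the first comment can be checked directly: `rowSums()` needs something with at least two dimensions (a vector gives exactly the error from the question), and on a matrix it returns one sum per row, so `if (rowSums(k) > 0)` cannot pick a single branch. A small sketch; the `flags` matrix is made-up illustration data.

```r
# rowSums() refuses a plain vector: same error as in the question
res <- tryCatch(rowSums(1:5), error = function(e) conditionMessage(e))
print(res)

# ...but works on a one-column matrix, returning one sum per row
print(rowSums(matrix(1:5)))

# Made-up 0/1 flags for 3 respondents x 2 questions (filled column by column)
flags <- matrix(c(1, 0, 0, 0, 0, 1), nrow = 3)
row_has_flag <- rowSums(flags) > 0   # one TRUE/FALSE per row, not a single value

# Vectorised replacement for the if/else in new_col_var()
newvar <- as.integer(row_has_flag)
print(newvar)
```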

Tags: r

[Answer 1]:

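Your change_to_number function is badly broken: it compares the loop index i (a number) with "Yes" and then only reassigns i to 1 or 0, so nothing in the input changes, and the function returns nothing useful. You could change it to: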

change_to_number <- function(j){
        # 1 where the answer is "Yes", 0 otherwise (the match is case-sensitive)
        sapply(j, function(x) +(x == "Yes"))
}
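With this fix, `change_to_number()` returns a 0/1 matrix (one column per variable, as long as there are at least two rows), so the `rowSums()` step from the question gets the two-dimensional input it needs. A sketch of the repaired two-step version, assuming the capitalised "Yes" used in the data; the `as.integer()` body of `new_col_var()` is a swapped-in vectorised replacement for the question's if/else, and the data frame is a three-column excerpt of the posted data.

```r
lv <- c("", "9", "No", "Yes")
trial.test <- data.frame(
  fin_partner = factor(c("9", "9", "Yes", "No", "9", "9"), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9", "9", "9", "9"), levels = lv),
  fin_friend  = factor(c("9", "9", "No", "No", "9", "9"), levels = lv)
)

# Step 1: one 0/1 column per variable (sapply over columns simplifies to a matrix)
change_to_number <- function(j) {
  sapply(j, function(x) +(x == "Yes"))
}

# Step 2: 1 if the row has at least one "Yes", else 0, for all rows at once
new_col_var <- function(k) {
  as.integer(rowSums(k) > 0)
}

trial.test$newvar <- new_col_var(change_to_number(trial.test))
print(trial.test)
```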

Or change the whole function to:

composite_score <- function(x){
    +(apply(x, 1, function(z) ("Yes" %in% z)))
}

Then run your function and store the result as a new column:

trial.test$newcol <- composite_score(trial.test)

Explanation: you want to know whether each row contains a "Yes". To check, you could run the following for each row:

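# as.matrix() turns the factor columns into their character labels
"Yes" %in% as.matrix(trial.test)[1, ]
"Yes" %in% as.matrix(trial.test)[2, ]
# ... and so on for every row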

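To do this for every row at once, you can use apply: with MARGIN = 1 it applies the function ("Yes" %in% z) across the rows, passing each row to the function as z: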

tempdata <- apply(trial.test, 1, function(z) ("Yes" %in% z))
tempdata

You should get one TRUE or FALSE for each row. Now a small trick: when coerced to numbers, R turns TRUE into 1 and FALSE into 0:

as.numeric(tempdata)
+(tempdata) #same, less typing
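The reason `apply()` can search each row with `%in%` is that it first converts the data frame to a matrix with `as.matrix()`; with factor columns, that matrix is of type character, so each row `z` is a plain character vector of labels. A short check, using a two-column excerpt of the posted data:

```r
lv <- c("", "9", "No", "Yes")
trial.test <- data.frame(
  fin_partner = factor(c("9", "9", "Yes", "No", "9", "9"), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9", "9", "9", "9"), levels = lv)
)

m <- as.matrix(trial.test)   # the matrix apply() works on internally
print(typeof(m))             # labels, not the factors' integer codes
print(m[1, ])                # row 1 as a character vector

has_yes <- apply(trial.test, 1, function(z) "Yes" %in% z)
print(+(has_yes))            # TRUE/FALSE turned into 1/0
```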

Putting it all together, you get your new column:

+(apply(trial.test, 1, function(z) ("Yes" %in% z)))

[Comments]:

Thanks @jeremycg. I used the first suggestion to clean up that part.

[Answer 2]:

Thanks for posting the data; now I can actually check what I write!

# Loading your data
trial.test <- structure(list(fin_partner = [... redacted ...], class = "data.frame")

# computing the new variable
# the MARGIN=1 argument specifies that we are working on the rows
# the applied function just looks for a "Yes" in the row
# and returns "Yes" if... yes, "No" otherwise.
myvar <- apply(trial.test, MARGIN=1, FUN=function(row) 
    ifelse(any("Yes" %in% row), "Yes", "No"))

# converting it to factor
myvar <- factor(myvar)

# putting it in trial.test just for illustration
cbind(trial.test, summary=myvar)

This gives:

  fin_partner fin_parent fin_kids fin_othkids fin_fam fin_friend fin_oth summary
1           9          9      Yes           9       9          9       9     Yes
2           9          9        9           9       9          9       9      No
3         Yes          9        9           9       9         No       9     Yes
4          No          9        9           9       9         No       9      No
5           9          9        9           9       9          9       9      No
6           9          9        9           9       9          9       9      No

[Comments]:

[Answer 3]:

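library(tidyr)
library(dplyr)
library(magrittr)   # for the %<>% assignment pipe

# add an explicit row id so rows can be matched up again later
trial.test %<>% mutate(row_number = 1:n())

answer = 
  trial.test %>%
  # long format: one line per (row_number, variable, value)
  gather(variable, value, -row_number) %>%
  # keep only the "Yes" answers
  filter(value == "Yes") %>%
  # one line per original row that has at least one "Yes"
  select(-variable) %>%
  distinct %>%
  # join back to all rows; rows without a "Yes" get NA in value
  right_join(trial.test, by = "row_number")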
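The idea above (reshape to long format, keep the "Yes" cells, join the distinct row numbers back) can also be sketched in base R without the tidyverse. This is a swapped-in base-R analogue, not the answer's code: `which(..., arr.ind = TRUE)` lists the (row, column) position of every "Yes" cell, which plays the role of the filtered long table. The data frame is a three-column excerpt of the posted data.

```r
lv <- c("", "9", "No", "Yes")
trial.test <- data.frame(
  fin_partner = factor(c("9", "9", "Yes", "No", "9", "9"), levels = lv),
  fin_kids    = factor(c("Yes", "9", "9", "9", "9", "9"), levels = lv),
  fin_friend  = factor(c("9", "9", "No", "No", "9", "9"), levels = lv)
)

# Every "Yes" cell as a (row, col) pair, like the filtered long-format table
yes_cells <- which(as.matrix(trial.test) == "Yes", arr.ind = TRUE)

# Distinct rows with at least one "Yes" (cf. select(-variable) %>% distinct)
yes_rows <- unique(yes_cells[, "row"])

# "Join" back: "Yes" for matched rows, NA otherwise, as in the right_join() result
trial.test$value <- ifelse(seq_len(nrow(trial.test)) %in% yes_rows, "Yes", NA)
print(trial.test)
```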

[Comments]: