【Question title】: Creating a variable using multiple variables
【Posted】: 2016-07-26 13:42:51
【Question】:

I am trying to clean a dataset and create 3 variables named Adventure, Action and Comedy. The original dataset has 3000 observations (imported as dat); only some of them are shown here:

id    Runtime        Genres                                       
37      75       animation, adventure, family, fantasy, musical   
1       162      action, adventure, fantasy, sci_fi       
95      126      action, fantasy   
100     101      comedy, drama, fantasy   
82      136      action, adventure, sci-fi    
99      117      animation, adventure, comedy, family, sport   
91      95       animation, comedy, crime, family

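After importing the dataset into R, I used the following code to split the genres into 5 columns:

library(dplyr)  # provides %>%
library(tidyr)  # provides separate()
dat1 <- dat %>% separate(Genres, c("Genres1", "Genres2", "Genres3", "Genres4", "Genres5"), sep = ",", extra = "drop", fill = "right")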


id    Runtime    Genres1    Genres2    Genres3  Genres4  Genres5                                       
37      75       animation  adventure  family   fantasy  musical   
1       162      action     adventure  fantasy  sci_fi       
95      126      action     fantasy   
100     101      comedy     drama      fantasy   
82      136      action     adventure  sci-fi    
99      117      animation  adventure  comedy   family   sport   
91      95       animation  comedy     crime    family

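How can I combine all the action, adventure and comedy genres into one category each?

I tried the following code:

Created an empty column for adventure:

dat1["adventure"] <- NA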

dat1$adventure <- ifelse(dat1$Genres1=="adventure",1,(ifelse(dat1$Genres2=="adventure",1,0)))

It was suggested that I shorten the code to:

  dat1$adventure <- ifelse((dat1$Genres1=="adventure" | dat1$Genres2=="adventure" | dat1$Genres3=="adventure" | dat1$Genres4=="adventure" ),1, 0)


id    Runtime    Genres1    Genres2    Genres3  Genres4  Genres5  Adventure                                     
37      75       animation  adventure  family   fantasy  musical  0
1       162      action     adventure  fantasy  sci_fi            0
95      126      action     fantasy                               0
100     101      comedy     drama      fantasy                    0
82      136      action     adventure  sci-fi                     0
99      117      animation  adventure  comedy   family   sport    0   
91      95       animation  comedy     crime    family            0

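The code picks up adventure in Genres1 but returns zero for Genres2.

I have edited the question. I tried what was suggested, but I'm not sure how to do it with 3000 observations.

After running the suggestion:

List of genres: I formed a vector and assigned it to dat2: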
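A likely cause, going by the code shown (the full data isn't available to confirm it): with sep = ",", separate() keeps the space that follows each comma, so Genres2 onwards hold values like " adventure" that never equal "adventure" exactly. Passing sep = ", " to separate(), or trimming with trimws() afterwards, avoids that. The base-R sketch below rebuilds three of the split rows with those leading spaces, trims them, and then builds the three indicators across all five columns at once; na.rm = TRUE treats the empty (NA) cells left by fill = "right" as non-matches. The loop and the targets vector are illustrative, not from the question.

```r
# Three split rows from the question, with the leading space that sep = "," leaves behind
dat1 <- data.frame(
  id      = c(37, 95, 100),
  Genres1 = c("animation", "action", "comedy"),
  Genres2 = c(" adventure", " fantasy", " drama"),
  Genres3 = c(" family", NA, " fantasy"),
  Genres4 = c(" fantasy", NA, NA),
  Genres5 = c(" musical", NA, NA),
  stringsAsFactors = FALSE
)
genre_cols <- paste0("Genres", 1:5)

any(dat1$Genres2 == "adventure")   # FALSE, because " adventure" != "adventure"

# Strip surrounding whitespace in every genre column (NA stays NA)
dat1[genre_cols] <- lapply(dat1[genre_cols], trimws)

# One 0/1 indicator per target genre, checking all five columns in one step
targets <- c(Adventure = "adventure", Action = "action", Comedy = "comedy")
for (nm in names(targets)) {
  hits <- rowSums(dat1[genre_cols] == targets[[nm]], na.rm = TRUE)
  dat1[[nm]] <- as.integer(hits > 0)
}
dat1[c("id", "Adventure", "Action", "Comedy")]
```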

dat2 <- c( "adventure", "comedy", "action", "drama", "animation", "fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror", "musical","history", "war", "documentary", "biography")

table(factor(dat2))

 action   adventure   animation   biography      comedy documentary          drama 
      1           1           1           1           1           1           1 
 family     fantasy     history      horror     musical     mystery     romance 
      1           1           1           1           1           1           1 
 sci-fi    thriller         war 
      1           1           1
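Rather than typing the genre list by hand, you could also build it from the data and compare it with dat2. In the rows shown in the question, "crime", "sport" and "sci_fi" (row 1; row 82 spells it "sci-fi") do not appear in dat2, so they would never be counted. A base-R sketch, assuming Genres is a character column of comma-separated values; all_genres is a hypothetical name.

```r
# Hand-typed genre list from the question
dat2 <- c("adventure", "comedy", "action", "drama", "animation", "fantasy",
          "mystery", "family", "sci-fi", "thriller", "romance", "horror",
          "musical", "history", "war", "documentary", "biography")

# Four rows from the question
dat <- data.frame(
  id     = c(1, 82, 99, 91),
  Genres = c("action, adventure, fantasy, sci_fi",
             "action, adventure, sci-fi",
             "animation, adventure, comedy, family, sport",
             "animation, comedy, crime, family"),
  stringsAsFactors = FALSE
)

# Split every string on commas, trim spaces, keep each genre once
all_genres <- sort(unique(trimws(unlist(strsplit(dat$Genres, ",")))))

# Genres present in the data but missing from the hand-typed list
setdiff(all_genres, dat2)
```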

Created a function:

 fun1 <- function("adventure", "comedy", "action", "drama", "animation",
"fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror", 
"musical","history", "war", "documentary", "biography")) {
 vector_of_cur_genres <- seperate(i, sep = ", ")
 result <- table(factor(vector_of_cur_genres, dat2))
 return(result)
 }  

  # Results         

 fun1 <- function("adventure", "comedy", "action", "drama", 
 "animation", "fantasy", "mystery", "family", "sci-fi", "thriller",  
 "romance", "horror", "musical","history", "war", "documentary", 
 "biography")) {
  Error: unexpected string constant in "fun1 <- function("adventure""
  >   vector_of_cur_genres <- separate(i, sep = ", ")
  Error: Please supply column name
  >   result <- table(factor(vector_of_cur_genres, dat2))
  Error in factor(vector_of_cur_genres, dat2) : 
  object 'vector_of_cur_genres' not found
  >   return(result)
  Error: no function to return from, jumping to top level
   > }
   Error: unexpected '}' in "}"

  mat <- mapply(fun1,dat2$Genres)
       Error in match.fun(FUN) : object 'fun1' not found
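The first error comes from the function header: function() expects argument names, not string literals, so R stops at "adventure" and then runs the remaining lines one at a time at the top level, which produces the follow-on errors, including the final "object 'fun1' not found". Note also that dat2 is a plain character vector, so dat2$Genres would fail too; the genre strings live in the original column (dat$Genres). A minimal corrected sketch, assuming comma-separated genres as in the question; genre_string is a hypothetical argument name.

```r
# Genre vector from the question, used as the set of factor levels
dat2 <- c("adventure", "comedy", "action", "drama", "animation", "fantasy",
          "mystery", "family", "sci-fi", "thriller", "romance", "horror",
          "musical", "history", "war", "documentary", "biography")

# One argument: a single comma-separated string such as "action, fantasy"
fun1 <- function(genre_string) {
  # split the string and remove the spaces after the commas
  vector_of_cur_genres <- trimws(strsplit(genre_string, ",")[[1]])
  # count against every genre in dat2; genres not present get 0
  table(factor(vector_of_cur_genres, levels = dat2))
}

# Two rows from the question, to show the call on the original Genres column
dat <- data.frame(
  id     = c(95, 100),
  Genres = c("action, fantasy", "comedy, drama, fantasy"),
  stringsAsFactors = FALSE
)

mat <- t(mapply(fun1, dat$Genres, USE.NAMES = FALSE))  # one row per movie
mat[, c("action", "comedy", "fantasy")]
```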

【Question discussion】:

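FYI, there is no need to create an empty new column before assigning to it: the assignment creates it anyway.
Welcome to Stack Overflow! Please see "How to make a great R reproducible example?"
You could convert your data from wide to long format and then summarize it with a table.
See also: Split comma-separated column into separate rows
To simplify, this can be reduced to a single ifelse call: ifelse((dat1$Genres1=="adventure" | dat1$Genres2=="adventure"),1, 0)

Tags: r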
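The wide-to-long suggestion in the comments can be sketched in base R: split each genre string, build a long table with one row per (id, genre) pair, then cross-tabulate ids against genres. This is an illustration, not the commenters' code; tidyr::separate_rows() can do the splitting step if you prefer the tidyverse. The names pieces, long and wide are hypothetical.

```r
# Three rows from the question
dat <- data.frame(
  id     = c(37, 100, 91),
  Genres = c("animation, adventure, family, fantasy, musical",
             "comedy, drama, fantasy",
             "animation, comedy, crime, family"),
  stringsAsFactors = FALSE
)

# Long format: repeat each id once per genre in its string
pieces <- strsplit(dat$Genres, ",")
long <- data.frame(
  id    = rep(dat$id, lengths(pieces)),
  genre = trimws(unlist(pieces)),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per id, one column per genre, cells are counts
wide <- table(long$id, long$genre)
wide[, c("adventure", "comedy")]
```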

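【Answer 1】:

You can combine table and factor to get the result you want. First, make sure every genre is spelled exactly the same way every time ("Adventure" != "adventure"). Then create a vector of all possible genres, c("Adventure", "Comedy", "Drama", ...).

For each row, call table(factor(genres, list_of_possible_genres)), which returns a table of counts. You can then build a matrix with something like this: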
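To illustrate the spelling point: factor() only counts values that match a level exactly, so differences in case or stray spaces silently turn into zeros. A small sketch with hypothetical messy values; normalising with tolower() and trimws() before counting is one way to handle it, assuming the level list is all lower case.

```r
list_of_possible_genres <- c("adventure", "comedy", "action")

# Hypothetical values with inconsistent case and a stray space
genres <- c("Adventure", " comedy", "action")

raw_counts <- table(factor(genres, levels = list_of_possible_genres))
raw_counts    # only "action" is counted; the other two match no level

clean_counts <- table(factor(tolower(trimws(genres)), levels = list_of_possible_genres))
clean_counts  # each of the three genres is counted once
```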

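mat <- mapply(
    function(i) {
        # split one genre string, e.g. "action, fantasy", and count each possible genre
        table(factor(strsplit(i, ", ")[[1]], levels = list_of_possible_genres))
    }, df$Genres, USE.NAMES = FALSE)
mat <- t(mat) # mapply returns one column per movie; transpose to one row per movie
#you want to use the original data.frame after import

new.df <- cbind(df, mat) #they should both have the same number of rows here

tidyr's separate() works on data frame columns, not on single strings, so each string is split with base R's strsplit() here; make its separator match the one your data uses (", " in the question). If you have any questions about what the individual functions or steps do, I can explain in the comments.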

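Inside the mapply call I define a function, function(i) ..., which is similar to defining a lambda in Python. The function takes one string of genres and returns a named vector giving how many times each possible genre occurs.

Edit: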
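With a single input vector, mapply() behaves much like sapply(). One alternative worth considering is vapply(), which checks that every call returns exactly the declared shape (here one integer per possible genre) and stops with an error otherwise, instead of quietly returning a list. A sketch under that assumption; count_genres is a hypothetical name and the rows are taken from the question.

```r
list_of_possible_genres <- c("action", "adventure", "comedy", "fantasy")

df <- data.frame(
  Genres = c("action, fantasy", "comedy, drama, fantasy", "action, adventure, sci-fi"),
  stringsAsFactors = FALSE
)

count_genres <- function(i) {
  # one integer count per possible genre, always in the same order
  as.vector(table(factor(trimws(strsplit(i, ",")[[1]]),
                         levels = list_of_possible_genres)))
}

# FUN.VALUE declares the shape of every result: an integer vector, one slot per genre
mat <- vapply(df$Genres, count_genres,
              FUN.VALUE = integer(length(list_of_possible_genres)),
              USE.NAMES = FALSE)
mat <- t(mat)                                  # one row per movie
colnames(mat) <- list_of_possible_genres
new.df <- cbind(df, mat)
```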

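fun1 <- function(string_of_genres) {
    # separate() needs a data frame, so split the single string with strsplit()
    vector_of_cur_genres <- strsplit(string_of_genres, ", ")[[1]]
    result <- table(factor(vector_of_cur_genres, levels = list_of_possible_genres))
    return(result)
}
mat <- t(mapply(fun1, df$Genres, USE.NAMES = FALSE)) # one row per movie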
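Putting a fun1 like the one above to work on the seven rows shown in the question, then turning the counts into the three 0/1 variables the question asks for. A sketch only: list_of_possible_genres is cut down to the three target genres (any other genre becomes NA in factor() and is left out of the table), and trimws() is added so that either "," or ", " separators work.

```r
list_of_possible_genres <- c("action", "adventure", "comedy")

# The seven rows shown in the question
df <- data.frame(
  id     = c(37, 1, 95, 100, 82, 99, 91),
  Genres = c("animation, adventure, family, fantasy, musical",
             "action, adventure, fantasy, sci_fi",
             "action, fantasy",
             "comedy, drama, fantasy",
             "action, adventure, sci-fi",
             "animation, adventure, comedy, family, sport",
             "animation, comedy, crime, family"),
  stringsAsFactors = FALSE
)

fun1 <- function(string_of_genres) {
  vector_of_cur_genres <- trimws(strsplit(string_of_genres, ",")[[1]])
  table(factor(vector_of_cur_genres, levels = list_of_possible_genres))
}

mat <- t(mapply(fun1, df$Genres, USE.NAMES = FALSE))  # 7 rows x 3 genre columns

# Counts to 0/1 indicators, named like the question's target variables
df$Adventure <- as.integer(mat[, "adventure"] > 0)
df$Action    <- as.integer(mat[, "action"] > 0)
df$Comedy    <- as.integer(mat[, "comedy"] > 0)
df[c("id", "Adventure", "Action", "Comedy")]
```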

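【Discussion】:

@Adam: I'm a beginner in R. Do you want this step to work on the original imported data frame? Could you explain the matrix step and cbind?
cbind is the simplest. It takes a set of matrices or data.frames and appends their columns to one another, so calling cbind(df, mat) tacks the matrix's columns onto the end of the data.frame. mapply is a vectorized function: it takes a vector, matrix or list, applies the given function to each element, and returns the result of each call. mapply is part of the *apply family of functions, and there is plenty of material explaining their nuances and differences.
Check my edit. You do want to call it on the data.frame from the very first step, before splitting the data. If you look, you'll see the splitting happens inside the code instead.
It's like a lambda call. One of mapply's arguments should be a function, but instead of writing, naming and passing that function, I skip the naming step and write the function inside the mapply call. Everything from function (i) up to the closing brace before df$Genres is a new function defined for mapply to use. It is called on every element of the vector passed to mapply as its second argument.
See my edit, where I pulled the function out and gave it a name. Hopefully that makes more sense.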
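The two functions from the explanation above, shown in isolation: cbind() appends a matrix's columns to a data frame with the same number of rows, and mapply() calls a function once per element and simplifies the results into a vector. The values are taken from rows 95 and 100 of the question; the names combined and n_genres are hypothetical.

```r
df  <- data.frame(id = c(95, 100), Runtime = c(126, 101))
# action/comedy counts for "action, fantasy" and "comedy, drama, fantasy"
mat <- matrix(c(1L, 0L, 0L, 1L), nrow = 2,
              dimnames = list(NULL, c("action", "comedy")))

# cbind(): the matrix's columns are appended after the data frame's columns
combined <- cbind(df, mat)
names(combined)   # "id" "Runtime" "action" "comedy"

# mapply(): the function runs once per element; here it counts genres per string
n_genres <- mapply(function(s) length(strsplit(s, ",")[[1]]),
                   c("action, fantasy", "comedy, drama, fantasy"),
                   USE.NAMES = FALSE)
n_genres          # 2 3
```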