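[Posted]: 2016-07-26 13:42:51
[Question]:
I am trying to clean a dataset and create three variables named Adventure, Action and Comedy. The original dataset has 3000 observations (imported as dat). Only a few observations are shown here: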
id Runtime Genres
37 75 animation, adventure, family, fantasy, musical
1 162 action, adventure, fantasy, sci_fi
95 126 action, fantasy
100 101 comedy, drama, fantasy
82 136 action, adventure, sci-fi
99 117 animation, adventure, comedy, family, sport
91 95 animation, comedy, crime, family
After importing the dataset into R, I used the following code to split the genres into 5 columns:
dat1 <- dat %>% separate(Genres, c("Genres1", "Genres2", "Genres3", "Genres4", "Genres5"), sep = ",", extra = "drop", fill = "right")
id Runtime Genres1 Genres2 Genres3 Genres4 Genres5
37 75 animation adventure family fantasy musical
1 162 action adventure fantasy sci_fi
95 126 action fantasy
100 101 comedy drama fantasy
82 136 action adventure sci-fi
99 117 animation adventure comedy family sport
91 95 animation comedy crime family
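How can I collapse the genre columns into one indicator column each for action, adventure and comedy?
I tried the following code.
First I created an empty column for adventure:
dat1["adventure"] <- NA
dat1$adventure <- ifelse(dat1$Genres1=="adventure",1,(ifelse(dat1$Genres2=="adventure",1,0)))
It was suggested that I shorten the code to:
dat1$adventure <- ifelse((dat1$Genres1=="adventure" | dat1$Genres2=="adventure" | dat1$Genres3=="adventure" | dat1$Genres4=="adventure" ),1, 0)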
id Runtime Genres1 Genres2 Genres3 Genres4 Genres5 Adventure
37 75 animation adventure family fantasy musical 0
1 162 action adventure fantasy sci_fi 0
95 126 action fantasy 0
100 101 comedy drama fantasy 0
82 136 action adventure sci-fi 0
99 117 animation adventure comedy family sport 0
91 95 animation comedy crime family 0
The code picks up adventure when it appears in Genres1, but returns zero when it appears in Genres2.
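One possible cause, assuming the raw Genres strings have a space after each comma as in the sample rows: splitting on "," alone leaves that space at the front of every genre after the first, so " adventure" never equals "adventure". The sketch below uses base R's strsplit instead of separate() to make the effect visible; the toy data frame is copied from the sample and is not the full dataset.

```r
# Toy rows copied from the sample above (assumption: the real data looks the same)
dat <- data.frame(
  id = c(37, 1, 95, 100),
  Genres = c("animation, adventure, family, fantasy, musical",
             "action, adventure, fantasy, sci_fi",
             "action, fantasy",
             "comedy, drama, fantasy"),
  stringsAsFactors = FALSE
)

# Splitting on "," alone keeps the space that follows each comma
parts <- strsplit(dat$Genres, ",")
second <- sapply(parts, `[`, 2)   # second genre of every row

raw_match     <- second == "adventure"          # compares " adventure" to "adventure"
trimmed_match <- trimws(second) == "adventure"  # compares after removing the space

print(raw_match)
print(trimmed_match)
```

If this is what is happening, splitting with sep = ", " or trimming the new columns with trimws() should make the comparisons behave. Separately, rows with fewer than five genres are padded with NA by fill = "right", and comparing NA with == yields NA, so a chain of | conditions can return NA instead of 0 for those rows.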
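I have re-edited this question. I tried what was suggested, but I am not sure how to go about it, given that there are 3000 observations.
After running the suggestion:
I built the list of genres as a vector and assigned it to dat2: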
dat2 <- c( "adventure", "comedy", "action", "drama", "animation", "fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror", "musical","history", "war", "documentary", "biography")
table(factor(dat2))
action adventure animation biography comedy documentary drama
1 1 1 1 1 1 1
family fantasy history horror musical mystery romance
1 1 1 1 1 1 1
sci-fi thriller war
1 1 1
Creating the function:
fun1 <- function("adventure", "comedy", "action", "drama", "animation",
"fantasy", "mystery", "family", "sci-fi", "thriller", "romance", "horror",
"musical","history", "war", "documentary", "biography")) {
vector_of_cur_genres <- separate(i, sep = ", ")
result <- table(factor(vector_of_cur_genres, dat2))
return(result)
}
# Results
fun1 <- function("adventure", "comedy", "action", "drama",
"animation", "fantasy", "mystery", "family", "sci-fi", "thriller",
"romance", "horror", "musical","history", "war", "documentary",
"biography")) {
Error: unexpected string constant in "fun1 <- function("adventure""
> vector_of_cur_genres <- separate(i, sep = ", ")
Error: Please supply column name
> result <- table(factor(vector_of_cur_genres, dat2))
Error in factor(vector_of_cur_genres, dat2) :
object 'vector_of_cur_genres' not found
> return(result)
Error: no function to return from, jumping to top level
> }
Error: unexpected '}' in "}"
mat <- mapply(fun1,dat2$Genres)
Error in match.fun(FUN) : object 'fun1' not found
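The first error arises because a function header lists parameter names, not string values. Since the definition is never created, the remaining lines run at top level, outside any function, which explains the later messages. Below is a minimal sketch of what fun1 might look like, assuming the intent is one count per genre for a single Genres string. The names count_genres and genre_levels are hypothetical, the level list is shortened for illustration, and base R's strsplit is swapped in for separate(), which works on data frame columns rather than on a single string.

```r
# Hypothetical, shortened list of genres to count (the question uses a longer one in dat2)
genre_levels <- c("adventure", "comedy", "action")

# Takes ONE Genres string and returns a count for each level
count_genres <- function(genre_string, levels = genre_levels) {
  current <- trimws(strsplit(genre_string, ",")[[1]])  # split and drop stray spaces
  table(factor(current, levels = levels))              # genres outside levels are ignored
}

# Sample strings copied from the question
genres <- c("animation, adventure, family, fantasy, musical",
            "action, adventure, fantasy, sci_fi",
            "comedy, drama, fantasy")

# sapply yields one column per input string; t() makes each observation a row
mat <- t(sapply(genres, count_genres, USE.NAMES = FALSE))
print(mat)
```

The resulting matrix has one row per observation and one column per genre, so it could be attached to the original data with cbind().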
[Comments]:
-
FYI, there is no need to create an empty new column before assigning to it: the assignment creates the column anyway.
-
Welcome to Stack Overflow! See "How to make a great R reproducible example?"
-
One option: convert your data from wide to long, then summarise it with table().
-
To simplify, this can be reduced to a single
ifelse call: ifelse((dat1$Genres1=="adventure" | dat1$Genres2=="adventure"),1, 0)
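The wide-to-long suggestion in the comments could be sketched roughly as follows. This is one interpretation in base R, not necessarily what the commenter had in mind; it starts from the original Genres column rather than from Genres1 to Genres5, and the names long, wanted, counts and flags are made up for the example.

```r
# Toy rows copied from the question
dat <- data.frame(
  id = c(37, 95, 100),
  Genres = c("animation, adventure, family, fantasy, musical",
             "action, fantasy",
             "comedy, drama, fantasy"),
  stringsAsFactors = FALSE
)

# Long format: one row per (id, genre) pair
parts <- strsplit(dat$Genres, ",")
long <- data.frame(
  id = rep(dat$id, lengths(parts)),
  genre = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)

# Cross-tabulate id against the three genres of interest;
# fixing the id levels keeps rows that contain none of them
wanted <- c("adventure", "action", "comedy")
counts <- table(factor(long$id, levels = dat$id),
                factor(long$genre, levels = wanted))

# Back to a data frame with one indicator column per genre
flags <- as.data.frame.matrix(counts)
flags$id <- dat$id
print(flags)
```

Because the genres are tabulated in one pass, the same code covers all 3000 rows without a separate ifelse per column.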
Tags: r