How to add a column based on a condition in R

【Question title】: How can I add a column based on a condition in R
【Posted】: 2014-10-10 02:10:19
【Question】:

I have a data frame (data1) like this:

 internode_length treatment genotype
1           98.165       sun       B3
2          116.633       sun       B3
3          103.474       sun       B3
4          120.683       sun       B3
5          109.128       sun       B3
6          129.076       sun       B3

I want to add a separate column to this data frame based on a condition:

for i in (1:nrow(data1)){
  if (data1$genotype == "B3") {
      data1$mutation = "wt"
} else if (data1$genotype == "ein9" & "ein194"){
      data1$mutation = "phyB"
} else {
      data1$mutation = "hy2"
}
}

But I get the following errors and warning, and the column is not added correctly:

Error: unexpected symbol in "for i"
>   if (data1$genotype == "B3") {
+       data1$mutation = "wt"
+ } else if (data1$genotype == "ein9"){
+       data1$mutation = "phyB"
+ } else {
+       data1$mutation = "hy2"
+ }
Warning message:
In if (data1$genotype == "B3") { :
  the condition has length > 1 and only the first element will be used
> }
Error: unexpected '}' in "}"

Any suggestions for fixing this?

【Comments】:

Tags: r loops if-statement

【Solution 1】:

You should use ifelse, which works on the whole column at once:

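# transform() returns a new data frame; assign the result
# (data1 <- transform(...)) to keep the new column
transform(data1,
          mutation = ifelse(genotype == "B3", "wt",
                     ifelse(genotype %in% c("ein9", "ein194"),
                            "phyB", "hy2")))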

#      internode_length treatment genotype mutation
# 1           98.165       sun       B3       wt
# 2          116.633       sun       B3       wt
# 3          103.474       sun     ein9     phyB
# 4          120.683       sun       B3       wt
# 5          109.128       sun   ein194     phyB
# 6          129.076       sun       A2      hy2
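
For context on why the original loop failed: `for` needs its loop clause in parentheses (`for (i in ...)`), `if` only accepts a single TRUE/FALSE value, and `"ein9" & "ein194"` is not a valid way to test against two values. A minimal sketch of a corrected loop follows. The data frame is rebuilt from the output shown above, which is an assumption about the input, and `ifelse` remains the more idiomatic route.

```r
# Sample data reconstructed from the output above (an assumption for illustration)
data1 <- data.frame(
  internode_length = c(98.165, 116.633, 103.474, 120.683, 109.128, 129.076),
  treatment = "sun",
  genotype = c("B3", "B3", "ein9", "B3", "ein194", "A2"),
  stringsAsFactors = FALSE
)

# Pre-allocate the new column, then fill it one row at a time
data1$mutation <- NA_character_
for (i in seq_len(nrow(data1))) {   # note the parentheses around the loop clause
  g <- data1$genotype[i]            # a single element, so `if` sees length 1
  if (g == "B3") {
    data1$mutation[i] <- "wt"
  } else if (g %in% c("ein9", "ein194")) {
    data1$mutation[i] <- "phyB"
  } else {
    data1$mutation[i] <- "hy2"
  }
}
data1$mutation
```

Indexing with `[i]` is what makes each `if` test a single value instead of the whole column.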

【Comments】:

【Solution 2】:

A data.table alternative.

Toy data

#  internode_length treatment genotype
#            98.165       sun       B3
#           116.633       sun       B3
#           103.474       sun       B3
#           120.683       sun       B3
#           109.128       sun       B3
#           129.076       sun       B3
#           129.076       sun     ein9
#           129.076       sun   ein194
#           129.076       sun       XY

Code

library(data.table)
# mydata must be a data.table; a plain data frame can be converted with setDT(mydata)
mydata[, new_col := ifelse(genotype == "B3", "wt",
                           ifelse(genotype %in% c("ein9", "ein194"), "phyB",
                                  "hy2"))]
mydata

#    internode_length treatment genotype new_col
# 1:           98.165       sun       B3      wt
# 2:          116.633       sun       B3      wt
# 3:          103.474       sun       B3      wt
# 4:          120.683       sun       B3      wt
# 5:          109.128       sun       B3      wt
# 6:          129.076       sun       B3      wt
# 7:          129.076       sun     ein9    phyB
# 8:          129.076       sun   ein194    phyB
# 9:          129.076       sun       XY     hy2
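
If a recent data.table is available (`fcase` was added in version 1.13.0), the nested `ifelse` can be flattened into condition/value pairs. This is a sketch rather than part of the original answer; `mydata` is rebuilt here from the toy data printed above.

```r
library(data.table)

# Toy data rebuilt from the table above
mydata <- data.table(
  internode_length = c(98.165, 116.633, 103.474, 120.683, 109.128,
                       129.076, 129.076, 129.076, 129.076),
  treatment = "sun",
  genotype = c(rep("B3", 6), "ein9", "ein194", "XY")
)

# fcase checks the condition/value pairs in order; `default` covers everything else
mydata[, new_col := fcase(
  genotype == "B3", "wt",
  genotype %in% c("ein9", "ein194"), "phyB",
  default = "hy2"
)]
mydata
```

Each additional category becomes one more condition/value pair instead of another level of nesting.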

【Comments】:

【Solution 3】:

You can also do this without ifelse:

  v1 <- factor(df$genotype)
  v1
  #[1] B3     B3     ein9   B3     ein194 A2    
  #Levels: A2 B3 ein194 ein9

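Change the levels to the values you want, in the same order as the existing levels (A2, B3, ein194, ein9). Here both ein194 and ein9 should become phyB.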

  levels(v1) <- c("hy2", "wt", "phyB", "phyB")
  df$new_column <- as.character(v1)
  df
  #   internode_length treatment genotype new_column
  #1           98.165       sun       B3         wt
  #2          116.633       sun       B3         wt
  #3          103.474       sun     ein9       phyB
  #4          120.683       sun       B3         wt
  #5          109.128       sun   ein194       phyB
  #6          129.076       sun       A2        hy2
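
The positional assignment above depends on the sorted order of the factor levels. Base R's `levels<-` also accepts a named list that maps each new level to the old levels it replaces, which avoids that dependency. A minimal sketch on the same six rows follows; note that, unlike an `else` branch, any old level not listed becomes NA.

```r
df <- data.frame(
  internode_length = c(98.165, 116.633, 103.474, 120.683, 109.128, 129.076),
  treatment = "sun",
  genotype = c("B3", "B3", "ein9", "B3", "ein194", "A2"),
  stringsAsFactors = FALSE
)

v1 <- factor(df$genotype)
# Names are the new levels; values are the old levels merged into them.
# Old levels that are not listed here would turn into NA.
levels(v1) <- list(wt = "B3", phyB = c("ein9", "ein194"), hy2 = "A2")
df$new_column <- as.character(v1)
df$new_column
```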

Data

 df <- structure(list(internode_length = c(98.165, 116.633, 103.474, 
 120.683, 109.128, 129.076), treatment = c("sun", "sun", "sun", 
 "sun", "sun", "sun"), genotype = c("B3", "B3", "ein9", "B3", 
 "ein194", "A2")), .Names = c("internode_length", "treatment", 
"genotype"), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")

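【Comments】:

I think this would be tricky to code for more complex datasets with many levels and combinations.
@KFB Wouldn't that apply to ifelse as well? If you have several levels or combinations to change, you need several nested ifelse calls.
Right. A join would probably handle more complex data better.
@KFB Yes, you can create a new dataset of key/value pairs and use join/left_join/merge, etc.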
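
The key/value idea from the last comment can be sketched in base R with a small lookup table and `merge`. The `lookup` table, the `row_id` helper column, and the choice to fill unmatched genotypes with "hy2" are illustrative assumptions that mirror the else branch of the question.

```r
df <- data.frame(
  internode_length = c(98.165, 116.633, 103.474, 120.683, 109.128, 129.076),
  treatment = "sun",
  genotype = c("B3", "B3", "ein9", "B3", "ein194", "A2"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per genotype with a known mutation
lookup <- data.frame(
  genotype = c("B3", "ein9", "ein194"),
  mutation = c("wt", "phyB", "phyB"),
  stringsAsFactors = FALSE
)

# merge() sorts by the key, so remember the original row order first
df$row_id <- seq_len(nrow(df))

# Left join: keep every row of df; genotypes missing from lookup get NA
out <- merge(df, lookup, by = "genotype", all.x = TRUE)
out <- out[order(out$row_id), ]

# Default value for genotypes that are not in the lookup table
out$mutation[is.na(out$mutation)] <- "hy2"
out$mutation
```

With this layout, adding a genotype means adding a row to `lookup` rather than editing the recoding logic.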