【Question title】: How can I add a column based on a condition in R
【Posted】: 2014-10-10 02:10:19
【Question description】:

I have a df (data1) like this

 internode_length treatment genotype
1           98.165       sun       B3
2          116.633       sun       B3
3          103.474       sun       B3
4          120.683       sun       B3
5          109.128       sun       B3
6          129.076       sun       B3

I want to add a separate column to this df based on a condition

for i in (1:nrow(data1)){
  if (data1$genotype == "B3") {
      data1$mutation = "wt"
} else if (data1$genotype == "ein9" & "ein194"){
      data1$mutation = "phyB"
} else {
      data1$mutation = "hy2"
}
}

But I get the following error and warning, and the code does not work

Error: unexpected symbol in "for i"
>   if (data1$genotype == "B3") {
+       data1$mutation = "wt"
+ } else if (data1$genotype == "ein9"){
+       data1$mutation = "phyB"
+ } else {
+       data1$mutation = "hy2"
+ }
Warning message:
In if (data1$genotype == "B3") { :
  the condition has length > 1 and only the first element will be used
> }
Error: unexpected '}' in "}"

Any suggestions for fixing this?

【Question comments】:

    Tags: r loops if-statement


    【Solution 1】:

    You should use ifelse, which is vectorized and tests every row at once:

    # transform() returns a modified copy; assign it (data1 <- ...) to keep the new column
    transform(data1,
              mutation = ifelse(genotype == "B3", "wt",
                         ifelse(genotype %in% c("ein9", "ein194"),
                                "phyB", "hy2")))
    
    #      internode_length treatment genotype mutation
    # 1           98.165       sun       B3       wt
    # 2          116.633       sun       B3       wt
    # 3          103.474       sun     ein9     phyB
    # 4          120.683       sun       B3       wt
    # 5          109.128       sun   ein194     phyB
    # 6          129.076       sun       A2      hy2
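
For comparison, here is a minimal sketch of how the loop from the question could be repaired. It assumes the three problems were the missing parentheses around `i in ...`, testing the whole `genotype` column instead of one element, and `"ein9" & "ein194"`, which is not a valid way to compare against two values. The sample `data1` is rebuilt from the output above so the block runs on its own.

```r
# Sample data rebuilt from the printed output above (an assumption, not the asker's real data)
data1 <- data.frame(
  internode_length = c(98.165, 116.633, 103.474, 120.683, 109.128, 129.076),
  treatment = "sun",
  genotype = c("B3", "B3", "ein9", "B3", "ein194", "A2"),
  stringsAsFactors = FALSE
)

data1$mutation <- NA_character_           # create the column before filling it
for (i in seq_len(nrow(data1))) {         # "for" needs parentheses around "i in ..."
  g <- data1$genotype[i]                  # "if" expects a single value, so test one row at a time
  if (g == "B3") {
    data1$mutation[i] <- "wt"
  } else if (g %in% c("ein9", "ein194")) {  # %in% checks membership in a set of values
    data1$mutation[i] <- "phyB"
  } else {
    data1$mutation[i] <- "hy2"
  }
}
data1
```

The loop gives the same result as the ifelse version, but the vectorized form is shorter and usually faster on large data frames.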
    

    【Comments】:

      【Solution 2】:

      A data.table alternative.

      Toy data
      #  internode_length treatment genotype
      #            98.165       sun       B3
      #           116.633       sun       B3
      #           103.474       sun       B3
      #           120.683       sun       B3
      #           109.128       sun       B3
      #           129.076       sun       B3
      #           129.076       sun     ein9
      #           129.076       sun   ein194
      #           129.076       sun       XY
      
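      Code
      library(data.table)
      setDT(mydata)  # converts a plain data.frame to a data.table by reference
      # := adds the column in place, without copying mydata
      mydata[, new_col := ifelse(genotype == "B3", "wt",
                                 ifelse(genotype %in% c("ein9", "ein194"), "phyB",
                                        "hy2")
      )]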
      mydata
      
      #    internode_length treatment genotype new_col
      # 1:           98.165       sun       B3      wt
      # 2:          116.633       sun       B3      wt
      # 3:          103.474       sun       B3      wt
      # 4:          120.683       sun       B3      wt
      # 5:          109.128       sun       B3      wt
      # 6:          129.076       sun       B3      wt
      # 7:          129.076       sun     ein9    phyB
      # 8:          129.076       sun   ein194    phyB
      # 9:          129.076       sun       XY     hy2
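
The nested ifelse calls can also be written flat. The sketch below is self-contained: it builds the toy data shown above and uses `fcase()`, which, to my knowledge, ships with data.table 1.13.0 and later; treat the version requirement as an assumption and fall back to the nested ifelse on older installs.

```r
library(data.table)

# Toy data matching the table above
mydata <- data.table(
  internode_length = c(98.165, 116.633, 103.474, 120.683, 109.128,
                       129.076, 129.076, 129.076, 129.076),
  treatment = "sun",
  genotype = c(rep("B3", 6), "ein9", "ein194", "XY")
)

# fcase() takes condition/value pairs in order; `default` covers rows matching none
mydata[, new_col := fcase(
  genotype == "B3", "wt",
  genotype %in% c("ein9", "ein194"), "phyB",
  default = "hy2"
)]
mydata
```

Each extra category is one more condition/value pair, so the code stays readable as the number of cases grows.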
      

      【Comments】:

        【Solution 3】:

        You can also do this without ifelse

          v1 <- factor(df$genotype)
          v1
          #[1] B3     B3     ein9   B3     ein194 A2    
          #Levels: A2 B3 ein194 ein9
        

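        Change the levels to the values you want. The replacement vector must follow the order of the existing levels (A2, B3, ein194, ein9). Here both ein194 and ein9 should become phyB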

          levels(v1) <-  c("hy2", "wt", "phyB", "phyB")
          df$new_column <- as.character(v1)
           df
          #   internode_length treatment genotype new_column
          #1           98.165       sun       B3         wt
          #2          116.633       sun       B3         wt
          #3          103.474       sun     ein9       phyB
          #4          120.683       sun       B3         wt
          #5          109.128       sun   ein194       phyB
          #6          129.076       sun       A2        hy2
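
Because the positional version depends on the alphabetical order of the levels, a named list can be less error-prone. This is a small sketch using the same sample genotypes: each list name is a new level and each element lists the old levels that map to it.

```r
genotype <- c("B3", "B3", "ein9", "B3", "ein194", "A2")
v1 <- factor(genotype)

# new_level = old_level(s); old levels not listed here would become NA
levels(v1) <- list(wt = "B3", phyB = c("ein9", "ein194"), hy2 = "A2")

new_column <- as.character(v1)
new_column
```

With this form, the mapping stays correct even if a new genotype changes the sort order of the levels.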
        

        Data

         df <- structure(list(internode_length = c(98.165, 116.633, 103.474, 
         120.683, 109.128, 129.076), treatment = c("sun", "sun", "sun", 
         "sun", "sun", "sun"), genotype = c("B3", "B3", "ein9", "B3", 
         "ein194", "A2")), .Names = c("internode_length", "treatment", 
        "genotype"), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
        

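        【Comments】:

        • I think this would be tricky to code for more complex datasets with many levels and combinations.
        • @KFB Doesn't that apply to ifelse as well? If you have several levels/combinations to change, you need several ifelse statements.
        • Right. A join function could probably handle more complex data better.
        • @KFB Yes, you can create a new dataset of key/value pairs and use join/left_join/merge, etc.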
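
The key/value approach from the comments could look roughly like this in base R. The `lookup` table and its contents are hypothetical, filled in from the mapping used in the answers; `merge()` with `all.x = TRUE` acts as a left join, and `match()` performs the same lookup while keeping the original row order.

```r
df <- data.frame(
  internode_length = c(98.165, 116.633, 103.474, 120.683, 109.128, 129.076),
  treatment = "sun",
  genotype = c("B3", "B3", "ein9", "B3", "ein194", "A2"),
  stringsAsFactors = FALSE
)

# Hypothetical key/value table: one row per genotype
lookup <- data.frame(
  genotype = c("B3", "ein9", "ein194", "A2"),
  mutation = c("wt", "phyB", "phyB", "hy2"),
  stringsAsFactors = FALSE
)

# Left join: every row of df is kept; merge() may reorder the rows
joined <- merge(df, lookup, by = "genotype", all.x = TRUE)

# Same lookup with match(), which preserves the row order of df
df$mutation <- lookup$mutation[match(df$genotype, lookup$genotype)]
df
```

Unlike the ifelse version, a genotype missing from `lookup` gets NA rather than "hy2", so every genotype needs an explicit row in the table.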