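【Question title】:How can I mutate all columns that match a string in R?
【Posted】:2019-10-21 11:25:14
【Question】:

I want to recode every column in my data frame whose name contains the string "calcium" anywhere. I tried combining grepl with mutate from dplyr, but I get an error.

Any idea what I'm doing wrong? I hope this is possible!

The code I tried, using dplyr, is below: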

# Make the data frame
library(dplyr)
library(reshape2) # provides dcast()
fake <- data.frame(id = c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
                   time = c(rep("Time1", 9), rep("Time2", 9)),
                   test = rep(c("calcium", "magnesium", "zinc"), 6),
                   score = rnorm(18))
# Reshape to wide format: one column per time/test combination
df <- dcast(fake, id ~ time + test, value.var = "score")

#My attempt
df <- df %>% mutate(category=cut(df[,grepl("calcium", colnames(df))], breaks=c(-Inf, 1.2, 6, 12, Inf), labels=c(0,1,2,3)))
#Error:  'x' must be numeric

#My second attempt 
df <- df %>% mutate_at(vars(contains('calcium')), cut(breaks=c(-Inf, 1.2, 6, 12, Inf), labels=c(0,1,2,3)))
#Error: "argument "x" is missing, with no default"

【Comments】:

    Tags: r dplyr


    【Solution 1】:

    Is this what you're after?

    library(tidyverse)
    library(reshape2) # I added this for your dcast
    
    fake <-data.frame(id=c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),              
                      time=c(rep("Time1",9), rep("Time2",9)), 
                      test=c("calcium","magnesium","zinc","calcium","magnesium","zinc", 
                             "calcium","magnesium","zinc","calcium","magnesium","zinc",
                             "calcium","magnesium","zinc","calcium","magnesium","zinc"), 
                      score=rnorm(18))
    df <- dcast(fake, id ~ time + test, value.var = "score")
    df <- as_tibble(df) #added this
    
    #code
    df <- df %>% 
      # Pass cut() as a formula so it is applied to each matching column in turn
      mutate_at(vars(contains('calcium')), 
                ~cut(., 
                     breaks=c(-Inf, 1.2, 6, 12, Inf), 
                     labels=c(0, 1, 2, 3))) %>%
      # Note: as.numeric() on a factor returns the integer level codes (1-4),
      # not the labels 0-3. funs() is deprecated, so the function is passed directly.
      mutate_at(vars(ends_with("_calcium")), as.numeric) 
    

    This produces:

    # A tibble: 3 x 7
         id Time1_calcium Time1_magnesium Time1_zinc Time2_calcium Time2_magnesium
      <dbl>         <dbl>           <dbl>      <dbl>         <dbl>           <dbl>
    1     1             2          -0.256      0.303             1          0.144 
    2     2             2           2.18       0.417             1          0.0650
    3     3             1           0.863     -2.32              1          0.163 
    # ... with 1 more variable: Time2_zinc <dbl>
    

    Based on this: https://suzan.rbind.io/2018/02/dplyr-tutorial-2/#mutate-at-to-change-specific-columns
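
For context on the two errors in the question: the first attempt hands cut() a whole data frame of matching columns, while cut() expects a single numeric vector; the second attempt calls cut() immediately without x instead of passing a function for mutate_at() to apply. A minimal sketch of the same recode using across(), assuming dplyr 1.0.0 or later; the small data frame and its values are made up for illustration and are not the asker's data:

```r
library(dplyr)

# Small deterministic stand-in for the wide data frame (values are invented)
df <- data.frame(
  id = c(1, 2, 3),
  Time1_calcium = c(0.5, 3.0, 13.0),
  Time1_zinc = c(0.1, 0.2, 0.3),
  Time2_calcium = c(7.0, 1.2, 6.5)
)

# Apply cut() to every column whose name contains "calcium";
# .x stands for the current column, so cut() always receives one numeric vector
recoded <- df %>%
  mutate(across(
    contains("calcium"),
    ~ cut(.x, breaks = c(-Inf, 1.2, 6, 12, Inf), labels = c(0, 1, 2, 3))
  ))
```

The recoded columns are factors here; columns that do not match, such as Time1_zinc, are left untouched.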

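    【Discussion】:

    • How would you add this to the data as new columns rather than overwriting the original columns?
    • Ah, do you know how I can keep it numeric rather than turning it into a factor? I tried modifying your code to mutate_at(as.numeric(as.character...) and also tried ...as.numeric(as.character(~cut(...) but I get errors :(
    • I converted df to a tibble and then added the last bit. It doesn't seem to work as a data.frame, but it does as a tibble. Not sure why. I always work in tibbles anyway. It does what you want.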
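
One possible way to address both follow-up comments (new columns instead of overwriting, and numeric labels instead of a factor) is sketched below. It assumes dplyr 1.0.0 or later for across() and its .names argument; the "_cat" suffix and the sample values are hypothetical choices for illustration:

```r
library(dplyr)

# Invented example data, not the asker's
df <- data.frame(
  id = c(1, 2, 3),
  Time1_calcium = c(0.5, 3.0, 13.0),
  Time2_calcium = c(7.0, 1.2, 6.5)
)

with_cats <- df %>%
  mutate(across(
    contains("calcium"),
    # Convert factor -> character -> numeric so the labels 0-3 are kept,
    # rather than the factor's internal level codes 1-4
    ~ as.numeric(as.character(
      cut(.x, breaks = c(-Inf, 1.2, 6, 12, Inf), labels = c(0, 1, 2, 3))
    )),
    # Write results to new columns, e.g. Time1_calcium_cat
    .names = "{.col}_cat"
  ))
```

The as.numeric(as.character(...)) conversion has to wrap the call to cut() inside the formula; wrapping the formula itself, as in the attempt quoted in the comment, passes a formula object to as.character() rather than a column.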