[Question title]: How to use the mutate function in iteration
[Posted]: 2020-08-21 06:58:22
[Question]:

I want to search the promoter sequences in a data frame for a list of motifs. A sample data set and my code are below.

Gene_and_Promoter <- tibble::tribble(
                     ~Gene,                                                ~Promoter,
                   "Gene1", "AGTCACGTGCGTGCATACGTGCAAATTGGGCGTACGTGGCTATCTCAACTATCH",
                   "Gene2",  "AACGTGGCGTGGCAGTGCACGTGCCAGTTGTCCCGCAGTGTGCATACTACTCT",
                   "Gene3",   "ACTGGCTACGTGCTGCAATGCGTGCGTAGTGCGTACCAAAGTTAAACCGGCG",
                   "Gene4",   "GCAATACGTGCAAGTGCGTGTACGTGCGTGATGTCGTACGTAACCGGCCGGT",
                   "Gene5",     "ATACGTGCGTCGTACGTGCGTACTAATACATACATCATAATTTAAACCCG",
                   "Gene6",          "GGGGGAATCTCGTTCCTACGTCAAGGATAGATGCTGATAGTCGTA"
                   )
Motifs <- tibble::tribble(
             ~MOTIF,
            "CGTGC",
           "GGAATA",
             "CCAG",
            "CGTA"
           )


library(dplyr)       # mutate(), %>%
library(Biostrings)  # DNAStringSet(), vcountPattern()

Gene_and_Promoter %>%
  mutate(CGTGC = vcountPattern("CGTGC", DNAStringSet(Gene_and_Promoter$Promoter))) %>%
  mutate(GGAATA = vcountPattern("GGAATA", DNAStringSet(Gene_and_Promoter$Promoter))) %>%
  mutate(CCAG = vcountPattern("CCAG", DNAStringSet(Gene_and_Promoter$Promoter))) %>%
  mutate(CGTA = vcountPattern("CGTA", DNAStringSet(Gene_and_Promoter$Promoter)))

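The code above gives the desired output (one column per motif, holding the number of times that motif occurs in each promoter).

Can I optimise this code by reducing the number of mutate calls, perhaps by iterating over the motifs?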

[Comments]:

    Tags: r tidyr


    [Answer 1]:

    Here is a possibility similar to @det's answer, but in the tidyverse...

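    library(tidyverse)
    library(Biostrings)  # DNAStringSet(), vcountPattern()
    
    pat <- c("CGTGC", "GGAATA", "CCAG", "CGTA")
    
    # set names so that map_df() keeps them...
    lpat <- as.list(pat) %>%
      set_names(., pat)
    
    dd <-
      Gene_and_Promoter %>%
      # replace Promoter with a data-frame column: one count column per motif
      mutate(across(Promoter, ~map_df(lpat, ~ vcountPattern(., DNAStringSet(Promoter))))) %>%
      # flatten the nested data frame into ordinary columns
      as.list() %>%
      bind_cols() %>%
      # join back to the original table to restore the Promoter sequences
      full_join(Gene_and_Promoter, .)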
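The same idea can probably be written more directly by building the count columns first and binding them on afterwards. The following is only a sketch, not the answerer's code: it assumes Biostrings, purrr, dplyr and tibble are installed, and it uses a small made-up table in place of the question's data. Function calls are namespaced because Biostrings and its dependencies mask some tidyverse names.

```r
library(Biostrings)  # Bioconductor: DNAStringSet(), vcountPattern()

# Toy data in the same shape as the question's table (sequences are made up).
Gene_and_Promoter <- tibble::tibble(
  Gene     = c("Gene1", "Gene2"),
  Promoter = c("ACGTGCGTGC", "GGAATACCAG")
)
pat <- c("CGTGC", "GGAATA", "CCAG", "CGTA")

# Convert the promoters once instead of once per motif.
seqs <- DNAStringSet(Gene_and_Promoter$Promoter)

# One integer vector of counts per motif; naming the input names the output.
counts <- purrr::map(purrr::set_names(pat), ~ vcountPattern(.x, seqs))

# Turn the named list into columns and attach them to the original table.
dd <- dplyr::bind_cols(Gene_and_Promoter, tibble::as_tibble(counts))
dd
```

Because the original columns are never overwritten, no join is needed to restore `Promoter`.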
    

    [Comments]:

      [Answer 2]:

      It is hard to say without knowing the DNAStringSet function in depth. Maybe try something like this:

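      library(data.table)
      library(purrr)
      library(Biostrings)  # DNAStringSet(), vcountPattern()
      
      # convert the promoters once, outside the loop over motifs
      vec <- DNAStringSet(Gene_and_Promoter$Promoter)
      Motifs <- c("CGTGC", "GGAATA", "CCAG", "CGTA")
      
      # convert to a data.table by reference, then add one count column per motif
      setDT(Gene_and_Promoter)
      Gene_and_Promoter[, (Motifs) := map(Motifs, ~vcountPattern(.x, vec))]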
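For readers who want to try the multi-column assignment without data.table or Bioconductor, here is a minimal base R sketch of the same pattern. `count_motif()` is a hypothetical helper, not part of Biostrings: it stands in for `vcountPattern()` by counting overlapping literal matches with a lookahead regex, and it does not handle IUPAC ambiguity codes. The data frame `promoters` and its sequences are made up for illustration.

```r
# Hypothetical stand-in for vcountPattern(): counts overlapping occurrences
# of a literal motif in each string, using a zero-width lookahead.
count_motif <- function(motif, sequences) {
  hits <- gregexpr(paste0("(?=", motif, ")"), sequences, perl = TRUE)
  # gregexpr() returns -1 for "no match", so count only positive positions
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Toy data (made up), in the same shape as the question's table.
promoters <- data.frame(
  Gene     = c("Gene1", "Gene2", "Gene3"),
  Promoter = c("ACGTGCGTGC", "GGAATACCAG", "TTTT"),
  stringsAsFactors = FALSE
)
motifs <- c("CGTGC", "GGAATA", "CCAG", "CGTA")

# Assign all motif columns in one step: lapply() returns one count vector
# per motif, and the character index names the new columns.
promoters[motifs] <- lapply(motifs, count_motif, sequences = promoters$Promoter)
promoters
```

In this toy data "CGTGC" occurs twice in the first sequence, with the two matches sharing a base, which is why the helper uses a lookahead rather than a plain non-overlapping regex count.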
      

      [Comments]:
