【Question title】: Filter similar rows in R by values in column and paste result [duplicate]
【Posted】: 2020-10-17 15:49:58
【Question description】:

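I am working on a data transformation in R and I am stuck. I need to filter rows with similar values, keep the one with the higher "Expression" value, then split the data into columns by expression level and aggregate them. Since I know this explanation won't win a Nobel Prize, below are the raw data, the desired result, and what I have achieved so far.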

Raw data

df <- read.table(text = 
           "Tissue          Species   Expression  
1           dentritic       Human     moderate
2           liver           Human     high
3           liver           Human     moderate
4           liver           Human     moderate
5           liver           Human     high
6           liver           Monkey    high
7           liver           Monkey    moderate
8           liver           Dog       high
9           liver           Dog       high
10          liver           Minipig   moderate
11          liver           Rat       low
12          liver           Rat       cutoff
13          liver           Monkey    moderate
14          lung            Monkey    high
15          quadriceps     Monkey     cutoff"  , header = TRUE)

The result I need is shown below. If a combination of Tissue and Species is duplicated, only the row with the highest Expression value should be kept.

    Tissue           High_Expression        Moderate_Expression    Low_Expression    cutoff

1   dentritic                               Human
2   liver            Human, Monkey,Dog      Minipig                Rat
3   lung             Monkey
4   quadriceps                                                                       Monkey                

What I have so far:

df$Expression <- factor(df$Expression, levels = c("cutoff", "low", "moderate", "high"), ordered = TRUE)
df$Species <- as.character(df$Species)

df <- df %>% 
  mutate(High_expressed = ifelse(Expression == "high", Species, "")) %>% 
  mutate(moderate_expressed = ifelse(Expression == "moderate", Species, "")) %>% 
  mutate(low_expressed = ifelse(Expression == "low", Species, "")) %>% 
  mutate(below_cutoff_expressed = ifelse(Expression == "cutoff", Species, "")) %>% 
  select(-c("Expression", "Species"))

df <- aggregate(. ~ Tissue, data = df, paste, collapse = ",")


That gives:

    Tissue           High_Expression        Moderate_Expression      Low_Expression    cutoff

1   dentritic                               Human
2   liver            Human,,,Human,         ,Human,Human,,,           ,,,,,,,,,Rat,,    ,,,,,,,,,Rat,
                     Monkey,,Dog,Dog,,,,    Monkey,,,Minipig,,,Monkey 
3   lung             Monkey
4   quadriceps                                                                          Monkey     

Thanks in advance.

【Question comments】:

    Tags: r dplyr


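    【Solution 1】:

    You can first arrange the data by the value of Expression, keep only the highest-ranked row for each combination of Tissue and Species, and then reshape the data to wide format.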

    library(dplyr)
    library(tidyr)  # provides pivot_wider()
    
    df %>%
      arrange(match(Expression, c('high', 'moderate', 'low', 'cutoff'))) %>%
      distinct(Tissue, Species, .keep_all = TRUE) %>%
      pivot_wider(names_from = Expression,values_from = Species,values_fn = toString) %>%
      arrange(Tissue)
    
    #  Tissue     high               moderate low   cutoff
    #  <chr>      <chr>              <chr>    <chr> <chr> 
    #1 dentritic  NA                 Human    NA    NA    
    #2 liver      Human, Monkey, Dog Minipig  Rat   NA    
    #3 lung       Monkey             NA       NA    NA    
    #4 quadriceps NA                 NA       NA    Monkey
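
The output above leaves NA where a tissue has no species at a given level, while the desired table in the question shows blank cells. A minimal sketch of one way to get blanks is given here, assuming a reasonably recent tidyr in which `pivot_wider()` accepts `values_fill` together with a bare function in `values_fn`. The names `expr_levels` and `wide` are illustrative and not part of the original answer, and the data are re-read without row names for brevity.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, re-read so the example is self-contained
df <- read.table(text = "Tissue Species Expression
dentritic  Human   moderate
liver      Human   high
liver      Human   moderate
liver      Human   moderate
liver      Human   high
liver      Monkey  high
liver      Monkey  moderate
liver      Dog     high
liver      Dog     high
liver      Minipig moderate
liver      Rat     low
liver      Rat     cutoff
liver      Monkey  moderate
lung       Monkey  high
quadriceps Monkey  cutoff", header = TRUE)

# Expression levels from strongest to weakest (illustrative name)
expr_levels <- c("high", "moderate", "low", "cutoff")

wide <- df %>%
  # rank 1 = high ... rank 4 = cutoff
  mutate(rank = match(Expression, expr_levels)) %>%
  group_by(Tissue, Species) %>%
  # keep only the strongest level seen for each Tissue/Species pair
  filter(rank == min(rank)) %>%
  ungroup() %>%
  # drop exact repeats (and the helper rank column)
  distinct(Tissue, Species, Expression) %>%
  pivot_wider(names_from = Expression, values_from = Species,
              values_fn = toString, values_fill = "") %>%
  # put the level columns in a fixed order
  select(Tissue, any_of(expr_levels)) %>%
  arrange(Tissue)

print(wide)
```

The level columns can then be renamed to match the headers in the question (for example with `rename()`), if that layout is needed.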
    

    【Comments】:

    • The problem was solved in a very nice and quick way. Thank you very much.