【Question title】: How can I summarise a factor or character variable? [duplicate]
【Posted】: 2017-12-01 21:10:40
【Question description】:

I want to "summarise" a factor variable in R so that, for each record, I know which factor levels are present.

Here is a simplified example data frame:

df <- data.frame(record= c("a","a","b","c","c","c"),
species = c("COD", "SCE", "COD", "COD","SCE","QSC"))

record species
     a     COD
     a     SCE
     b     COD
     c     COD
     c     SCE
     c     QSC

This is what I want to achieve:

data.frame(record = c("a", "b", "c"), species = c("COD, SCE", "COD", "COD, SCE, QSC"))

    record       species
        a       COD, SCE
        b            COD
        c  COD, SCE, QSC

This is the closest I have got, but it puts every level of the factor into each record, rather than only the levels actually present in that record.

summarise(group_by(df, record),
          species = (paste(levels(species), collapse="")))
record   species
   <fctr>   <chr>
      a CODQSCSCE      <- this should be CODSCE
      b CODQSCSCE      <- this should just be COD
      c CODQSCSCE      <- this is correct as CODQSCSCE as it has all levels

tapply gives the same problem:

tapply(df$species, df$record, function(x) paste(levels(x), collapse=""))
   a           b           c 
"CODQSCSCE" "CODQSCSCE" "CODQSCSCE" 

I need a way to tell which combination of species levels is present in each record.

【Comments】:

  • If another row had "COD" again for the same record, what should the result be? Should COD be listed only once, or twice?
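
To make the comment's question concrete, here is a small sketch with one hypothetical extra row (a second "COD" for record b, which is not in the original data; df2, with_dups and no_dups are names made up for this illustration). Without unique() the repeated value is pasted twice; with unique() it appears once.

```r
# df from the question plus one hypothetical duplicate row for record "b"
df2 <- data.frame(record  = c("a", "a", "b", "b", "c", "c", "c"),
                  species = c("COD", "SCE", "COD", "COD", "COD", "SCE", "QSC"))

# Every row is pasted, so the repeated value shows up twice
with_dups <- tapply(df2$species, df2$record, paste, collapse = ", ")

# unique() drops repeats within each record before pasting
no_dups <- tapply(df2$species, df2$record,
                  function(x) paste(unique(x), collapse = ", "))

with_dups[["b"]]   # "COD, COD"
no_dups[["b"]]     # "COD"
```

Which of the two is wanted depends on whether the summary should reflect presence only or also repetition.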

Tags: r dataframe dplyr tapply


【Solution 1】:

Use unique(), grouping by the record column from the question's data frame:

library(dplyr)
df %>% 
    group_by(record) %>% 
    summarise(species = paste(unique(species), collapse = ', '))


# A tibble: 3 x 2
  record       species
  <fctr>         <chr>
1      a      COD, SCE
2      b           COD
3      c COD, SCE, QSC
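
As far as I can tell, the reason this works is that levels() reports the full set of levels attached to the factor, and a subset keeps all of them even when some no longer occur, whereas unique() returns only the values actually present. A minimal base R sketch, reusing the data from the question (stringsAsFactors = TRUE is set explicitly here as an assumption, because R 4.0.0 and later no longer convert strings to factors by default; b_species and res are illustrative names):

```r
df <- data.frame(record  = c("a", "a", "b", "c", "c", "c"),
                 species = c("COD", "SCE", "COD", "COD", "SCE", "QSC"),
                 stringsAsFactors = TRUE)

# A subset of a factor still carries every level of the original factor
b_species <- df$species[df$record == "b"]
levels(b_species)                 # "COD" "QSC" "SCE"
as.character(unique(b_species))   # "COD"

# Same idea as the dplyr answer, applied to the tapply() call from the question
res <- tapply(df$species, df$record,
              function(x) paste(unique(x), collapse = ", "))
res
```

The tapply() version returns a named character array rather than a data frame, so the dplyr form is probably more convenient when the result feeds into further data-frame operations.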

【Discussion】:
