【Question title】: Summarise a column and thereby remove unwanted NAs in others
【Posted】: 2020-12-23 09:47:43
【Question】:

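I am stuck once again and looking for help. I hope that one day I will be able to give some of that help back...

Anyway, I have a tibble that looks like this: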

# A tibble: 20 x 6
# Groups:   tipologia [6]
   tipologia                                   date_info pct_day pct_month pct_year pct_no_date
   <chr>                                       <chr>       <dbl>     <dbl>    <dbl>       <dbl>
 1 Aree soggette a crolli/ribaltamenti diffusi day        0.0508  NA         NA         NA     
 2 Aree soggette a crolli/ribaltamenti diffusi month     NA        0.0217    NA         NA     
 3 Aree soggette a crolli/ribaltamenti diffusi no date   NA       NA         NA          0.227 
 4 Aree soggette a crolli/ribaltamenti diffusi year      NA       NA          0.701     NA     
 5 Aree soggette a frane superficiali diffuse  day        0.0721  NA         NA         NA     
 6 Aree soggette a frane superficiali diffuse  month     NA        0.0218    NA         NA     
 7 Aree soggette a frane superficiali diffuse  no date   NA       NA         NA          0.570 
 8 Aree soggette a frane superficiali diffuse  year      NA       NA          0.336     NA     
 9 Aree soggette a sprofondamenti diffusi      day        0.143   NA         NA         NA     
10 Aree soggette a sprofondamenti diffusi      no date   NA       NA         NA          0.286 
11 Aree soggette a sprofondamenti diffusi      year      NA       NA          0.571     NA     
12 Colamento lento                             day        0.119   NA         NA         NA     
13 Colamento lento                             month     NA        0.0475    NA         NA     
14 Colamento lento                             no date   NA       NA         NA          0.122 
15 Colamento lento                             year      NA       NA          0.712     NA     
16 Colamento rapido                            day        0.478   NA         NA         NA     
17 Colamento rapido                            month     NA        0.00838   NA         NA     
18 Colamento rapido                            no date   NA       NA         NA          0.0642
19 Colamento rapido                            year      NA       NA          0.450     NA     
20 Complesso                                   day        0.262   NA         NA         NA     

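Each "tipologia" has up to four entries, because there are four possible kinds of date information (day, month, year, or no information at all). What I want is a single row per tipologia, with these unnecessary NAs removed. The NA cells cannot hold any value anyway, so they are just a nuisance.

I have tried a lot of regrouping and summarising, but I could not get the result I want. Any ideas would be very helpful :)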

【Comments】:

    Tags: r group-by tibble summarize


    【Solution 1】:

    You can use na.omit to drop the NA values.

    library(dplyr)
    df %>%
      group_by(tipologia) %>%
      summarise(across(starts_with('pct'), na.omit))
    

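    na.omit works as long as every group has exactly one non-NA value in each pct column. A safer option, which takes the first non-NA value and yields NA when a group has none, is: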

    # Take the first non-NA value per column; yields NA if a group has none.
    df %>%
      group_by(tipologia) %>%
      summarise(across(starts_with('pct'), ~ .x[!is.na(.x)][1]))
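To illustrate why taking the first non-NA value is the more robust choice, here is a minimal sketch on made-up data. The `toy` tibble and the `first_non_na` helper are hypothetical names introduced only for this example; group "b" deliberately has no `pct_month` value, mirroring the groups in the question that lack some date types.

```r
library(dplyr)

# Hypothetical toy data: group "b" has no pct_month value at all.
toy <- tibble(
  tipologia = c("a", "a", "b"),
  pct_day   = c(0.1, NA, 0.3),
  pct_month = c(NA, 0.2, NA)
)

# Hypothetical helper: first non-NA element, or NA if there is none.
# Indexing an empty vector with [1] returns NA, so the result always has length 1.
first_non_na <- function(x) x[!is.na(x)][1]

res <- toy %>%
  group_by(tipologia) %>%
  summarise(across(starts_with("pct"), first_non_na), .groups = "drop")

print(res)
```

Because the helper always returns exactly one value, every group keeps its row, and a missing combination simply shows up as NA.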
    

    【Comments】:

      【Solution 2】:

      You can use aggregate, looping over the columns with lapply, and then merge the results.

      # One aggregate per pct column (NA rows are dropped by default), then a full outer merge.
      Reduce(function(...) merge(..., all = TRUE), lapply(names(dat)[3:6], function(x)
        aggregate(as.formula(paste(x, "~ tipologia")), dat, I)))
      #                                     tipologia pct_day pct_month pct_year pct_no_date
      # 1 Aree soggette a crolli/ribaltamenti diffusi  0.0508   0.02170    0.701      0.2270
      # 2  Aree soggette a frane superficiali diffuse  0.0721   0.02180    0.336      0.5700
      # 3      Aree soggette a sprofondamenti diffusi  0.1430        NA    0.571      0.2860
      # 4                             Colamento lento  0.1190   0.04750    0.712      0.1220
      # 5                            Colamento rapido  0.4780   0.00838    0.450      0.0642
      # 6                                   Complesso  0.2620        NA       NA          NA
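As a possible variation on the same base R idea, the per-column loop and merge can be avoided with a single `aggregate` call, provided NA rows are kept via `na.action = na.pass` and the summary function handles NAs itself. This is only a sketch on made-up data: `toy` and `first_non_na` are hypothetical names, not part of the original answer.

```r
# Hypothetical toy data: group "b" has no pct_month value at all.
toy <- data.frame(
  tipologia = c("a", "a", "b"),
  pct_day   = c(0.1, NA, 0.3),
  pct_month = c(NA, 0.2, NA)
)

# Hypothetical helper: first non-NA element, or NA if there is none.
first_non_na <- function(x) x[!is.na(x)][1]

# cbind() on the left-hand side summarises several columns at once;
# na.pass keeps rows containing NA so that first_non_na sees them.
res <- aggregate(cbind(pct_day, pct_month) ~ tipologia,
                 data = toy, FUN = first_non_na, na.action = na.pass)

print(res)
```

Listing the columns explicitly in `cbind()` keeps `date_info` out of the summary, which a `. ~ tipologia` formula would not.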
      

      Data:

      dat <- structure(list(tipologia = c("Aree soggette a crolli/ribaltamenti diffusi", 
      "Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a crolli/ribaltamenti diffusi", 
      "Aree soggette a crolli/ribaltamenti diffusi", "Aree soggette a frane superficiali diffuse", 
      "Aree soggette a frane superficiali diffuse", "Aree soggette a frane superficiali diffuse", 
      "Aree soggette a frane superficiali diffuse", "Aree soggette a sprofondamenti diffusi", 
      "Aree soggette a sprofondamenti diffusi", "Aree soggette a sprofondamenti diffusi", 
      "Colamento lento", "Colamento lento", "Colamento lento", "Colamento lento", 
      "Colamento rapido", "Colamento rapido", "Colamento rapido", "Colamento rapido", 
      "Complesso"), date_info = c("day", "month", "no date", "year", 
      "day", "month", "no date", "year", "day", "no date", "year", 
      "day", "month", "no date", "year", "day", "month", "no date", 
      "year", "day"), pct_day = c(0.0508, NA, NA, NA, 0.0721, NA, NA, 
      NA, 0.143, NA, NA, 0.119, NA, NA, NA, 0.478, NA, NA, NA, 0.262
      ), pct_month = c(NA, 0.0217, NA, NA, NA, 0.0218, NA, NA, NA, 
      NA, NA, NA, 0.0475, NA, NA, NA, 0.00838, NA, NA, NA), pct_year = c(NA, 
      NA, NA, 0.701, NA, NA, NA, 0.336, NA, NA, 0.571, NA, NA, NA, 
      0.712, NA, NA, NA, 0.45, NA), pct_no_date = c(NA, NA, 0.227, 
      NA, NA, NA, 0.57, NA, NA, 0.286, NA, NA, NA, 0.122, NA, NA, NA, 
      0.0642, NA, NA)), class = "data.frame", row.names = c("1", "2", 
      "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", 
      "15", "16", "17", "18", "19", "20"))
      

      【Comments】:
