【Question title】: R: Summarise and return the number of unique occurrences in Column B while grouping by Column A
【Posted】: 2020-06-06 15:22:52
【Question description】:

I have the following data, dfs_alltasks:

    by_hour task
1   0       Apple Receiving
2   0       Apple Receiving
3   0       Orange Receiving
4   0       Banana Receiving
5   0       Banana Receiving
6   0       Orange Receiving
7   1       Orange Receiving
8   1       Banana Receiving
9   1       Banana Receiving
10  1       Banana Receiving
11  1       Banana Receiving
12  1       Banana Receiving
13  1       Orange Receiving
14  2       Banana Receiving
15  3       Banana Receiving

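I would like to group by the "by_hour" column while summarising and returning the number of times each task occurs within the group. I should get something like this: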

    by_hour task              count
1   0       Apple Receiving   2
2   0       Orange Receiving  2
3   0       Banana Receiving  2
4   1       Orange Receiving  2
5   1       Banana Receiving  5
6   2       Banana Receiving  1
7   3       Banana Receiving  1

I tried: dfs_alltasks %>% group_by(by_hour) %>% summarise_all(no_rows = length(task))

But I get the error "Error in list2(...) : object 'task' not found"

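【Comments】:

  • It looks like you just want dplyr::count(dfs_alltasks, by_hour, task)
  • Since you want to group by both 'by_hour' and 'task', you need to include both in the group_by call. There is also no need for summarise_all; summarise will do the job. Instead of length(task), use n() to count the number of rows in each group.

Tags: r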
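
The suggestions in the two comments above can be sketched as follows. This is a minimal sketch rather than the commenters' exact code: the data frame is rebuilt from the table in the question, and the names `task_counts` and `task_counts_short` are introduced here purely for illustration.

```r
library(dplyr)

# Rebuild the question's data (values copied from the table in the question)
dfs_alltasks <- data.frame(
  by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 3),
  task = c(
    "Apple Receiving", "Apple Receiving", "Orange Receiving",
    "Banana Receiving", "Banana Receiving", "Orange Receiving",
    "Orange Receiving", "Banana Receiving", "Banana Receiving",
    "Banana Receiving", "Banana Receiving", "Banana Receiving",
    "Orange Receiving", "Banana Receiving", "Banana Receiving"
  ),
  stringsAsFactors = FALSE
)

# Group by both columns, then count the rows in each group with n()
task_counts <- dfs_alltasks %>%
  group_by(by_hour, task) %>%
  summarise(count = n()) %>%
  ungroup()

# dplyr::count() is shorthand for the same grouping and counting;
# name = "count" sets the name of the count column (the default is n)
task_counts_short <- count(dfs_alltasks, by_hour, task, name = "count")
```

Both results are sorted by by_hour and then alphabetically by task, so the row order differs slightly from the expected table in the question, although the counts are the same.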


【Solution 1】:

You don't need group_by here; count() does the grouping for you.

library(tidyverse)

df_example <-
  structure(list(
    by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
                1, 2, 3),
    task = c(
      "Apple Remaining",
      "Apple Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Orange Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Banana Remaining",
      "Orange Remaining",
      "Banana Remaining",
      "Banana Remaining"
    )
  ),
  class = "data.frame",
  row.names = c(NA, -15L))

df_example %>% 
  count(by_hour,task)
#>   by_hour             task n
#> 1       0  Apple Remaining 2
#> 2       0 Banana Remaining 2
#> 3       0 Orange Remaining 2
#> 4       1 Banana Remaining 5
#> 5       1 Orange Remaining 2
#> 6       2 Banana Remaining 1
#> 7       3 Banana Remaining 1

Created on 2020-06-06 by the reprex package (v0.3.0)


【Solution 2】:

Try this:

library(tibble)
library(dplyr)
data <- tibble::tribble(
  ~by_hour, ~task,
  0, "Apple Receiving",
  0, "Apple Receiving",
  0, "Orange Receiving",
  0, "Banana Receiving",
  0, "Banana Receiving",
  0, "Orange Receiving",
  1, "Orange Receiving",
  1, "Banana Receiving",
  1, "Banana Receiving",
  1, "Banana Receiving",
  1, "Banana Receiving",
  1, "Banana Receiving",
  1, "Orange Receiving",
  2, "Banana Receiving",
  3, "Banana Receiving"
)
data %>% group_by(by_hour, task) %>% summarize(count = n()) %>% ungroup()
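
As a possible variation on the pipeline above, newer dplyr versions let summarise() drop the grouping itself, which makes the trailing ungroup() unnecessary. The sketch below assumes dplyr 1.0.0 or later (where the `.groups` argument was introduced) and uses a small hypothetical sample rather than the full data; the name `result` is illustrative only.

```r
library(dplyr)

# Small hypothetical sample in the same shape as the data above
data <- tibble::tribble(
  ~by_hour, ~task,
  0, "Apple Receiving",
  0, "Apple Receiving",
  0, "Banana Receiving",
  1, "Banana Receiving"
)

# .groups = "drop" returns an ungrouped result directly
# (assumes dplyr >= 1.0.0, where the .groups argument was introduced)
result <- data %>%
  group_by(by_hour, task) %>%
  summarise(count = n(), .groups = "drop")
```

On older dplyr versions the `.groups` argument is not available, so the group_by / summarize / ungroup form is the safer choice there.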
    


【Solution 3】:

Please consider providing a sample of your data using dput().

df <- structure(list(by_hour = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 
1, 2, 3), task = c("Apple Remaining", "Apple Remaining", "Orange Remaining", 
"Banana Remaining", "Banana Remaining", "Orange Remaining", "Orange Remaining", 
"Banana Remaining", "Banana Remaining", "Banana Remaining", "Banana Remaining", 
"Banana Remaining", "Orange Remaining", "Banana Remaining", "Banana Remaining"
)), class = "data.frame", row.names = c(NA, -15L))


You can use the dplyr package and apply group_by to your variables.

library(dplyr)
df %>% 
  group_by(by_hour, task) %>% 
  count() %>% 
  ungroup()
      

Result

  by_hour task                 n
    <dbl> <chr>            <int>
1       0 Apple Remaining      2
2       0 Banana Remaining     2
3       0 Orange Remaining     2
4       1 Banana Remaining     5
5       1 Orange Remaining     2
6       2 Banana Remaining     1
7       3 Banana Remaining     1
      


【Solution 4】:

We can also use

library(data.table)
setDT(df)[, .(n = .N), .(by_hour, task)]
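
A self-contained sketch of the data.table approach is given below. It uses a small hypothetical data frame rather than the full data, and it swaps setDT() for as.data.table() so that the original df is not converted in place; the names `dt` and `counts` are illustrative only.

```r
library(data.table)

# Small hypothetical data frame in the same shape as the question's data
df <- data.frame(
  by_hour = c(0, 0, 0, 1, 1, 2),
  task = c("Apple Receiving", "Apple Receiving", "Banana Receiving",
           "Banana Receiving", "Banana Receiving", "Banana Receiving"),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so df itself is left unchanged
# (setDT(df) would convert df in place instead)
dt <- as.data.table(df)

# .N is the number of rows in each by-group
counts <- dt[, .(n = .N), by = .(by_hour, task)]
```

With `by`, data.table keeps the groups in order of first appearance; using `keyby` instead would sort the result by the grouping columns.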
        
