【Title】: Counting the number of rows with values greater than or equal to each value in a sequence
【Posted】: 2021-12-05 17:34:49
【Question】:

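I am trying to count the number of rows whose values are greater than or equal to each number in a defined sequence, grouped by a second variable. For example: the number of rows with values in one column greater than or equal to 300, 400, 500, grouped by Company A, Company B and Company C. In Excel I would just use the COUNTIFS function, but I would rather not do this task in Excel.

Example dataset and expected result: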

library(tidyverse)

df <- tibble::tribble(
        ~Company, ~Sales,
             "B",   902L,
             "B",   575L,
             "C",   194L,
             "C",   215L,
             "A",   515L,
             "B",   728L,
             "A",   910L,
             "C",   889L,
             "A",   854L,
             "B",   230L,
             "C",   188L,
             "C",   442L,
             "A",   174L,
             "A",   723L,
             "B",   904L,
             "A",   761L,
             "B",   987L,
             "B",   521L,
             "B",   694L,
             "B",   530L,
             "C",   165L,
             "A",   507L,
             "B",   316L,
             "A",   452L,
             "A",   342L,
             "B",   413L,
             "B",   121L,
             "A",   650L,
             "B",   801L,
             "C",   100L
        )

result <- tibble::tribble(
             ~Company, ~Greater.or.equal.to, ~Count,
                  "A",                 300L,     9L,
                  "A",                 400L,     8L,
                  "A",                 500L,     7L,
                  "A",                 600L,     5L,
                  "A",                 700L,     4L,
                  "A",                 800L,     2L,
                  "A",                 900L,     1L,
                  "A",                1000L,     0L,
                  "B",                 300L,    11L,
                  "B",                 400L,    10L,
                  "B",                 500L,     9L,
                  "B",                 600L,     6L,
                  "B",                 700L,     5L,
                  "B",                 800L,     4L,
                  "B",                 900L,     3L,
                  "B",                1000L,     0L,
                  "C",                 300L,     2L,
                  "C",                 400L,     2L,
                  "C",                 500L,     1L,
                  "C",                 600L,     1L,
                  "C",                 700L,     1L,
                  "C",                 800L,     1L,
                  "C",                 900L,     0L,
                  "C",                1000L,     0L
             )

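I know how to do this for a single value using base R or dplyr (I am more familiar with the Tidyverse), but I have not found a way to check a whole sequence of values. I tried writing a for loop in the hope of getting the right answer, but I am clearly doing something wrong.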

# These two versions work but are inefficient when we have a long sequence of values to check against
length(which(df$Company == "A" & df$Sales >= 300))

df %>% 
  group_by(Company) %>% 
  summarise(count = sum(Sales >= 300))

# Attempt at a loop
# Sequence of values to loop over. The number sequence can change as the column we're checking
# against changes
sequence <- seq(300, 1000, 100)
companies <- c("A", "B", "C")

counting <- function(data, col1, col2, range1, range2){
  for (i in range1){
    for (j in range2){
      length(which(data$col1 == i & data$col2 >= j))
    }
  }
}

counting(df, Company, Sales, companies, sequence)

Any suggestions are much appreciated!

【Comments】:

    Tags: r for-loop tidyverse


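    【Solution 1】:

    We can loop over the seq from 300 to 1000 by 100, filter the data with the looped value after grouping by 'Company', create a summarised column with the number of rows (n()), bind the list elements, and use complete to fill the missing combinations with 0 for the 'Count' column.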

    library(dplyr)
    library(purrr)
    library(tidyr)
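    # For each threshold, keep the rows at or above it and count them per company
    out <- map(seq(300, 1000, by = 100), ~
      df %>%
        group_by(Company) %>%
        filter(Sales >= .x) %>%
        summarise(Greater.or.equal.to = .x, Count = n())) %>%
      bind_rows() %>%
      # Companies with no rows at a threshold were dropped by filter(); add them back with 0
      complete(Company, Greater.or.equal.to = seq(300, 1000, by = 100),
               fill = list(Count = 0))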
    

    -output

    out
    # A tibble: 24 × 3
       Company Greater.or.equal.to Count
       <chr>                 <dbl> <dbl>
     1 A                       300     9
     2 A                       400     8
     3 A                       500     7
     4 A                       600     5
     5 A                       700     4
     6 A                       800     2
     7 A                       900     1
     8 A                      1000     0
     9 B                       300    11
    10 B                       400    10
    # … with 14 more rows
    > all.equal(out, result)
    [1] TRUE
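A possible variation on the approach above, offered as a sketch rather than as part of the original answer: if the count is computed with sum(Sales >= th) instead of filter() followed by n(), companies with no qualifying rows stay in the result with a count of 0, so the complete() step should not be needed. The names `thresholds` and `out2` are made up for illustration, and the data is a compact copy of the question's `df`.

```r
library(dplyr)

# Compact copy of the question's data (same 30 rows, same order)
df <- data.frame(
  Company = c("B","B","C","C","A","B","A","C","A","B","C","C","A","A","B",
              "A","B","B","B","B","C","A","B","A","A","B","B","A","B","C"),
  Sales = c(902,575,194,215,515,728,910,889,854,230,188,442,174,723,904,
            761,987,521,694,530,165,507,316,452,342,413,121,650,801,100),
  stringsAsFactors = FALSE
)

thresholds <- seq(300, 1000, by = 100)  # hypothetical name

# One summarise() per threshold. Summing a logical vector counts its TRUEs,
# so a company with no rows at a threshold gets 0 instead of being dropped.
out2 <- lapply(thresholds, function(th) {
  df %>%
    group_by(Company) %>%
    summarise(Greater.or.equal.to = th, Count = sum(Sales >= th))
}) %>%
  bind_rows() %>%
  arrange(Company, Greater.or.equal.to)

out2
```

The result should have one row per company and threshold (24 rows here), matching the counts in the question's expected `result` table.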
    

    【Comments】:
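On the loop attempted in the question: two things seem to stop it from producing anything. `data$col1` looks for a column literally named "col1", because `$` does not evaluate its right-hand side, and the value computed inside the for loop is never stored or returned. Below is a minimal base R sketch of one way to repair it, assuming the column names are passed as strings; it keeps the question's function and argument names, while the output column names are hard-coded to match the expected result and `res` is a made-up name.

```r
# Compact copy of the question's data (same 30 rows, same order)
df <- data.frame(
  Company = c("B","B","C","C","A","B","A","C","A","B","C","C","A","A","B",
              "A","B","B","B","B","C","A","B","A","A","B","B","A","B","C"),
  Sales = c(902,575,194,215,515,728,910,889,854,230,188,442,174,723,904,
            761,987,521,694,530,165,507,316,452,342,413,121,650,801,100),
  stringsAsFactors = FALSE
)

counting <- function(data, col1, col2, range1, range2) {
  rows <- list()
  for (i in range1) {
    for (j in range2) {
      # data[[col1]] looks the column up by the string stored in col1
      n <- sum(data[[col1]] == i & data[[col2]] >= j)
      # Store each result; a bare expression inside a for loop is discarded
      rows[[length(rows) + 1]] <- data.frame(
        Company = i, Greater.or.equal.to = j, Count = n,
        stringsAsFactors = FALSE
      )
    }
  }
  # Stack the one-row data frames into a single result
  do.call(rbind, rows)
}

res <- counting(df, "Company", "Sales", c("A", "B", "C"), seq(300, 1000, 100))
res
```

Growing a list inside nested loops is fine at this size; for long threshold sequences a grouped or vectorised approach is likely to scale better.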
