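【Posted】: 2021-12-05 17:34:49
【Question】:
I am trying to count the number of rows whose value is greater than or equal to each number in a defined sequence, grouped by a second variable. For example, for each of Company A, Company B, and Company C, I want the number of rows where the value in another column is greater than or equal to 300, 400, 500, and so on. In Excel I would simply use the COUNTIFS function, but I would rather not go through Excel for this task.
Example dataset and expected result: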
library(tidyverse)
df <- tibble::tribble(
~Company, ~Sales,
"B", 902L,
"B", 575L,
"C", 194L,
"C", 215L,
"A", 515L,
"B", 728L,
"A", 910L,
"C", 889L,
"A", 854L,
"B", 230L,
"C", 188L,
"C", 442L,
"A", 174L,
"A", 723L,
"B", 904L,
"A", 761L,
"B", 987L,
"B", 521L,
"B", 694L,
"B", 530L,
"C", 165L,
"A", 507L,
"B", 316L,
"A", 452L,
"A", 342L,
"B", 413L,
"B", 121L,
"A", 650L,
"B", 801L,
"C", 100L
)
result <- tibble::tribble(
~Company, ~Greater.or.equal.to, ~Count,
"A", 300L, 9L,
"A", 400L, 8L,
"A", 500L, 7L,
"A", 600L, 5L,
"A", 700L, 4L,
"A", 800L, 2L,
"A", 900L, 1L,
"A", 1000L, 0L,
"B", 300L, 11L,
"B", 400L, 10L,
"B", 500L, 9L,
"B", 600L, 6L,
"B", 700L, 5L,
"B", 800L, 4L,
"B", 900L, 3L,
"B", 1000L, 0L,
"C", 300L, 2L,
"C", 400L, 2L,
"C", 500L, 1L,
"C", 600L, 1L,
"C", 700L, 1L,
"C", 800L, 1L,
"C", 900L, 0L,
"C", 1000L, 0L
)
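I know how to do this for a single threshold using base R or dplyr (I am more familiar with the tidyverse), but I have not found a way to check against a whole sequence of values. I tried writing a for loop in the hope of getting the right answer, but I am clearly doing something wrong.
# These two versions work but are inefficient when there is a long sequence of values to check against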
length(which(df$Company == "A" & df$Sales >= 300))
df %>%
  group_by(Company) %>%
  summarise(count = sum(Sales >= 300))
# Attempt at a loop
# Sequence of values to loop over. The sequence can change, because the column
# we are checking against changes
sequence <- seq(300, 1000, 100)
companies <- c("A", "B", "C")
counting <- function(data, col1, col2, range1, range2){
  for (i in range1){
    for (j in range2){
      length(which(data$col1 == i & data$col2 >= j))
    }
  }
}
counting(df, Company, Sales, companies, sequence)
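A likely reason the loop above produces nothing useful: `data$col1` looks for a column literally named `col1` rather than the column passed in as an argument, and the value computed inside the `for` loop is never stored or returned. The sketch below is one possible repair, not the only one. `counting_fixed` and `df_small` are hypothetical names, the column names are assumed to be passed as strings, and the small data frame only stands in for `df`.

```r
# Small stand-in for the question's df (assumption: same column names)
df_small <- data.frame(
  Company = c("A", "A", "B", "B", "C"),
  Sales   = c(515L, 174L, 902L, 316L, 442L)
)

# Hypothetical rewrite of counting(): columns are looked up by name with [[ ]]
# and every count is stored in a result table instead of being discarded.
counting_fixed <- function(data, col1, col2, range1, range2) {
  # One output row per (group, threshold) pair; the threshold varies fastest
  out <- expand.grid(
    threshold = range2,
    group = range1,
    stringsAsFactors = FALSE
  )
  out$count <- NA_integer_
  for (k in seq_len(nrow(out))) {
    # Rows that belong to the group and meet the threshold
    out$count[k] <- sum(
      data[[col1]] == out$group[k] & data[[col2]] >= out$threshold[k]
    )
  }
  out[, c("group", "threshold", "count")]
}

res_loop <- counting_fixed(df_small, "Company", "Sales",
                           c("A", "B", "C"), c(300, 400, 500))
print(res_loop)
```

With the objects from the question, the call would be `counting_fixed(df, "Company", "Sales", companies, sequence)`.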
Any suggestions would be greatly appreciated!
【Discussion】:
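For reference, a COUNTIFS-style result can also be sketched without an explicit loop by reusing the single-threshold `summarise()` call once per threshold and stacking the pieces. This is a minimal sketch under stated assumptions: `df_small` and `thresholds` are hypothetical stand-ins for `df` and `sequence`, and the output column names simply mirror the expected result shown in the question.

```r
library(dplyr)

# Small stand-in for the question's df (assumption: same column names)
df_small <- tibble::tibble(
  Company = c("A", "A", "B", "B", "C"),
  Sales   = c(515L, 174L, 902L, 316L, 442L)
)

# Hypothetical stand-in for the question's `sequence`
thresholds <- c(300L, 400L, 500L)

res <- lapply(thresholds, function(th) {
  df_small %>%
    group_by(Company) %>%
    # One row per company: the threshold used and how many rows meet it
    summarise(Greater.or.equal.to = th, Count = sum(Sales >= th))
}) %>%
  bind_rows() %>%
  arrange(Company, Greater.or.equal.to)

print(res)
```

Because every company is summarised for every threshold, combinations with no matching rows still appear with a count of 0, as in the expected result.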