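Without more context, it is hard to provide code that does exactly what you need.
Based on your question, I can see two options.
Option 1. You want AB to be ranked the same as B (for example).
Option 2. You want AB to be ranked differently from B (for example).
Option 1 clearly has a problem: when AB and B tie, the row you end up with depends on
the order in which the rows appear in the original dataset. Option 2 may be better if the Code column
represents errors. For example, if the system with ID 418 has both error codes A and B,
that is worse than having error code B alone.
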
library(dplyr)

df_have <- data.frame(ID = c(418, 418, 418),
                      Date = c("1/01/2020", "1/01/2020", "1/01/2020"),
                      Priority = c(1, 1, 1),
                      Revenue = c(-866, -866, -866),
                      Code = c("A", "AB", "A"),
                      V1 = c("XX3", "XX2", "XX3"),
                      V2 = c("XX1", "XX2", "XX1"),
                      V3 = c("XX3", "XX3", "XX3"))

# Option 1. Rank AB the same as B (for example)
df_want.1 <- df_have %>%
  # add a numeric score based on the B > A > C ordering;
  # case_when() uses the first matching condition, so AB scores 3, the same as B
  mutate(score = case_when(
    grepl("B", Code) ~ 3,
    grepl("A", Code) ~ 2,
    grepl("C", Code) ~ 1,
    TRUE ~ 0  # codes containing none of A, B, C score 0 instead of NA
  )) %>%
  # group by Date, Priority, ID, Revenue (since you want the row with the highest code)
  group_by(Date, Priority, ID, Revenue) %>%
  # only keep the rows in each group that have the highest score (or highest code)
  filter(score == max(score)) %>%
  # AB and B both produce a score of 3, so keep only the first such row in the group
  distinct(Date, Priority, ID, Revenue, .keep_all = TRUE) %>%
  ungroup()

df_want.1

# Option 2. Rank AB above B (for example)
df_want.2 <- df_have %>%
  # add a separate numeric score for each code, based on the B > A > C ordering
  mutate(score_b = if_else(grepl("B", Code), 3, 0),
         score_a = if_else(grepl("A", Code), 2, 0),
         score_c = if_else(grepl("C", Code), 1, 0)) %>%
  # group by Date, Priority, ID, Revenue (since you want the row with the highest code)
  group_by(Date, Priority, ID, Revenue) %>%
  # add each of the scores together
  mutate(row_score = score_b + score_a + score_c) %>%
  # only keep the rows in each group that have the highest score (or highest code combination)
  filter(row_score == max(row_score)) %>%
  # assuming it's possible to have the same score across the group, only keep the first row in the group
  distinct(Date, Priority, ID, Revenue, .keep_all = TRUE) %>%
  ungroup()

df_want.2