【Question title】: How do I pass a tibble to a function and calculate conditional SUMIFS?
【Posted】: 2021-03-27 19:45:44
【Question】:

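I'm trying to write a function that calculates a score for baseball players. I've created a splits tibble, a working tibble, and a function to be used with mutate to add a score column to the working tibble, df.

The function should take its input from the working tibble and calculate the score (the sum of the averages) based on the relevant splits. I've provided a reprex below. When I run my function, I get a score of zero for every row. The expected score values follow the reprex.

Can anyone tell me what I'm doing wrong?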

library(tidyverse)

split <- tibble(player=c("Soto","Soto","Judge","Judge","Soto","Soto","Judge","Judge"),
                split=c("Grass","Turf","Grass","Turf","DAY","NIGHT", "DAY","NIGHT"),
                AVG=c(200,225,250,275,300,325,350,375))
df <- tibble(date = c("2021-03-24", "2021-03-27", "2021-03-21", "2021-03-25"), player=c("Soto","Soto", "Judge", "Judge"), TOD=c("DAY","DAY","DAY","NIGHT"), surface=c("Grass","Turf","Turf","Grass"))

split
#> # A tibble: 8 x 3
#>   player split   AVG
#>   <chr>  <chr> <dbl>
#> 1 Soto   Grass   200
#> 2 Soto   Turf    225
#> 3 Judge  Grass   250
#> 4 Judge  Turf    275
#> 5 Soto   DAY     300
#> 6 Soto   NIGHT   325
#> 7 Judge  DAY     350
#> 8 Judge  NIGHT   375

df
#> # A tibble: 4 x 4
#>   date       player TOD   surface
#>   <chr>      <chr>  <chr> <chr>  
#> 1 2021-03-24 Soto   DAY   Grass  
#> 2 2021-03-27 Soto   DAY   Turf   
#> 3 2021-03-21 Judge  DAY   Turf   
#> 4 2021-03-25 Judge  NIGHT Grass

getSplitScore <- function(df,player,surface, timeofDay){
 
  z <- sum(df[df$player==player & df$split==timeofDay,]$AVG)
  y <- sum(df[df$player==player & df$split==surface,]$AVG)
  
  return(z + y)
}


df <- df %>% 
  mutate(score=getSplitScore(split, player, surface, TOD))

df
#> # A tibble: 4 x 5
#>   date       player TOD   surface score
#>   <chr>      <chr>  <chr> <chr>   <int>
#> 1 2021-03-24 Soto   DAY   Grass       0
#> 2 2021-03-27 Soto   DAY   Turf        0
#> 3 2021-03-21 Judge  DAY   Turf        0
#> 4 2021-03-25 Judge  NIGHT Grass       0

Created on 2021-03-27 by the reprex package (v0.3.0)

What I'm expecting is something like this:

#> # A tibble: 4 x 5
#>   date       player TOD   surface score
#>   <chr>      <chr>  <chr> <chr>   <int>
#> 1 2021-03-24 Soto   DAY   Grass     500
#> 2 2021-03-27 Soto   DAY   Turf      525
#> 3 2021-03-21 Judge  DAY   Turf      625
#> 4 2021-03-25 Judge  NIGHT Grass     625 

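【Comments】:

  • There is no BaVG column in split. I'm also not sure how you arrive at the score values.
  • In your desired output, is the score of 200 the sum of Soto Day and Soto TOD? I just need one example of how the score is calculated.
  • The score is just the sum of the split BaVG values.
  • @WilliamGram Good catch. Sometimes you can be too close to things. Thanks!

Tags: r tidyverse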


【Solution 1】:

Adding a group_by before the mutate in your pipeline will work:

df %>% group_by(date, player) %>%
  mutate(score= getSplitScore(split, player, surface, TOD))

# A tibble: 4 x 5
# Groups:   date, player [4]
  date       player TOD   surface score
  <chr>      <chr>  <chr> <chr>   <dbl>
1 2021-03-24 Soto   DAY   Grass     500
2 2021-03-27 Soto   DAY   Turf      525
3 2021-03-21 Judge  DAY   Turf      625
4 2021-03-25 Judge  NIGHT Grass     625
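
The grouping above helps because getSplitScore() is not vectorised: in an ungrouped mutate it receives whole columns rather than one row's values, while each (date, player) group here holds exactly one row. A minimal sketch of the same idea with rowwise(), which makes the per-row evaluation explicit and does not rely on the grouping columns being unique. The lookup tibble is renamed splits here and the function body is tidied; both are my own assumptions for illustration, not part of the original post.

```r
library(tidyverse)

# Lookup table of per-player split averages (called `split` in the question;
# renamed here, as an assumption, to avoid masking base::split()).
splits <- tibble(
  player = c("Soto", "Soto", "Judge", "Judge", "Soto", "Soto", "Judge", "Judge"),
  split  = c("Grass", "Turf", "Grass", "Turf", "DAY", "NIGHT", "DAY", "NIGHT"),
  AVG    = c(200, 225, 250, 275, 300, 325, 350, 375)
)

df <- tibble(
  date    = c("2021-03-24", "2021-03-27", "2021-03-21", "2021-03-25"),
  player  = c("Soto", "Soto", "Judge", "Judge"),
  TOD     = c("DAY", "DAY", "DAY", "NIGHT"),
  surface = c("Grass", "Turf", "Turf", "Grass")
)

# Expects scalar player / surface / timeofDay values (one game at a time).
getSplitScore <- function(splits, player, surface, timeofDay) {
  tod_avg     <- sum(splits$AVG[splits$player == player & splits$split == timeofDay])
  surface_avg <- sum(splits$AVG[splits$player == player & splits$split == surface])
  tod_avg + surface_avg
}

# rowwise() evaluates the mutate expression once per row, so the function
# sees a single player, surface and TOD on each call.
scored <- df %>%
  rowwise() %>%
  mutate(score = getSplitScore(splits, player, surface, TOD)) %>%
  ungroup()
```

The trailing ungroup() drops the row-wise grouping so that later verbs operate on the whole tibble again.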

An alternative strategy:

df %>% left_join(split, by = c("player" = "player", "TOD" = "split")) %>%
  rbind(df %>% left_join(split, by = c("player" = "player", "surface" = "split"))) %>%
  group_by(date, player, TOD, surface) %>%
  summarise(AVG = sum(AVG))

# A tibble: 4 x 5
# Groups:   date, player, TOD [4]
  date       player TOD   surface   AVG
  <chr>      <chr>  <chr> <chr>   <dbl>
1 2021-03-21 Judge  DAY   Turf      625
2 2021-03-24 Soto   DAY   Grass     500
3 2021-03-25 Judge  NIGHT Grass     625
4 2021-03-27 Soto   DAY   Turf      525
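
A variation on the join strategy above, offered only as a sketch: reshape df to long form so that TOD and surface share one key column, join the lookup table once, and sum per game. It assumes (date, player) uniquely identifies a row of df, and it again uses splits as a hypothetical name for the lookup tibble; the kind column is likewise my own label.

```r
library(tidyverse)

# Same data as the question; `splits` is a hypothetical rename of `split`.
splits <- tibble(
  player = c("Soto", "Soto", "Judge", "Judge", "Soto", "Soto", "Judge", "Judge"),
  split  = c("Grass", "Turf", "Grass", "Turf", "DAY", "NIGHT", "DAY", "NIGHT"),
  AVG    = c(200, 225, 250, 275, 300, 325, 350, 375)
)

df <- tibble(
  date    = c("2021-03-24", "2021-03-27", "2021-03-21", "2021-03-25"),
  player  = c("Soto", "Soto", "Judge", "Judge"),
  TOD     = c("DAY", "DAY", "DAY", "NIGHT"),
  surface = c("Grass", "Turf", "Turf", "Grass")
)

# One row per (game, split kind): TOD and surface values land in `split`.
scores <- df %>%
  pivot_longer(c(TOD, surface), names_to = "kind", values_to = "split") %>%
  left_join(splits, by = c("player", "split")) %>%
  group_by(date, player) %>%
  summarise(score = sum(AVG), .groups = "drop")

# Attach the per-game score back to the original rows, keeping df's order.
result <- df %>%
  left_join(scores, by = c("date", "player"))
```

Joining the scores back onto df keeps the original row order and the TOD and surface columns, which a plain summarise would reorder or drop.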

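【Comments】:

  • Thanks for the reply. My function is supposed to reference the split df. I'm not trying to sum columns in df; I'm trying to calculate the score from the AVG column in split wherever the surface and time of day in df match split.
  • Yes. Got it. See my revised answer. Just add a group_by before the mutate in your dplyr pipeline.

【Solution 2】: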

I would first separate your split$split column into a time-of-day column and a surface column. A not-so-elegant way to do it:

split <- split %>% 
  mutate(
    TOD = ifelse(split %in% c('DAY', 'NIGHT'), split, 'NA'),
    surface = ifelse(split %in% c('Grass', 'Turf'), split, 'NA'),
    .keep = 'unused'
  )

Then you can left join:

df <- df %>% 
  left_join(
    split %>% select(-surface) %>% rename(todVal = AVG), by = c('player', 'TOD')
  ) %>% 
  left_join(
    split %>% select(-TOD) %>% rename(surfaceVal = AVG), by = c('player', 'surface')
  ) %>% 
  mutate(score = (todVal + surfaceVal), .keep='unused')

The output you should get:

df
#         date player   TOD surface score
# 1 2021-03-24   Soto   DAY   Grass 500.0
# 2 2021-03-27   Soto   DAY    Turf 525.0
# 3 2021-03-21  Judge   DAY    Turf 625.0
# 4 2021-03-25  Judge NIGHT   Grass 625.0

I do see that the output isn't what you wanted, but perhaps you should try to be clearer about what you are trying to do.
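
The two-join idea in this answer can also be sketched without the 'NA' string placeholders, by filtering the lookup table into two smaller tables first. This is only an illustration: splits, tod_lookup and surface_lookup are hypothetical names, and the sets of time-of-day and surface labels are taken from the sample data.

```r
library(tidyverse)

# Same data as the question; `splits` is a hypothetical rename of `split`.
splits <- tibble(
  player = c("Soto", "Soto", "Judge", "Judge", "Soto", "Soto", "Judge", "Judge"),
  split  = c("Grass", "Turf", "Grass", "Turf", "DAY", "NIGHT", "DAY", "NIGHT"),
  AVG    = c(200, 225, 250, 275, 300, 325, 350, 375)
)

df <- tibble(
  date    = c("2021-03-24", "2021-03-27", "2021-03-21", "2021-03-25"),
  player  = c("Soto", "Soto", "Judge", "Judge"),
  TOD     = c("DAY", "DAY", "DAY", "NIGHT"),
  surface = c("Grass", "Turf", "Turf", "Grass")
)

# One lookup per kind of split, with columns renamed to match df's keys.
tod_lookup <- splits %>%
  filter(split %in% c("DAY", "NIGHT")) %>%
  select(player, TOD = split, todVal = AVG)

surface_lookup <- splits %>%
  filter(split %in% c("Grass", "Turf")) %>%
  select(player, surface = split, surfaceVal = AVG)

# Join both lookups onto the games, add the two averages, drop the helpers.
result <- df %>%
  left_join(tod_lookup, by = c("player", "TOD")) %>%
  left_join(surface_lookup, by = c("player", "surface")) %>%
  mutate(score = todVal + surfaceVal) %>%
  select(-todVal, -surfaceVal)
```

A game whose player, time of day or surface has no entry in the lookup tables would get NA as its score here, rather than a partial sum.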

【Comments】:
