【Question title】: Find latest occurrence of value in multiple columns and return value in another column in R
【Posted】: 2022-10-06 00:12:52
【Question】:

I have a data frame like this:

home_team <- c('Team A', 'Team B', 'Team C', 'Team D', 'Team B', 'Team F')
away_team <- c('Team B', 'Team C', 'Team D', 'Team A', 'Team F', 'Team A')
home_team_score_pre <- c(300, 150, 600, 800, 50, 450)
away_team_score_pre <- c(550, 340, 100, 208, 412, 18)
winning_team <- c('Team A', 'Team C', 'Team C', 'Team D', 'Team F', 'Team F')
res <- c(16, 25, 11, 4, 22, 9)
home_team_score_change <- c(16, -25, 11, 4, -22, 9)
away_team_score_change <- c(-16, 25, -11, -4, 22, -9)
home_team_score_post <- c(316, 125, 611, 804, 28, 459)
away_team_score_post <- c(534, 365, 89, 204, 434, 9)

# Combine the vectors into the data frame printed below
df <- data.frame(home_team, away_team, home_team_score_pre, away_team_score_pre,
                 winning_team, res, home_team_score_change, away_team_score_change,
                 home_team_score_post, away_team_score_post)

Output:

  home_team away_team home_team_score_pre away_team_score_pre winning_team res home_team_score_change away_team_score_change
1    Team A    Team B                 300                 550       Team A  16                     16                    -16
2    Team B    Team C                 150                 340       Team C  25                    -25                     25
3    Team C    Team D                 600                 100       Team C  11                     11                    -11
4    Team D    Team A                 800                 208       Team D   4                      4                     -4
5    Team B    Team F                  50                 412       Team F  22                    -22                     22
6    Team F    Team A                 450                  18       Team F   9                      9                     -9
  home_team_score_post away_team_score_post
1                  316                  534
2                  125                  365
3                  611                   89
4                  804                  204
5                   28                  434
6                  459                    9

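Each team has a score before the match starts (home_team_score_pre and away_team_score_pre).

After the match, each score is adjusted by the result (res), depending on whether the team won or lost. For example, in row 1 the home team, Team A, won and res is 16, so Team A's score went up by 16 while Team B, who lost, had 16 subtracted. The overall result is the post-match score (home_team_score_post and away_team_score_post).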
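The adjustment rule described above can be written out directly. This is only a sketch of the arithmetic, assuming the winner is always one of the two teams in the row (no draws); `home_won` is a helper name introduced here for illustration.

```r
# Data from the question
home_team <- c("Team A", "Team B", "Team C", "Team D", "Team B", "Team F")
winning_team <- c("Team A", "Team C", "Team C", "Team D", "Team F", "Team F")
home_team_score_pre <- c(300, 150, 600, 800, 50, 450)
away_team_score_pre <- c(550, 340, 100, 208, 412, 18)
res <- c(16, 25, 11, 4, 22, 9)

# Assumption: no draws, so the away team won whenever the home team did not
home_won <- winning_team == home_team

# The winner gains res and the loser loses the same amount
home_team_score_change <- ifelse(home_won, res, -res)
away_team_score_change <- -home_team_score_change

# Post-match score is the pre-match score plus the change
home_team_score_post <- home_team_score_pre + home_team_score_change
away_team_score_post <- away_team_score_pre + away_team_score_change
```

With the question's inputs this reproduces the change and post columns shown in the printed data frame.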

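What I want to do is update home_team_score_pre and away_team_score_pre by looking up each team's previous match and entering the value from home_team_score_post or away_team_score_post.

So, for example, if the next two rows are: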

  home_team away_team home_team_score_pre away_team_score_pre
1    Team C    Team B  
2    Team A    Team F  

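Then I want to look up the last match played by 'Team C' (row 3) and enter its home_team_score_post (611) in the home_team_score_pre column.

Similarly, for the last match played by Team B (row 5), I want to enter its home_team_score_post (28) in the away_team_score_pre column, since Team B is the away team in the new row.

The value can come from either post column: it is the value from the team's last match, in which the team may have been either the home or the away side.

Also, if a team is playing its first match (so there is no previous value), I want to enter a default value of 100.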
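The lookup described above can be sketched in base R as follows. `latest_score` and its `default` argument are hypothetical names introduced here for illustration, and the sketch assumes the rows are already in chronological order.

```r
# Only the columns the lookup needs, taken from the question's data
df <- data.frame(
  home_team = c("Team A", "Team B", "Team C", "Team D", "Team B", "Team F"),
  away_team = c("Team B", "Team C", "Team D", "Team A", "Team F", "Team A"),
  home_team_score_post = c(316, 125, 611, 804, 28, 459),
  away_team_score_post = c(534, 365, 89, 204, 434, 9)
)

# Hypothetical helper: a team's most recent post-match score,
# or `default` if the team has not played yet.
latest_score <- function(df, team, default = 100) {
  idx <- which(df$home_team == team | df$away_team == team)
  if (length(idx) == 0) {
    return(default)
  }
  i <- max(idx)  # assumes rows are sorted oldest to newest
  if (df$home_team[i] == team) {
    df$home_team_score_post[i]
  } else {
    df$away_team_score_post[i]
  }
}

latest_score(df, "Team C")  # last played in row 3 as the home team
latest_score(df, "Team B")  # last played in row 5 as the home team
latest_score(df, "Team Z")  # never played, so the default applies
```

The same function serves both new columns: call it with the home team for home_team_score_pre and with the away team for away_team_score_pre.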

So the final output would be:

  home_team away_team home_team_score_pre away_team_score_pre winning_team res home_team_score_change away_team_score_change
1    Team A    Team B                 100                 100       Team A  16                     16                    -16
2    Team B    Team C                  84                 100       Team C  25                    -25                     25
3    Team C    Team D                 125                 100       Team C  11                     11                    -11
4    Team D    Team A                  89                 116       Team D   4                      4                     -4
5    Team B    Team F                  59                 100       Team F  22                    -22                     22
6    Team F    Team A                 122                 112       Team F   9                      9                     -9
7    Team C    Team B                 136                  37       Team B  12                     49                    131
8    Team B    Team F                  49                 131       Team F  10                      0                      0
  home_team_score_post away_team_score_post
1                  116                   84
2                   59                  125
3                  136                   89
4                   93                  112
5                   37                  122
6                  131                  103
7                  124                   49
8                   39                  141
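  • What is your desired final output? Please include it in your question.
  • Also, your question title talks about finding the previous value, but the way you describe it, you seem to be looking for the latest value rather than the previous one. Which is it?
  • Sorry, I have added the final output as an example. Yes, the latest value is what I am looking for. Thanks for the clarification.

Tags: r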


【Solution 1】:

Update - 2022-10-05

If you have starting data (df) that looks like this:

  home_team away_team winning_team res
1    Team A    Team B       Team A  16
2    Team B    Team C       Team C  25
3    Team C    Team D       Team C  11
4    Team D    Team A       Team D   4
5    Team B    Team F       Team F  22
6    Team F    Team A       Team F   9

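you can transform it with the following approach:

  1. Load the libraries and add a column holding the game number
    library(data.table)
    library(magrittr)  # provides the %>% pipe used in the steps below
    setDT(df)
    df[, game:=.I]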
    
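  2. Melt to long format, create indicator columns for whether this is the team's first game and whether the team won, set the initial pre-game score (100 for a first game), and compute the post-game score for those first games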
    df_long = melt(df, id=c("game", "res", "winning_team")) %>% 
      .[, fgame:=min(game)==game, value] %>% 
      .[fgame==1, score_pre:= 100] %>% 
      .[,win:=winning_team == value] %>% 
      .[, score_post:= fifelse(win,res,-res)+score_pre] %>% 
      .[order(value,game)]
    
  3. Use a for loop to fill in the pre-game and post-game scores for each team's remaining appearances
    for(i in 1:nrow(df_long)) {
      if(df_long[i,is.na(score_pre)]) {
        df_long[i,score_pre:=df_long[i-1,score_post]]
        df_long[i,score_post:=df_long[i,score_pre] + fifelse(df_long[i,win],res,-res)]
      }
    }
    
  4. Cast the data back to wide format and rename the columns
    dcast(df_long, game+res+winning_team~variable, value.var = c("score_pre", "score_post","value")) %>% 
      .[, .(home_team = value_home_team, away_team=value_away_team, 
            home_team_score_pre = score_pre_home_team,
            away_team_score_pre = score_pre_away_team,
            winning_team, res,
            home_team_score_post = score_post_home_team,
            away_team_score_post = score_post_away_team)]
    
    

    Output:

       home_team away_team home_team_score_pre away_team_score_pre winning_team   res home_team_score_post away_team_score_post
          <char>    <char>               <num>               <num>       <char> <num>                <num>                <num>
    1:    Team A    Team B                 100                 100       Team A    16                  116                   84
    2:    Team B    Team C                  84                 100       Team C    25                   59                  125
    3:    Team C    Team D                 125                 100       Team C    11                  136                   89
    4:    Team D    Team A                  89                 116       Team D     4                   93                  112
    5:    Team B    Team F                  59                 100       Team F    22                   37                  122
    6:    Team F    Team A                 122                 112       Team F     9                  131                  103
    

    Input:

    df = structure(list(home_team = c("Team A", "Team B", "Team C", "Team D", 
    "Team B", "Team F"), away_team = c("Team B", "Team C", "Team D", 
    "Team A", "Team F", "Team A"), winning_team = c("Team A", "Team C", 
    "Team C", "Team D", "Team F", "Team F"), res = c(16, 25, 11, 
    4, 22, 9)), class = "data.frame", row.names = c(NA, -6L))
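Because each team's post-game score is just the starting value plus the running total of its signed results, the row-by-row loop in step 3 could also be expressed as a grouped cumulative sum. The following is a base R sketch of that idea, not the method used above; `initial_score`, `long`, `home` and `away` are names introduced here for illustration, and it assumes every team starts at 100 and every game has a winner among its two teams.

```r
# Input from the answer above
df <- data.frame(
  home_team    = c("Team A", "Team B", "Team C", "Team D", "Team B", "Team F"),
  away_team    = c("Team B", "Team C", "Team D", "Team A", "Team F", "Team A"),
  winning_team = c("Team A", "Team C", "Team C", "Team D", "Team F", "Team F"),
  res          = c(16, 25, 11, 4, 22, 9)
)
n <- nrow(df)
initial_score <- 100  # assumed starting score for every team

# One row per team appearance: home rows first, then away rows
long <- data.frame(
  game = rep(seq_len(n), times = 2),
  side = rep(c("home", "away"), each = n),
  team = c(df$home_team, df$away_team),
  res  = rep(df$res, times = 2),
  won  = c(df$winning_team == df$home_team, df$winning_team == df$away_team)
)
long$change <- ifelse(long$won, long$res, -long$res)

# Put each team's appearances in game order, then accumulate per team
long <- long[order(long$team, long$game), ]
long$score_post <- initial_score + ave(long$change, long$team, FUN = cumsum)
long$score_pre  <- long$score_post - long$change

# Write the scores back to the wide table, matching on game number
home <- long[long$side == "home", ]
away <- long[long$side == "away", ]
df$home_team_score_pre  <- home$score_pre[match(seq_len(n), home$game)]
df$away_team_score_pre  <- away$score_pre[match(seq_len(n), away$game)]
df$home_team_score_post <- home$score_post[match(seq_len(n), home$game)]
df$away_team_score_post <- away$score_post[match(seq_len(n), away$game)]
```

On the six-game input this gives the same pre and post columns as the output table above, without updating the table one row at a time.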
    

    Previous solution

    You can use a helper function like this:

    # Helper function: returns the next row as a one-row data.table (df must be a data.table)
    prepare_next_row <- function(df, home, away) {
      r = data.table(home_team = home, away_team=away)
      lrh = last(df[home_team==home | away_team == home])
      lra = last(df[home_team==away | away_team == away])
      r[, home_team_score_pre:=fifelse(lrh$home_team==home,lrh$home_team_score_post, lrh$away_team_score_post)]
      r[, away_team_score_pre:=fifelse(lra$home_team==away,lra$home_team_score_post, lra$away_team_score_post)]
      r[]
    }
    

    Now, whenever you want a new row, you just call it together with rbind():

    rbind(
      df, 
      prepare_next_row(df, "Team C", "Team B"),
      prepare_next_row(df, "Team A", "Team F"),
      fill=TRUE
    )
    

    Output:

       home_team away_team home_team_score_pre away_team_score_pre winning_team   res home_team_score_change away_team_score_change home_team_score_post away_team_score_post
          <char>    <char>               <num>               <num>       <char> <num>                  <num>                  <num>                <num>                <num>
    1:    Team A    Team B                 300                 550       Team A    16                     16                    -16                  316                  534
    2:    Team B    Team C                 150                 340       Team C    25                    -25                     25                  125                  365
    3:    Team C    Team D                 600                 100       Team C    11                     11                    -11                  611                   89
    4:    Team D    Team A                 800                 208       Team D     4                      4                     -4                  804                  204
    5:    Team B    Team F                  50                 412       Team F    22                    -22                     22                   28                  434
    6:    Team F    Team A                 450                  18       Team F     9                      9                     -9                  459                    9
    7:    Team C    Team B                 611                  28         <NA>    NA                     NA                     NA                   NA                   NA
    8:    Team A    Team F                   9                 459         <NA>    NA                     NA                     NA                   NA                   NA
    

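【Discussion】:

  • This is great. What if I already have the data and do not want to add new rows, but rather update based on what I have? Sorry if I did not make that clear in my original post.
  • OK, I had misunderstood. I have now changed my solution substantially.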