【Question title】: Matching rows within the same dataset in R
【Posted】: 2017-12-06 14:54:28
【Question】:

I have a dataset of unique matches like this. Each row is one match together with its result.

date <- c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01')
team1 <- c('A','B','B','C','D','A','B')
team1_score <- c(1,0,4,3,5,6,7)
team2 <- c('B','A','A','B','C','C','A')
team2_score <- c(0,1,5,4,6,9,10)
matches <- data.frame(date, team1, team1_score, team2, team2_score)

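I want to create 2 new columns holding the form of team 1 and of team 2. The result of a match is decided by which team has the higher score, or it is a draw. The form is a team's results in its last 2 matches before the current one, most recent first. For example, the forms of team 1 and team 2 for the first 3 rows are listed below. Sometimes a team has not yet played 2 matches, in which case a result of NULL is fine. In short, I want to know the form of team1 and team2 going into each match.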

  • Form1: W-W, L-W, W-L
  • Form2: L-L, W-L, L-W

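The real dataset has many more than 4 unique teams. I have been thinking about this for a while but cannot come up with a good way to create these two variables.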
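The rule described above (higher score wins, equal scores draw) can be sketched in base R as follows. The helper name `outcome` and the vectors `team1_result` and `team2_result` are illustrative names, not part of the original post; the scores are the ones from the question.

```r
# Scores from the question's example data
team1_score <- c(1, 0, 4, 3, 5, 6, 7)
team2_score <- c(0, 1, 5, 4, 6, 9, 10)

# Hypothetical helper: sign() returns -1, 0 or 1, so adding 2 gives an
# index into c("L", "D", "W") for loss, draw and win respectively.
outcome <- function(own, other) c("L", "D", "W")[sign(own - other) + 2]

# Result of each match from each side's point of view
team1_result <- outcome(team1_score, team2_score)
team2_result <- outcome(team2_score, team1_score)
```

This only labels each row; turning those labels into a rolling form per team still requires looking at a team's earlier rows, regardless of whether it appeared as team1 or team2.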

【Comments】:

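  • How do you know the result? Is it whether 'B' scores more than 1?
  • Please provide a reproducible example. Look up the function dput() to help you provide a reproducible dataset. You have many options. If the data is in long format (i.e. "tidy"), you could take a rolling sum of wins and losses, which would give a result in the format W|L or, following your example, 3|2. You could also use an if_else function to create a letter for each match result, roll it over the last five iterations, and then concatenate those strings.
  • @Gregor: Yes, a team can appear as either Team1 or Team2, and the team with the higher score is the winner.
  • The first step in solving this is getting data. It would be great if you shared data instead of leaving whoever works on this to create fake data. You don't need to share much, just 6-10 rows of matches involving about 3 teams. You will get help faster if your example data is copy/pasteable, either by sharing code that simulates it or by sharing the output of dput() to recreate its structure. There are lots of tips on making R reproducible examples here.
  • @Gregor: I have adjusted the question above to make it clearer.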
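The idea in the comments of rolling a team's results into a concatenated string can be sketched in base R for a single team. This is a minimal sketch under stated assumptions: the function name `form_before`, its parameter `n`, and the use of `NA` in place of the NULL mentioned in the question are choices made for illustration, and the input must already be that team's results sorted from oldest to newest.

```r
# Hypothetical helper (not from the original thread).
# results: one team's results, ordered from oldest to newest match
# n:       how many previous matches make up the form
# Returns, for each match, the n previous results (most recent first)
# joined by "-", or NA when fewer than n earlier matches exist.
form_before <- function(results, n = 2) {
  vapply(seq_along(results), function(i) {
    if (i <= n) return(NA_character_)
    paste(results[(i - 1):(i - n)], collapse = "-")
  }, character(1))
}

# Team A's results in the question's 7-row data, oldest first
# (2017/06/01, 2017/07/01, 2017/10/01, 2017/11/01, 2017/12/01)
a_results <- c("W", "L", "W", "W", "W")
a_form <- form_before(a_results)
```

For team A this gives `NA, NA, "L-W", "W-L", "W-W"`, which agrees with the values the question lists for A in its three most recent matches (Form2 in rows 3 and 2, Form1 in row 1).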

Tags: r matching


【Solution 1】:

Here is my solution:

    library(tidyverse)


    date <- as.Date(c('2017/12/01','2017/11/01','2017/10/01','2017/09/01','2017/08/01','2017/07/01','2017/06/01', '2017/05/30'))
    team1 <- c('A','B','B','C','D','A','B','A')
    team1_score <- c(1,0,4,3,5,6,7,0)
    team2 <- c('B','A','A','B','C','C','A','D')
    team2_score <- c(0,1,5,4,6,9,10,0)
    matches <- data.frame(date, team1, team1_score, team2, team2_score)

    ## 1. Create a unique identifier for each match. It assumes that teams can only play each other once a day.
    matches$UID <- paste(matches$date, matches$team1, matches$team2, sep = "-")

    ## 2. Create a score difference variable reflecting team1's score
    matches <- matches %>% mutate(score_dif_team1 = team1_score - team2_score)

    ## 3. Create a Result (WDL) reflecting team1's results
    matches <- matches %>% mutate(results_team1 = if_else(score_dif_team1 < 0, true = "L", false = if_else(score_dif_team1 > 0, true = "W", false = "D")))

    ## 4. Cosmetic step: Reorder variables for easier comparison across variables
    matches <- matches %>% select(UID, date:results_team1)

    ## 5. Reshape the table into a long format based on the teams. Each observation will now reflect the results of 1 team within a match. Each game will have two observations.
    matches <- matches %>% gather(key = old_team_var, value = team, team1, team2)

    ## 6. Establish a common results variable for each observation. It essentially inverts the results_team1 variable for team2 rows, and keeps results_team1 unchanged for team1 rows.
    matches <- matches %>% 
                mutate(results = if_else(old_team_var == "team2", 
                                                    true = if_else(results_team1 == "W", 
                                                                   true = "L", 
                                                                   false = if_else(results_team1 == "L", 
                                                                                     true = "W",
                                                                                     false = "D")),
                                                    false = results_team1))

    ## Final step: Filter the matches table by the dates you are interested in, then reshape it into a data frame with one row per team and one column each for the D, L and W counts.

    Results_table <- matches %>% filter(date <= as.Date("2017-12-01")) %>% group_by(team, results) %>% summarise(cases = n()) %>% spread(key = results, value = cases, fill = 0)

    ## Results:
    # A tibble: 4 x 4
    # Groups:   team [4]
       team     D     L     W
    * <chr> <dbl> <dbl> <dbl>
    1     A     1     1     4
    2     B     0     4     1
    3     C     0     1     2
    4     D     1     1     0
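The table above is an overall D/L/W tally per team rather than the per-match form the question asks for. One possible way to extend the same long-format idea to `Form1` and `Form2` columns is sketched below with dplyr. It is not the answer author's code: the names `long`, `team_form` and `with_form` are illustrative, `NA` stands in for the NULL mentioned in the question, and it keeps the answer's assumption that a team plays at most one match per date.

```r
library(dplyr)

# Same data as in the answer above
matches <- data.frame(
  date = as.Date(c("2017/12/01", "2017/11/01", "2017/10/01", "2017/09/01",
                   "2017/08/01", "2017/07/01", "2017/06/01", "2017/05/30")),
  team1 = c("A", "B", "B", "C", "D", "A", "B", "A"),
  team1_score = c(1, 0, 4, 3, 5, 6, 7, 0),
  team2 = c("B", "A", "A", "B", "C", "C", "A", "D"),
  team2_score = c(0, 1, 5, 4, 6, 9, 10, 0),
  stringsAsFactors = FALSE
)

# One row per team per match, with that team's own result
long <- bind_rows(
  transmute(matches, date, team = team1, diff = team1_score - team2_score),
  transmute(matches, date, team = team2, diff = team2_score - team1_score)
) %>%
  mutate(result = case_when(diff > 0 ~ "W", diff < 0 ~ "L", TRUE ~ "D"))

# Form going into each match: the two previous results, most recent first.
# lag() runs within each team because the data is grouped and date-sorted.
team_form <- long %>%
  group_by(team) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(form = if_else(is.na(lag(result, 2)),
                        NA_character_,
                        paste(lag(result, 1), lag(result, 2), sep = "-"))) %>%
  ungroup() %>%
  select(date, team, form)

# Attach the form to the original wide table, once for each side
with_form <- matches %>%
  left_join(rename(team_form, team1 = team, Form1 = form),
            by = c("date", "team1")) %>%
  left_join(rename(team_form, team2 = team, Form2 = form),
            by = c("date", "team2"))
```

For the first three rows this yields `Form1` of W-W, L-W, W-L and `Form2` of L-L, W-L, L-W, matching the values given in the question. Rows where a team has fewer than two earlier matches get `NA`.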

【Comments】:
