【Question title】:How to count the number of similar occurrences of a combination in a data frame? [duplicate]
【Posted】:2016-07-18 06:58:05
【Question description】:

I'm a beginner. I have loaded a well-known dataset into R and now want to run a few experiments on it. Here are the commands I have run so far:

I have a battles data frame:

str(battles)

'data.frame':   38 obs. of  25 variables:
 $ name              : Factor w/ 38 levels "Battle at the Mummer's Ford",..: 13 1 7 14 18 10 25 5 3 17 ...
 $ year              : int  298 298 298 298 298 298 298 299 299 299 ...
 $ battle_number     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ attacker_king     : Factor w/ 5 levels "","Balon/Euron Greyjoy",..: 3 3 3 4 4 4 3 2 2 2 ...
 $ defender_king     : Factor w/ 7 levels "","Balon/Euron Greyjoy",..: 6 6 6 3 3 3 6 6 6 6 ...
 $ attacker_1        : Factor w/ 11 levels "Baratheon","Bolton",..: 10 10 10 11 11 11 10 9 9 9 ...
 $ attacker_2        : Factor w/ 8 levels "","Bolton","Frey",..: 1 1 1 1 8 8 1 1 1 1 ...
 $ attacker_3        : Factor w/ 3 levels "","Giants","Mormont": 1 1 1 1 1 1 1 1 1 1 ...
 $ attacker_4        : Factor w/ 2 levels "","Glover": 1 1 1 1 1 1 1 1 1 1 ...
 $ defender_1        : Factor w/ 13 levels "","Baratheon",..: 12 2 12 8 8 8 6 11 11 11 ...
 $ defender_2        : Factor w/ 3 levels "","Baratheon",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ defender_3        : logi  NA NA NA NA NA NA ...
 $ defender_4        : logi  NA NA NA NA NA NA ...
 $ attacker_outcome  : Factor w/ 3 levels "","loss","win": 3 3 3 2 3 3 3 3 3 3 ...
 $ battle_type       : Factor w/ 5 levels "","ambush","pitched battle",..: 3 2 3 3 2 2 3 3 5 2 ...
 $ major_death       : int  1 1 0 1 1 0 0 0 0 0 ...
 $ major_capture     : int  0 0 1 1 1 0 0 0 0 0 ...
 $ attacker_size     : int  15000 NA 15000 18000 1875 6000 NA NA 1000 264 ...
 $ defender_size     : int  4000 120 10000 20000 6000 12625 NA NA NA NA ...
 $ attacker_commander: Factor w/ 32 levels "","Asha Greyjoy",..: 8 6 9 22 16 18 6 30 2 28 ...
 $ defender_commander: Factor w/ 29 levels "","Amory Lorch",..: 7 4 10 28 12 14 15 1 1 1 ...
 $ summer            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ location          : Factor w/ 28 levels "","Castle Black",..: 8 13 17 9 27 17 4 12 5 23 ...
 $ region            : Factor w/ 7 levels "Beyond the Wall",..: 7 5 5 5 5 5 5 3 3 3 ...
 $ note              : Factor w/ 6 levels "","Greyjoy's troop number based on the Battle of Deepwood Motte, in which Asha had 1000 soldier on 30 longships. That comes out to"| __truncated__,..: 1 1 1 1 1 1 1 1 1 2 ...

What I want is to find out how many battles each king has won and lost so far across the whole of GOT.

select(battles,attacker_outcome,attacker_king)
   attacker_outcome            attacker_king
1               win Joffrey/Tommen Baratheon
2               win Joffrey/Tommen Baratheon
3               win Joffrey/Tommen Baratheon
4              loss               Robb Stark
5               win               Robb Stark
6               win               Robb Stark
7               win Joffrey/Tommen Baratheon
8               win      Balon/Euron Greyjoy
9               win      Balon/Euron Greyjoy
10              win      Balon/Euron Greyjoy
11              win               Robb Stark
12              win      Balon/Euron Greyjoy
13              win      Balon/Euron Greyjoy
14              win Joffrey/Tommen Baratheon
15              win               Robb Stark
16              win        Stannis Baratheon
17             loss Joffrey/Tommen Baratheon
18              win               Robb Stark
19              win               Robb Stark
20             loss        Stannis Baratheon
21              win               Robb Stark
22             loss               Robb Stark
23              win                         
24              win Joffrey/Tommen Baratheon
25              win Joffrey/Tommen Baratheon
26              win Joffrey/Tommen Baratheon
27              win               Robb Stark
28             loss        Stannis Baratheon
29              win Joffrey/Tommen Baratheon
30              win                         
31              win        Stannis Baratheon
32              win      Balon/Euron Greyjoy
33              win      Balon/Euron Greyjoy
34              win Joffrey/Tommen Baratheon
35              win Joffrey/Tommen Baratheon
36              win Joffrey/Tommen Baratheon
37              win Joffrey/Tommen Baratheon
38                         Stannis Baratheon
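To try the answers below without the original CSV, the two printed columns can be rebuilt by hand. This is a minimal sketch that reproduces only attacker_outcome and attacker_king (38 rows, in the same order as the listing above). The single-letter codes are a hypothetical shorthand used only for typing convenience, and stringsAsFactors = TRUE mimics the factor columns shown by str().

```r
# Rebuild only the two printed columns, in the same row order as the listing.
# Single-letter codes are hypothetical shorthand used only in this sketch:
# J = Joffrey/Tommen Baratheon, R = Robb Stark, B = Balon/Euron Greyjoy,
# S = Stannis Baratheon, "" = blank king.
king_code <- c("J", "J", "J", "R", "R", "R", "J", "B", "B", "B",
               "R", "B", "B", "J", "R", "S", "J", "R", "R", "S",
               "R", "R", "",  "J", "J", "J", "R", "S", "J", "",
               "S", "B", "B", "J", "J", "J", "J", "S")
# w = win, l = loss, "" = blank outcome
outcome_code <- c("w", "w", "w", "l", "w", "w", "w", "w", "w", "w",
                  "w", "w", "w", "w", "w", "w", "l", "w", "w", "l",
                  "w", "l", "w", "w", "w", "w", "w", "l", "w", "w",
                  "w", "w", "w", "w", "w", "w", "w", "")

king_lookup <- c(J = "Joffrey/Tommen Baratheon", R = "Robb Stark",
                 B = "Balon/Euron Greyjoy", S = "Stannis Baratheon")
outcome_lookup <- c(w = "win", l = "loss")

# Map codes to full labels, keeping blanks as empty strings
attacker_king <- ifelse(king_code == "", "", king_lookup[king_code])
attacker_outcome <- ifelse(outcome_code == "", "", outcome_lookup[outcome_code])

# Factor columns, like the original data ("" sorts first, so it is level 1)
battles <- data.frame(attacker_outcome, attacker_king, stringsAsFactors = TRUE)
str(battles)
```

For example, table(battles[c("attacker_king", "attacker_outcome")])[-1, -1] from Solution 2 runs unchanged on this object.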

I also need two columns, "number of wins" and "number of losses", for each attacker king.

Note: if my question violates Stack Overflow's question-asking policy in any way, please forgive me; this is my first question about R.

【Question discussion】:

    Tags: r


    【Solution 1】:

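    You can use table from the base package. In the question's data both columns are factors whose first level is the empty string "", so [-1, -1] drops that blank row and column:

    table(battles$attacker_king, battles$attacker_outcome)[-1, -1]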
    
    #                           loss win
    #  Balon/Euron Greyjoy         0   7
    #  Joffrey/Tommen Baratheon    1  13
    #  Robb Stark                  2   8
    #  Stannis Baratheon           2   2
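The question asks for two named columns per king rather than a printed table. One possible sketch pulls the win and loss columns out of the table into an ordinary data frame. It assumes a small hypothetical battles_demo data frame built from rows 1 to 6 of the listing in the question; the column names wins and losses are chosen here, not taken from the answer.

```r
# Hypothetical demo data: rows 1-6 of the listing in the question
battles_demo <- data.frame(
  attacker_outcome = c("win", "win", "win", "loss", "win", "win"),
  attacker_king = c(rep("Joffrey/Tommen Baratheon", 3), rep("Robb Stark", 3)),
  stringsAsFactors = TRUE
)

counts <- table(battles_demo$attacker_king, battles_demo$attacker_outcome)
# With the full data, drop the blank-king row by name (a no-op on this demo)
counts <- counts[rownames(counts) != "", , drop = FALSE]

# One row per king, with separate win and loss columns
result <- data.frame(
  attacker_king = rownames(counts),
  wins = as.integer(counts[, "win"]),
  losses = as.integer(counts[, "loss"])
)
print(result)
```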
    

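    【Discussion】:

      【Solution 2】:

      One option is dplyr. After grouping by 'attacker_king', we summarise by creating two columns ('NoWins', 'NoLoss') as the sum of the logical vectors for "win" and "loss", and, if needed, filter the output to remove the blank elements in 'attacker_king'.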

      library(dplyr)
      battles %>%
            group_by(attacker_king) %>%
            summarise(NoWins = sum(attacker_outcome == "win"),
                       NoLoss = sum(attacker_outcome == "loss")) %>%
            # nzchar() errors on factors; != "" works for factor or character columns
            filter(attacker_king != "")
      #            attacker_king NoWins NoLoss
      #                 <chr>  <int>  <int>
      #1      Balon/Euron Greyjoy      7      0
      #2 Joffrey/Tommen Baratheon     13      1
      #3               Robb Stark      8      2
      #4        Stannis Baratheon      2      2
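If dplyr is not available, the same idea (summing the logical vectors attacker_outcome == "win" and attacker_outcome == "loss" within each king) can be sketched in base R with tapply(), swapped in here for group_by()/summarise(). battles_demo is a hypothetical subset built from rows 17 to 22 of the listing in the question.

```r
# Hypothetical demo data: rows 17-22 of the listing in the question
battles_demo <- data.frame(
  attacker_outcome = c("loss", "win", "win", "loss", "win", "loss"),
  attacker_king = c("Joffrey/Tommen Baratheon", "Robb Stark", "Robb Stark",
                    "Stannis Baratheon", "Robb Stark", "Robb Stark"),
  stringsAsFactors = TRUE
)

is_win <- battles_demo$attacker_outcome == "win"
is_loss <- battles_demo$attacker_outcome == "loss"

# tapply() returns one sum per factor level, in level order
result <- data.frame(
  attacker_king = levels(battles_demo$attacker_king),
  NoWins = as.integer(tapply(is_win, battles_demo$attacker_king, sum)),
  NoLoss = as.integer(tapply(is_loss, battles_demo$attacker_king, sum))
)
# With the full data, drop the blank-king row (a no-op on this demo)
result <- result[result$attacker_king != "", ]
print(result)
```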
      

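      Or we can combine dplyr and tidyr. After grouping, we get the frequency counts with tally, filter out the blanks (as above), and reshape from 'long' to 'wide' format with spread (from tidyr).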

      library(dplyr)
      library(tidyr)
      battles %>%
           group_by(attacker_king, attacker_outcome) %>%
           tally() %>% 
           filter(attacker_king != "" & attacker_outcome != "") %>% 
           # fill = 0 puts 0 instead of NA for king/outcome pairs that never occur
           spread(attacker_outcome, n, fill = 0)
      

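      Or use dcast from data.table. This is much easier, because dcast also has a fun.aggregate argument, so we can specify the aggregation function (here length) while reshaping to 'wide' format.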

      library(data.table)
      # length counts rows per cell; drop the blank king row, then the "" outcome column (column 2)
      dcast(setDT(battles), attacker_king~attacker_outcome, length)[attacker_king != ""
                              ][, -2, with = FALSE]
      #                attacker_king loss win
      #1:      Balon/Euron Greyjoy    0   7
      #2: Joffrey/Tommen Baratheon    1  13
      #3:               Robb Stark    2   8
      #4:        Stannis Baratheon    2   2
      

      Or, in base R, use table:

      table(battles[c("attacker_king", "attacker_outcome")])[-1,-1]
      #                          attacker_outcome
      #  attacker_king              loss win
      #  Balon/Euron Greyjoy         0   7
      #  Joffrey/Tommen Baratheon    1  13
      #  Robb Stark                  2   8
      #  Stannis Baratheon           2   2
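The [-1, -1] above relies on "" being the first factor level in both columns, as it is in the str() output. A sketch that drops blanks by value instead, so it does not depend on level order, filters the rows and calls droplevels() before table(). battles_demo is a hypothetical subset built from rows 21, 22, 23 and 38 of the listing in the question.

```r
# Hypothetical demo data: rows 21, 22, 23 and 38 of the listing in the question
battles_demo <- data.frame(
  attacker_outcome = c("win", "loss", "win", ""),
  attacker_king = c("Robb Stark", "Robb Stark", "", "Stannis Baratheon"),
  stringsAsFactors = TRUE
)

# Keep rows where both the king and the outcome are non-blank
keep <- battles_demo$attacker_king != "" & battles_demo$attacker_outcome != ""

# droplevels() removes the now-unused levels so they do not show up in the table
counts <- table(droplevels(battles_demo[keep, c("attacker_king", "attacker_outcome")]))
print(counts)
```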
      

      【Discussion】:
