【Question title】: Manipulating data frame in R (aggregating by the player)
【Posted】: 2017-07-20 17:58:18
【Question】:

I have a CSV file in the following format:

Player    Sports      Win     Loss
Brian     Football     5       3
Brian     Basketball   4       1
Brian     Bowling      7       0
Chris     Football     3       3
Chris     Basketball   3       4
. . . . 
. . . .

I want to change it to the following format:

Name&Sports   Win         Loss    Total
Brian         16           4       20
Football      5            3       8
Basketball    4            1       5
Bowling       7            0       7
Chris         6            7       13
Football      3            3       6
Basketball    3            4       7   
. . . .
. . . . 

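Basically, in the new format we first write the person's name together with their wins, losses, and total games across all the sports they played. In the following rows we write each sport that person played, with the wins, losses, and total games for that particular sport. Once everything for that person has been written, we move on to the next person and do the same.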

Is there a simple way to do this in R?

【Question comments】:

    Tags: r csv dataframe datatable


    【Solution 1】:
    df <- read.table(text = "Player    Sports      Win     Loss
    Brian     Football     5       3
    Brian     Basketball   4       1
    Brian     Bowling      7       0
    Chris     Football     3       3
    Chris     Basketball   3       4", header = TRUE)
    
    # Sum wins and losses per player
    tmp <- aggregate(df$Win, by = list(df$Player), sum)
    tmp <- cbind(tmp, aggregate(df$Loss, by = list(df$Player), sum)[2])
    names(tmp) <- colnames(df)[2:4]   # Sports, Win, Loss
    
    # Stack the per-sport rows (dropping Player) on top of the per-player totals
    df <- rbind(df[, 2:ncol(df)], tmp)
    df$Total <- df$Loss + df$Win
    df
    
          Sports Win Loss Total
    1   Football   5    3     8
    2 Basketball   4    1     5
    3    Bowling   7    0     7
    4   Football   3    3     6
    5 Basketball   3    4     7
    6      Brian  16    4    20
    7      Chris   6    7    13
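The two aggregate() calls above can also be collapsed into a single call using the formula interface, which sums several columns per group at once. A minimal sketch, assuming the question's example data; it yields a tmp with the same Sports, Win and Loss column names that the rbind step expects.

```r
# Example data from the question
df <- read.table(text = "Player    Sports      Win     Loss
Brian     Football     5       3
Brian     Basketball   4       1
Brian     Bowling      7       0
Chris     Football     3       3
Chris     Basketball   3       4",
header = TRUE, stringsAsFactors = FALSE)

# One call sums both Win and Loss for every Player (groups come out sorted)
tmp <- aggregate(cbind(Win, Loss) ~ Player, data = df, FUN = sum)

# Rename Player -> Sports so these rows can be stacked under the per-sport rows
names(tmp)[names(tmp) == "Player"] <- "Sports"
tmp
```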
    

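    Alternatively, if matching the row order in the example matters, use this in place of the `df <- rbind(...)` step above (it needs the original `df` and `tmp`):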

    df <- rbind(tmp[1,], df[1:3,2:ncol(df)], 
                tmp[2,], df[4:nrow(df),2:ncol(df)]) # could easily be made more programmatic          
    df$Total <- df$Loss + df$Win
    df
    
           Sports Win Loss Total
    1       Brian  16    4    20
    2    Football   5    3     8
    3  Basketball   4    1     5
    4     Bowling   7    0     7
    21      Chris   6    7    13
    41   Football   3    3     6
    5  Basketball   3    4     7
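The ordering step above hard-codes row indices; as its comment says, it could be made more programmatic. A minimal base-R sketch of that idea, assuming the question's example data: split() the rows by player, build one block per player (a totals row followed by that player's sports), then stack the blocks with do.call(rbind, ...). The helper name player_block is hypothetical, and players come out in alphabetical order because split() orders groups by factor level.

```r
# Example data from the question
df <- read.table(text = "Player    Sports      Win     Loss
Brian     Football     5       3
Brian     Basketball   4       1
Brian     Bowling      7       0
Chris     Football     3       3
Chris     Basketball   3       4",
header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: one player's block = a totals row, then their sports
player_block <- function(d) {
  summary_row <- data.frame(Sports = d$Player[1],
                            Win = sum(d$Win),
                            Loss = sum(d$Loss),
                            stringsAsFactors = FALSE)
  rbind(summary_row, d[, c("Sports", "Win", "Loss")])
}

# split() returns one data frame per player; stack the blocks in that order
blocks <- lapply(split(df, df$Player), player_block)
out <- do.call(rbind, blocks)

out$Total <- out$Win + out$Loss
rownames(out) <- NULL              # drop the "Brian.1"-style row names
names(out)[1] <- "Name&Sports"
out
```

Because each player's totals row is built inside that player's block, no row indices are hard-coded, so the same code keeps working when more players or sports are added.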
    

    【Comments】:

      【Solution 2】:

      A solution using the tidyverse. dt_final holds the final output.

      # Create example data frame
      dt <- read.table(text = "Player    Sports      Win     Loss
      Brian     Football     5       3
      Brian     Basketball   4       1
      Brian     Bowling      7       0
      Chris     Football     3       3
      Chris     Basketball   3       4",
                       header = TRUE, stringsAsFactors = FALSE)
      
      # Load package
      library(tidyverse)
      
      # Split data frame by players
      dt_list <- split(dt, f = dt$Player)
      
      # Define a function to process one player's data
      sum_fun <- function(dt){
        playername <- unique(dt$Player)
      
        dt1 <- dt %>% 
          mutate(Total = Win + Loss) %>%
          select(-Player) 
        dt2 <- tibble(Sports = playername,
                      Win = sum(dt1$Win),
                      Loss = sum(dt1$Loss),
                      Total = sum(dt1$Total))
        dt3 <- bind_rows(dt2, dt1)
      
        return(dt3)
      }
      
      # Apply the function
      dt_final <- dt_list %>%
        map_df(sum_fun) %>%   # map_df() already row-binds the results
        rename(`Name&Sports` = Sports)
      

      【Comments】:
