【Question title】: Apply dplyr function over several columns
【Posted】: 2019-05-06 06:41:29
【Question】:

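I have a data frame with about 150K rows and 77 categorical variables, in the format shown below. How do I find the mean score and the count for each category?

There is one numeric variable and 77 grouping variables.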

students<-data.frame(ID = c("A","B","C","D"), Gender = c("M","F","F","F"), Socioeconomic = c("Low","Low","Medium","High"), Subject = c("Maths","Maths","Science", "Science"),
                    Scores = c(45,98, 50,38))

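That is, I don't want to go through each categorical column separately 77 times. Instead, I would like a tibble containing a list of each of the following outputs: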

students %>% group_by(Gender) %>% summarise(Mean.score = mean(Scores), Count = length(ID))

students %>% group_by(Socioeconomic) %>% summarise(Mean.score = mean(Scores), Count = length(ID))

students %>% group_by(Subject) %>% summarise(Mean.score = mean(Scores), Count = length(ID))

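【Comments】:

  • I'm not sure the linked question is a duplicate (although I think this question has probably been asked before). The linked question focuses on how to summarise multiple numeric columns grouped by one set of categorical columns. This question asks how to summarise a single numeric column grouped by each categorical column in turn.
  • Yes, that is correct. I want to apply two functions to a single numeric column grouped by several categorical columns.

Tags: r dplyr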


【Solution 1】:

Here are two options:

library(tidyverse)

# map successively over each categorical column
map(students %>% select(-Scores, -ID) %>% names() %>% set_names(),
    ~ students %>% 
      group_by_at(.x) %>% 
      summarise(Mean.score = mean(Scores), 
                Count = n())
)
$Gender
# A tibble: 2 x 3
  Gender Mean.score Count
  <fct>       <dbl> <int>
1 F              62     3
2 M              45     1

$Socioeconomic
# A tibble: 3 x 3
  Socioeconomic Mean.score Count
  <fct>              <dbl> <int>
1 High                38       1
2 Low                 71.5     2
3 Medium              50       1

$Subject
# A tibble: 2 x 3
  Subject Mean.score Count
  <fct>        <dbl> <int>
1 Maths         71.5     2
2 Science       44       2
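
In dplyr 1.0.0 and later, `group_by_at()` is superseded, and the same loop can be written with `across()` and `all_of()`. The following is a minimal sketch rather than part of the original answer. It assumes dplyr >= 1.0.0 and purrr are installed, and it rebuilds the `students` data from the question. The names `group_cols` and `summaries` are illustrative only.

```r
library(dplyr)
library(purrr)

# Example data from the question (strings kept as character, not factor)
students <- data.frame(
  ID = c("A", "B", "C", "D"),
  Gender = c("M", "F", "F", "F"),
  Socioeconomic = c("Low", "Low", "Medium", "High"),
  Subject = c("Maths", "Maths", "Science", "Science"),
  Scores = c(45, 98, 50, 38),
  stringsAsFactors = FALSE
)

# Grouping columns: everything except the ID and the numeric score
group_cols <- setdiff(names(students), c("ID", "Scores"))

# One summary tibble per grouping column, returned as a named list
summaries <- map(
  set_names(group_cols),
  function(col) {
    students %>%
      group_by(across(all_of(col))) %>%
      summarise(Mean.score = mean(Scores), Count = n(), .groups = "drop")
  }
)

summaries$Gender
```

Because the list is named after the grouping columns, each result can be pulled out by name, for example `summaries$Subject`.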

# Convert to long format, group, then summarize
students %>% 
  gather(key, value, -ID, -Scores) %>% 
  group_by(key, value) %>% 
  summarise(Count=n(),
            Mean.score=mean(Scores))
  key           value   Count Mean.score
  <chr>         <chr>   <int>      <dbl>
1 Gender        F           3       62  
2 Gender        M           1       45  
3 Socioeconomic High        1       38  
4 Socioeconomic Low         2       71.5
5 Socioeconomic Medium      1       50  
6 Subject       Maths       2       71.5
7 Subject       Science     2       44
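
`gather()` still works, but tidyr 1.0.0 and later recommend `pivot_longer()` for the same reshaping step. The sketch below is an assumed equivalent of the long-format option, not code from the original answer. It assumes tidyr >= 1.0.0 and dplyr >= 1.0.0, and `long_summary` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data from the question (strings kept as character, not factor)
students <- data.frame(
  ID = c("A", "B", "C", "D"),
  Gender = c("M", "F", "F", "F"),
  Socioeconomic = c("Low", "Low", "Medium", "High"),
  Subject = c("Maths", "Maths", "Science", "Science"),
  Scores = c(45, 98, 50, 38),
  stringsAsFactors = FALSE
)

long_summary <- students %>%
  # Stack every categorical column into key/value pairs, keeping ID and Scores
  pivot_longer(cols = -c(ID, Scores), names_to = "key", values_to = "value") %>%
  # One group per (column name, category) pair
  group_by(key, value) %>%
  summarise(Count = n(), Mean.score = mean(Scores), .groups = "drop")

long_summary
```

One design point to keep in mind: the long format repeats every row once per categorical column, so a data frame of about 150K rows and 77 grouping columns expands to roughly 11.5 million rows before summarising. If memory is tight, the per-column loop may be the safer choice.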

