【Question title】: Adding a column of total n for each group in a stacked frequency table
【Posted】: 2022-01-19 22:02:29
【Question】:

I have the following data:

id    animal    color     shape
1      bear     orange    circle
2      dog      NA        triangle
3      NA       yellow    square
4      cat      yellow    square
5      NA       yellow    rectangle

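If I run this code:

library(dplyr)
library(tidyr)

df1 <- df %>% 
  # reshape to one row per id/variable pair
  pivot_longer(
    -id,
    names_to = "Variable",
    values_to = "Level"
  ) %>% 
  # count each level within each variable (still grouped by Variable afterwards)
  group_by(Variable, Level) %>% 
  summarise(freq = n()) %>% 
  mutate(percent = freq/sum(freq)*100) %>% 
  # show the variable name only on its first row
  mutate(Variable = ifelse(duplicated(Variable), NA, Variable)) %>% 
  ungroup()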

I get the following output:

Variable     Level       freq(n=5)   percent

animal        bear          1           33.3
              dog           1           33.3
              cat           1           33.3

color         orange        1           25.0
              yellow        3           75.0

shape         circle        1           20.0
              triangle      1           20.0
              square        2           40.0
              rectangle     1           20.0

But I would also like to add a row after each variable containing its total:

Variable     Level       freq(n=5)   percent

animal        bear          1           33.3
              dog           1           33.3
              cat           1           33.3
              total         3           100.0

color         orange        1           25.0
              yellow        3           75.0
              total         4           100.0

shape         circle        1           20.0
              triangle      1           20.0
              square        2           40.0
              rectangle     1           20.0
              total         5           100.0

I have tried different variations of mutate and summarise, but I keep getting the error "invalid 'type' (closure) of argument".
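For context on that error: base R's `sum()` raises "invalid 'type' (closure) of argument" when it is handed a function (a closure) instead of numbers. Because dplyr's `n` is itself a function, writing `sum(n)` where `sum(freq)` or `n()` was meant is one common way to hit it inside `mutate()` or `summarise()`; whether that is the exact slip here is an assumption. The base-R sketch below reproduces the message with a hypothetical stand-in `n` and then shows the fix of summing the count column:

```r
# Hypothetical stand-in for dplyr::n(): any ordinary R function is a "closure"
n <- function() 0L

# Mistake: passing the function object itself (no parentheses) to sum()
err <- tryCatch(sum(n), error = function(e) conditionMessage(e))
print(err)
#> [1] "invalid 'type' (closure) of argument"

# Fix: sum the numeric count column instead
freq <- c(1L, 1L, 1L, 2L)
percent <- freq / sum(freq) * 100
print(percent)
#> [1] 20 20 20 40
```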

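【Comments】:

  • janitor::adorn_totals
  • Your input and output don't match what actually happens; where did the original NA values go? For me they are still there, but somehow they have disappeared from your expected output.

Tags: r dataframe tidyverse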


【Solution 1】:

Here is one way to do it:

library(dplyr)
library(tidyr)
library(janitor)

df %>% 
  pivot_longer(
    -id,
    names_to = "Variable",
    values_to = "Level"
  ) %>% 
  group_by(Variable, Level) %>% 
  summarise(freq = n()) %>% 
  mutate(percent = freq/sum(freq)*100) %>% 
  ungroup() %>% 
  # split into one tibble per Variable
  group_by(Variable) %>% 
  group_split() %>% 
  # given a list, adorn_totals() adds a totals row to each tibble:
  # "Total" goes in the first column (Variable) and "-" fills Level
  adorn_totals() %>% 
  bind_rows() %>% 
  # move the "Total" label into Level (last(Level) is the "-" fill value)
  mutate(Level = ifelse(Level == last(Level), last(Variable), Level)) %>% 
  mutate(Variable = ifelse(duplicated(Variable) |
                             Variable == "Total", NA, Variable))

#>  Variable     Level freq percent
#>    animal      bear    1      20
#>      <NA>       cat    1      20
#>      <NA>       dog    1      20
#>      <NA>      <NA>    2      40
#>      <NA>     Total    5     100
#>     color    orange    1      20
#>      <NA>    yellow    3      60
#>      <NA>      <NA>    1      20
#>      <NA>     Total    5     100
#>     shape    circle    1      20
#>      <NA> rectangle    1      20
#>      <NA>    square    2      40
#>      <NA>  triangle    1      20
#>      <NA>     Total    5     100
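The same per-group totals can also be sketched without janitor. This is a swapped-in alternative, not the answerer's method: it replaces `group_split()` + `adorn_totals()` with dplyr's `group_modify()`, which runs a function on each `Variable` group and re-attaches the grouping column to the result. It puts the `"total"` label in `Level`, as in the question's expected output, and keeps `NA` as its own level, so percentages are out of 5 as in the outputs above.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the "Data:" section; "NA" strings become real NAs
df <- read.table(text = "id animal color shape
1 bear orange circle
2 dog NA triangle
3 NA yellow square
4 cat yellow square
5 NA yellow rectangle", header = TRUE, stringsAsFactors = FALSE)

# One row per Variable/Level combination (NA is counted as its own level)
counts <- df %>%
  pivot_longer(-id, names_to = "Variable", values_to = "Level") %>%
  count(Variable, Level, name = "freq") %>%
  group_by(Variable) %>%
  mutate(percent = freq / sum(freq) * 100)

# For each Variable, append a "total" row computed from that group's rows;
# group_modify() adds the Variable column back to whatever the function returns
out <- counts %>%
  group_modify(~ bind_rows(
    .x,
    summarise(.x, Level = "total", across(c(freq, percent), sum))
  )) %>%
  ungroup()

print(out, n = Inf)
```

Because each total row is built from its own group, `freq` is 5 and `percent` is 100 in every total row; the `Variable` column can then be blanked with the same `duplicated()` trick used above.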

【Discussion】:

  • Reason for the downvote?

【Solution 2】:

library(dplyr)
library(tidyr)
library(purrr)
library(janitor)

# df1 here is the raw data read in under "Data:" below
df1 %>% 
  pivot_longer(
    -id,
    names_to = "Variable",
    values_to = "Level"
  ) %>% 
  group_by(Variable, Level) %>% 
  summarise(freq = n()) %>% 
  mutate(percent = freq/sum(freq)*100) %>% 
  # one tibble per Variable, each given its own totals row
  group_split() %>% 
  map_dfr(~ adorn_totals(.x, name = "total")) %>% 
  mutate(Variable = ifelse(duplicated(Variable) & Variable != "total", NA, Variable)) %>% 
  ungroup()

#>  Variable     Level freq percent
#>    animal      bear    1      20
#>      <NA>       cat    1      20
#>      <NA>       dog    1      20
#>      <NA>      <NA>    2      40
#>     total         -    5     100
#>     color    orange    1      20
#>      <NA>    yellow    3      60
#>      <NA>      <NA>    1      20
#>     total         -    5     100
#>     shape    circle    1      20
#>      <NA> rectangle    1      20
#>      <NA>    square    2      40
#>      <NA>  triangle    1      20
#>     total         -    5     100

Data:

read.table(text = "id    animal    color     shape
1      bear     orange    circle
2     dog      NA        triangle
3     NA       yellow    square
4     cat      yellow    square
5     NA       yellow    rectangle", header = TRUE, stringsAsFactors = FALSE) -> df1

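【Discussion】:

【Solution 3】:

If we stop the pipeline that defines `df1` before its last two steps (blanking `Variable` and `ungroup()`),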

df1 <- df %>%
  pivot_longer(-id, names_to = "Variable", values_to = "Level") %>%
  group_by(Variable, Level) %>%
  summarise(freq = n()) %>%
  mutate(percent = freq/sum(freq)*100)

df1
# # A tibble: 11 x 4
# # Groups:   Variable [3]
#    Variable Level      freq percent
#    <chr>    <chr>     <int>   <dbl>
#  1 animal   bear          1      20
#  2 animal   cat           1      20
#  3 animal   dog           1      20
#  4 animal   <NA>          2      40
#  5 color    orange        1      20
#  6 color    yellow        3      60
#  7 color    <NA>          1      20
#  8 shape    circle        1      20
#  9 shape    rectangle     1      20
# 10 shape    square        2      40
# 11 shape    triangle      1      20

then we can augment it with per-group summaries (and reorder the rows):

df1 %>%
  group_by(Variable) %>%
  summarize(Level = "total", across(freq:percent, sum)) %>%
  bind_rows(df1) %>%
  arrange(Variable, !is.na(Level), Level == "total", Level) %>%
  mutate(Variable = ifelse(duplicated(Variable), NA, Variable))
# # A tibble: 14 x 4
#    Variable Level      freq percent
#    <chr>    <chr>     <int>   <dbl>
#  1 animal   <NA>          2      40
#  2 <NA>     bear          1      20
#  3 <NA>     cat           1      20
#  4 <NA>     dog           1      20
#  5 <NA>     total         5     100
#  6 color    <NA>          1      20
#  7 <NA>     orange        1      20
#  8 <NA>     yellow        3      60
#  9 <NA>     total         5     100
# 10 shape    circle        1      20
# 11 <NA>     rectangle     1      20
# 12 <NA>     square        2      40
# 13 <NA>     triangle      1      20
# 14 <NA>     total         5     100
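For readers not using the tidyverse, the same table can be sketched in base R. This is a swapped-in technique, not part of the answer above: `table(..., useNA = "ifany")` counts each variable with `NA` kept as a level, a `"total"` row is appended per variable, and the variable name is shown only on the first row of its block. The sample data is re-created so the snippet runs on its own; unlike the output above, `NA` rows sort after the named levels here.

```r
# Same sample data as in the "Data:" section
df <- read.table(text = "id animal color shape
1 bear orange circle
2 dog NA triangle
3 NA yellow square
4 cat yellow square
5 NA yellow rectangle", header = TRUE, stringsAsFactors = FALSE)

vars <- setdiff(names(df), "id")

# One block per variable: its level counts followed by a total row
blocks <- lapply(vars, function(v) {
  tab <- table(df[[v]], useNA = "ifany")   # NA kept as its own level
  freq <- as.integer(tab)
  block <- data.frame(Level = names(tab), freq = freq,
                      percent = freq / sum(freq) * 100,
                      stringsAsFactors = FALSE)
  block <- rbind(block, data.frame(Level = "total", freq = sum(freq),
                                   percent = 100, stringsAsFactors = FALSE))
  # Show the variable name only on the first row of its block
  data.frame(Variable = c(v, rep(NA, nrow(block) - 1)), block,
             stringsAsFactors = FALSE)
})

result <- do.call(rbind, blocks)
rownames(result) <- NULL
print(result)
```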
    

【Discussion】:
