【Question title】: Show both Ns and proportions in two-way frequency table
【Posted】: 2020-07-25 07:52:00
【Question description】:

I am trying to create a publication table that does not follow the "tidy" output format:

library(dplyr)

dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
                    categorical_2 = c(rep("one", 5), rep("two", 5)),
                    numeric = sample(1:10, 10))

dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%      
  mutate(prop = prop.table(n))

Tidyverse output:

  categorical_1 categorical_2     n  prop
  <fct>         <fct>         <int> <dbl>
1 a             one               3   0.6
2 a             two               2   0.4
3 b             one               2   0.4
4 b             two               3   0.6

Desired output:

Category          One       Two
a                 3 (0.6)     2 (0.4)
b                 2 (0.4)     3 (0.6)

Perhaps I can apply additional mutate steps to get the table into my desired output?

【Comments】:

Tags: r dplyr

【Solution 1】:

library(janitor)

dummy %>%
  tabyl(categorical_1, categorical_2) %>%
  adorn_percentages("row") %>%
  adorn_ns(position = "front")

#>  categorical_1     one     two
#>              a 3 (0.6) 2 (0.4)
#>              b 2 (0.4) 3 (0.6)

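【Discussion】:

【Solution 2】:

Once n and prop are combined into a single column, you can use pivot_wider (d2 is the summarised data from the question, defined at the end of this answer):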

library(tidyverse)

d2 %>% 
  mutate(v = paste0(n, ' (', prop, ')')) %>% 
  pivot_wider(id_cols = categorical_1, names_from = categorical_2, values_from = v) %>% 
  rename_at(1, ~'Category')

# # A tibble: 2 x 3
# # Groups:   Category [2]
#   Category one     two    
#   <fct>    <chr>   <chr>  
# 1 a        3 (0.6) 2 (0.4)
# 2 b        2 (0.4) 3 (0.6)

Starting data from the question:

d2 <- 
  dummy %>%
    count(categorical_1, categorical_2) %>%
    group_by(categorical_1) %>%      
    mutate(prop = prop.table(n))
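
The paste0 call above prints prop at full precision, which reads fine for 0.6 but gets long for a value such as 1/3. Below is a minimal sketch of one way to round inside the label, assuming two decimal places are wanted. The data frame d3 and its counts are hypothetical (they are not from the question) and were chosen only so that the proportions are not round numbers.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts (not from the question) with non-round proportions
d3 <- data.frame(
  categorical_1 = c("a", "a", "b", "b"),
  categorical_2 = c("one", "two", "one", "two"),
  n = c(1L, 2L, 2L, 1L)
) %>%
  group_by(categorical_1) %>%
  mutate(prop = prop.table(n)) %>%   # row proportions within categorical_1
  ungroup()

out <- d3 %>%
  # sprintf rounds prop to two decimals while building the "n (prop)" label
  mutate(v = sprintf("%d (%.2f)", n, prop)) %>%
  pivot_wider(id_cols = categorical_1, names_from = categorical_2, values_from = v)

out
```

With these counts the cells read 1 (0.33) and 2 (0.67) rather than carrying fifteen digits.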

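【Discussion】:

【Solution 3】:

This isn't much different from the other answers. I wanted to point out a few things that may come down to preference:

count drops the groups, whereas summarise peels off only the last grouping level; since you need to group by the first variable (categorical_1) again in mutate, you can call group_by first, then summarise, and then calculate your proportions, which gives you more control.
I find building strings like this with glue-based functions cleaner than calling paste with assorted punctuation or other separators.
Your desired output has title-case column names without numbers, so I cleaned that up in the final rename_all.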

library(dplyr)
library(tidyr)
library(stringr)

dummy %>%
  group_by(categorical_1, categorical_2) %>%
  summarise(n = n()) %>%
  mutate(prop = n / sum(n),
         display = str_glue("{n} ({prop})")) %>%
  select(-n, -prop) %>%
  pivot_wider(names_from = categorical_2, values_from = display) %>%
  rename_all(~str_remove(., "_\\d+") %>% str_to_title())
#> # A tibble: 2 x 3
#> # Groups:   Categorical [2]
#>   Categorical One     Two    
#>   <fct>       <chr>   <chr>  
#> 1 a           3 (0.6) 2 (0.4)
#> 2 b           2 (0.4) 3 (0.6)
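
The grouping behaviour described in the first point can be checked directly with group_vars(). This is a small sketch rather than part of the original answer; the `.groups = "drop_last"` argument assumes dplyr 1.0.0 or later and only makes the default explicit.

```r
library(dplyr)

dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

# count() on ungrouped data returns an ungrouped result
g_count <- dummy %>%
  count(categorical_1, categorical_2) %>%
  group_vars()

# summarise() drops only the last grouping level, so categorical_1 remains
# (.groups = "drop_last" assumes dplyr >= 1.0.0)
g_summ <- dummy %>%
  group_by(categorical_1, categorical_2) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  group_vars()

g_count  # character(0)
g_summ   # "categorical_1"
```

Because categorical_1 is still a grouping variable after summarise, the later `n / sum(n)` is computed within each row category without another group_by.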

【Discussion】:

【Solution 4】:

Picking up from your pipe, we can unite n and prop and then spread, i.e.

library(dplyr)
library(tidyr)

dummy %>%
     count(categorical_1, categorical_2) %>%
     group_by(categorical_1) %>%
     mutate(prop = prop.table(n))  %>%
     unite(n_prop, n, prop) %>% 
     spread(categorical_2, n_prop)

which gives,

# A tibble: 2 x 3
# Groups:   categorical_1 [2]
  categorical_1 one   two  
  <fct>         <chr> <chr>
1 a             3_0.6 2_0.4
2 b             2_0.4 3_0.6

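You can play with unite's separator, and use mutate to paste on the brackets, if you strictly need that format.

【Discussion】:

Replace the unite line with mutate(prop = paste0("(", prop, ")")) %>% unite(n_prop, n, prop, sep = " ")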
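
For reference, here is a sketch of the full pipeline with that suggested replacement applied. It only combines code already shown in this answer and the comment; the name result is introduced here for illustration, and the random numeric column is left out of dummy because it is not used.

```r
library(dplyr)
library(tidyr)

dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

result <- dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%
  mutate(prop = prop.table(n)) %>%
  # wrap the proportion in brackets, then join n and prop with a space
  mutate(prop = paste0("(", prop, ")")) %>%
  unite(n_prop, n, prop, sep = " ") %>%
  spread(categorical_2, n_prop)

result
```

The cells now read 3 (0.6) and 2 (0.4) instead of 3_0.6 and 2_0.4.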

【Solution 5】:

A data.table solution:

library(data.table)

dcast(setDT(dummy)[, .(count = .N), 
                     .(categorical_1, categorical_2)], 
      categorical_1~categorical_2,
      value.var = "count")[,
                                   .(categorical_1 = categorical_1,
                                     # row proportions: each count over its row total
                                     one=paste0(one, " (", one/(one + two), ")"),
                                     two=paste0(two, " (", two/(one + two), ")"))]

#>    categorical_1     one     two
#> 1:             a 3 (0.6) 2 (0.4)
#> 2:             b 2 (0.4) 3 (0.6)

Data:

dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
                    categorical_2 = c(rep("one", 5), rep("two", 5)),
                    numeric = sample(1:10, 10))

【Discussion】: