【Question title】: Show both Ns and proportions in two-way frequency table
【Posted】: 2020-07-25 07:52:00
【Question】:

I am trying to create a publication table that does not conform to "tidy" output:

dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
                    categorical_2 = c(rep("one", 5), rep("two", 5)),
                    numeric = sample(1:10, 10))

library(dplyr)

dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%      
  mutate(prop = prop.table(n))

Tidyverse output:

  categorical_1 categorical_2     n  prop
  <fct>         <fct>         <int> <dbl>
1 a             one               3   0.6
2 a             two               2   0.4
3 b             one               2   0.4
4 b             two               3   0.6

Desired output:

Category          One         Two
a                 3 (0.6)     2 (0.4)
b                 2 (0.4)     3 (0.6)

Perhaps there are additional mutate steps I could apply to get the table into my desired output?

【Comments】:

    Tags: r dplyr


    【Solution 1】:
    library(janitor)
    
    dummy %>%
      tabyl(categorical_1, categorical_2) %>%
      adorn_percentages("row") %>%
      adorn_ns(position = "front")
    
    #>  categorical_1     one     two
    #>              a 3 (0.6) 2 (0.4)
    #>              b 2 (0.4) 3 (0.6)
    

    【Comments】:

      【Solution 2】:

      After combining n and prop into a single column, you can use pivot_wider:

      library(tidyverse)
      
      d2 %>% 
        mutate(v = paste0(n, ' (', prop, ')')) %>% 
        pivot_wider(id_cols = categorical_1, names_from = categorical_2, values_from = v) %>% 
        rename_at(1, ~'Category')
      
      # # A tibble: 2 x 3
      # # Groups:   Category [2]
      #   Category one     two    
      #   <fct>    <chr>   <chr>  
      # 1 a        3 (0.6) 2 (0.4)
      # 2 b        2 (0.4) 3 (0.6)
      

      Starting data, built from the question's pipeline (this must be run before the code above):

      d2 <- 
        dummy %>%
          count(categorical_1, categorical_2) %>%
          group_by(categorical_1) %>%      
          mutate(prop = prop.table(n))
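
The two snippets in this answer can be run in a single pass, in dependency order. This is only a sketch under a few assumptions of mine: the proportions are formatted to two decimal places with sprintf() (real data rarely yields values as neat as 0.6), the grouping is dropped with ungroup() before reshaping, and the name `result` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, minus the unused random numeric column
dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

result <- dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%
  mutate(prop = prop.table(n)) %>%   # proportion within each categorical_1
  ungroup() %>%                      # so the final table is not grouped
  # two decimals is an arbitrary choice for this sketch
  mutate(v = sprintf("%d (%.2f)", n, prop)) %>%
  pivot_wider(id_cols = categorical_1,
              names_from = categorical_2,
              values_from = v) %>%
  rename(Category = categorical_1)

result
```

Changing the "%.2f" in the format string is all that is needed to show more or fewer decimals.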
      

      【Comments】:

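        【Solution 3】:

        This is not much different from the other answers. There are a few things I would point out that may come down to preference:

        • count drops the groups, whereas summarise peels off only the last grouping level; since you need the first group (categorical_1) again in mutate, you can call group_by, then summarise, and then compute your proportions, which gives you more control
        • I find building this kind of string with glue-based functions clearer than calling paste with assorted punctuation or other separators
        • Your desired output has title-case column names without numbers, so I clean that up in a final rename_all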
        library(dplyr)
        library(tidyr)
        library(stringr)
        
        dummy %>%
          group_by(categorical_1, categorical_2) %>%
          summarise(n = n()) %>%
          mutate(prop = n / sum(n),
                 display = str_glue("{n} ({prop})")) %>%
          select(-n, -prop) %>%
          pivot_wider(names_from = categorical_2, values_from = display) %>%
          rename_all(~str_remove(., "_\\d+") %>% str_to_title())
        #> # A tibble: 2 x 3
        #> # Groups:   Categorical [2]
        #>   Categorical One     Two    
        #>   <fct>       <chr>   <chr>  
        #> 1 a           3 (0.6) 2 (0.4)
        #> 2 b           2 (0.4) 3 (0.6)
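
The first point, about what grouping is left behind, can be checked directly with group_vars(). A small sketch: the names `via_count` and `via_summarise` are made up for illustration, and `.groups = "drop_last"` assumes dplyr 1.0.0 or later (it only spells out what summarise does here by default, and silences the informational message).

```r
library(dplyr)

# Same data as in the question, minus the unused random numeric column
dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

# count() on ungrouped data returns an ungrouped result
via_count <- dummy %>%
  count(categorical_1, categorical_2)

# summarise() removes only the last grouping variable (categorical_2)
via_summarise <- dummy %>%
  group_by(categorical_1, categorical_2) %>%
  summarise(n = n(), .groups = "drop_last")

group_vars(via_count)      # no grouping variables remain
group_vars(via_summarise)  # categorical_1 is still a grouping variable
```

Because categorical_1 is still a grouping variable after summarise, the following mutate(prop = n / sum(n)) computes proportions within each categorical_1 without another group_by.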
        

        【Comments】:

          【Solution 4】:

          Picking up from your pipeline, we can unite n and prop and then spread, i.e.

          library(dplyr)
          library(tidyr)

          dummy %>%
            count(categorical_1, categorical_2) %>%
            group_by(categorical_1) %>%
            mutate(prop = prop.table(n)) %>%
            unite(n_prop, n, prop) %>%   # joins the two columns with "_" by default
            spread(categorical_2, n_prop)
          

          which gives,

          # A tibble: 2 x 3
          # Groups:   categorical_1 [2]
            categorical_1 one   two  
            <fct>         <chr> <chr>
          1 a             3_0.6 2_0.4
          2 b             2_0.4 3_0.6
          

          You can change unite's separator and use mutate to paste on the parentheses if you strictly need that format.

          【Comments】:

          • Replace the unite line with mutate(prop = paste0("(", prop, ")")) %>% unite(n_prop, n, prop, sep = " ")
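
Putting this answer and the comment's suggestion together might look like the sketch below. Two things here are my own choices rather than the answer's: pivot_wider is used in place of spread (which tidyr documents as superseded), and an ungroup() is added so the result is a plain tibble. The name `result` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, minus the unused random numeric column
dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

result <- dummy %>%
  count(categorical_1, categorical_2) %>%
  group_by(categorical_1) %>%
  # wrap the proportion in parentheses before uniting
  mutate(prop = paste0("(", prop.table(n), ")")) %>%
  ungroup() %>%
  # a single space as separator gives strings such as "3 (0.6)"
  unite(n_prop, n, prop, sep = " ") %>%
  pivot_wider(names_from = categorical_2, values_from = n_prop)

result
```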
          【Solution 5】:

          A data.table solution:

          library(data.table)
          
          dcast(setDT(dummy)[, .(count = .N), 
                               .(categorical_1, categorical_2)], 
                categorical_1~categorical_2, value.var = "count")[,
                                             # proportions are taken within each categorical_1 row
                                             .(categorical_1 = categorical_1,
                                               one=paste0(one, " (", one/(one + two), ")"),
                                               two=paste0(two, " (", two/(one + two), ")"))]
          
          #>    categorical_1     one     two
          #> 1:             a 3 (0.6) 2 (0.4)
          #> 2:             b 2 (0.4) 3 (0.6)
          

          Data:

          dummy <- data.frame(categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
                              categorical_2 = c(rep("one", 5), rep("two", 5)),
                              numeric = sample(1:10, 10))
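
A variant of the data.table approach that does not hard-code the level names one and two: compute the proportion while the data is still in long form, build the display string, and only then cast to wide. This is a sketch rather than the answer's own code; the names `counts`, `display`, and `wide` are hypothetical, and as.data.table() is used instead of setDT() so that dummy is not converted in place.

```r
library(data.table)

# Same data as in the question, minus the unused random numeric column
dummy <- data.frame(
  categorical_1 = c("a", "b", "a", "a", "b", "b", "a", "b", "b", "a"),
  categorical_2 = c(rep("one", 5), rep("two", 5))
)

# Count each combination of the two categorical variables
counts <- as.data.table(dummy)[, .(n = .N), by = .(categorical_1, categorical_2)]

# Proportion within each categorical_1, added by reference
counts[, prop := n / sum(n), by = categorical_1]

# Combine count and proportion into one display string
counts[, display := paste0(n, " (", prop, ")")]

# One row per categorical_1, one column per categorical_2 level
wide <- dcast(counts, categorical_1 ~ categorical_2, value.var = "display")

wide
```

Since the formatting happens before dcast, the same code works unchanged if categorical_2 has more than two levels.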
          

          【Comments】:
