【问题标题】:Summary Tables using Nested Tibbles使用嵌套 Tibbles 的汇总表
【发布时间】:2019-09-18 11:25:28
【问题描述】:

我正在尝试使用purrr/tibble 方法生成一个汇总统计表。我可以使用以下方法计算分组平均值 (sd) 和计数:

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

mtcars %>%
  gather(variable, value, -vs, -am) %>%
  group_by(vs, am, variable) %>% 
  nest() %>% 
  filter(variable %in% c("mpg", "hp")) %>% 
  mutate(
    mean = map_dbl(data, ~mean(.$value, na.rm = TRUE)),
    sd = map_dbl(data, ~sd(.$value, na.rm = TRUE)),
    n = map_dbl(data, ~sum(!is.na(.$value)))
  )  %>% 
  select(vs:variable, mean:n) %>% 
  mutate_at(vars(mean, sd), round, 3) %>% 
  mutate(mean_sd = paste0(mean, " (", sd, ")"),
         var_group = paste(vs, am, variable, sep = "_")) %>% 
  select(n:var_group) %>%
  nest(n, mean_sd, .key = "summary") %>% 
  spread(key = var_group, value = summary) %>% 
  unnest()

我的直接问题是,如何在unnest()-ed 输出中保留spread(key = var_group, value = summary) 中的列名?

编辑:感谢大家的回复。 https://stackoverflow.com/a/55912326/5745045 的优点是更易于阅读且不存储临时变量。一个缺点是在n 列中将数字更改为字符。

最终目标是在分组kable 表的上下文中用格式化文本替换列名。

【问题讨论】:

  • 如果你查看每一列,它是一个tibble,有自己的列名。所以,不确定你想要的预期输出是什么
  • 可能在spread 步骤之后%>% gather %>% unnest %>% gather(key1, val, n, mean_sd) %>% unite(key, key, key1) %>% spread(key, val) %>% retype library(hablar)
  • 或者另一个选项是%>% imap_dfc(., ~ {nm1 <- .y; .x %>% pluck(1) %>% rename_all(~ str_c(.x, "_", nm1))})
  • 在这种情况下,注释的应该可以工作
  • @akrun 这非常合理。尝试返回0.3.0,但无法安装依赖项。无论如何,感谢您的调查。

标签: r dplyr tidyverse tidyr purrr


【解决方案1】:

通过将"nested" tibble 存储为临时变量1 并使用它的colnames2,我们可以实现你渴望。往下看;

mtcars %>%
  gather(variable, value, -vs, -am) %>%
  group_by(vs, am, variable) %>% 
  nest() %>% 
  filter(variable %in% c("mpg", "hp")) %>% 
  mutate(
    mean = map_dbl(data, ~mean(.$value, na.rm = TRUE)),
    sd = map_dbl(data, ~sd(.$value, na.rm = TRUE)),
    n = map_dbl(data, ~sum(!is.na(.$value)))
  )  %>% 
  select(vs:variable, mean:n) %>% 
  mutate_at(vars(mean, sd), round, 3) %>% 
  mutate(mean_sd = paste0(mean, " (", sd, ")"),
         var_group = paste(vs, am, variable, sep = "_")) %>% 
  select(n:var_group) %>%
  nest(n, mean_sd, .key = "summary") %>% 
  spread(key = var_group, value = summary) %>% 
  #1: storing the temporary nested variable
  {. ->> temptibble} %>%
  unnest() %>% 
  #2: renaming the columns of unnested output and removing temporary variable
  rename_all(funs(paste0(., "_", rep(colnames(temptibble), each=2)))); rm(temptibble)
# # A tibble: 1 x 16
#   n_0_0_hp   mean_sd_0_0_hp  n1_0_0_mpg  mean_sd1_0_0_mpg  n2_0_1_hp  mean_sd2_0_1_hp n3_0_1_mpg  mean_sd3_0_1_mpg
#   <dbl>      <chr>                <dbl>  <chr>                 <dbl>  <chr>                <dbl>  <chr>                
# 1       12  194.167 (33.36)          12     15.05 (2.774)          6 180.833 (98.816)          6     19.75 (4.009)
#    n4_1_0_hp   mean_sd4_1_0_hp n5_1_0_mpg  mean_sd5_1_0_mpg   n6_1_1_hp  mean_sd6_1_1_hp  n7_1_1_mpg  mean_sd7_1_1_mpg
#        <dbl>   <chr>                <dbl>  <chr>                  <dbl>  <chr>                 <dbl>  <chr>
# 1         7   102.143 (20.932)         7     20.743 (2.471)           7  80.571 (24.144)           7    28.371 (4.758)

【讨论】:

    【解决方案2】:

    这是另一种不需要创建临时变量的方法。我没有在末尾嵌套数据,而是使用gather()unite() 重构数据,使其最终成为一个键值对。

    library(tidyverse)
    #> Registered S3 methods overwritten by 'ggplot2':
    #>   method         from 
    #>   [.quosures     rlang
    #>   c.quosures     rlang
    #>   print.quosures rlang
    #> Registered S3 method overwritten by 'rvest':
    #>   method            from
    #>   read_xml.response xml2
    mtcars %>%
      gather(variable, value, -vs, -am) %>%
      group_by(vs, am, variable) %>% 
      nest() %>% 
      filter(variable %in% c("mpg", "hp")) %>% 
      mutate(
        mean = map_dbl(data, ~mean(.$value, na.rm = TRUE)),
        sd = map_dbl(data, ~sd(.$value, na.rm = TRUE)),
        n = map_dbl(data, ~sum(!is.na(.$value)))
      )  %>% 
      select(vs:variable, mean:n) %>% 
      mutate_at(vars(mean, sd), round, 3) %>% 
      mutate(mean_sd = paste0(mean, " (", sd, ")"),
             var_group = paste(vs, am, variable, sep = "_")) %>% 
      select(n:var_group) %>% 
      gather(key, value, -var_group) %>% 
      unite(var_group_key, var_group, key) %>% 
      spread(var_group_key, value)
    #> # A tibble: 1 x 16
    #>   `0_0_hp_mean_sd` `0_0_hp_n` `0_0_mpg_mean_s… `0_0_mpg_n` `0_1_hp_mean_sd`
    #>   <chr>            <chr>      <chr>            <chr>       <chr>           
    #> 1 194.167 (33.36)  12         15.05 (2.774)    12          180.833 (98.816)
    #> # … with 11 more variables: `0_1_hp_n` <chr>, `0_1_mpg_mean_sd` <chr>,
    #> #   `0_1_mpg_n` <chr>, `1_0_hp_mean_sd` <chr>, `1_0_hp_n` <chr>,
    #> #   `1_0_mpg_mean_sd` <chr>, `1_0_mpg_n` <chr>, `1_1_hp_mean_sd` <chr>,
    #> #   `1_1_hp_n` <chr>, `1_1_mpg_mean_sd` <chr>, `1_1_mpg_n` <chr>
    

    reprex package (v0.2.1) 于 2019 年 4 月 29 日创建

    【讨论】:

      猜你喜欢
      • 2019-09-22
      • 2020-07-10
      • 2015-10-17
      • 1970-01-01
      • 2016-04-20
      • 1970-01-01
      • 2018-07-03
      • 1970-01-01
      • 2021-03-25
      相关资源
      最近更新 更多