【Title】: purrr: error when turning a nested list to a character vector
【Posted】: 2020-01-20 22:37:04
【Question】:

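I have a dataset with some duplicate entries, and I want to reduce it to only the unique combinations of values, with a dup_num column giving the number of duplicate entries and a dup_rows column indicating which rows contain the duplicated data.

I implemented a solution based on "Finding duplicate observations of selected variables in a tibble", but it throws a bunch of warnings when the data in the column containing the lists of row numbers is coerced to a character vector. That is fine for now, but I want to display this data with DT and Shiny, and the warnings are a problem for that app.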

library(tidyverse)

df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c(
               "Moe", "Larry", "Curly", "Shemp", "extra"
             ), 6))

chr_dups <- as_mapper( ~ str_c(.x) %>%
                         str_remove_all("[c\\(\\)]"))

df %>%
  nest(episode, .key = "dups") %>%
  mutate(dup_num = map_dbl(dups, nrow),
         dup_rows = map_chr(dups, chr_dups))
#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing

#> Warning in stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE):
#> argument is not an atomic vector; coercing
#> # A tibble: 15 x 5
#>    day   name  dups             dup_num dup_rows
#>    <chr> <chr> <list>             <dbl> <chr>   
#>  1 Mon   Moe   <tibble [2 x 1]>       2 1, 16   
#>  2 Wed   Larry <tibble [2 x 1]>       2 2, 17   
#>  3 Fri   Curly <tibble [2 x 1]>       2 3, 18   
#>  4 Mon   Shemp <tibble [2 x 1]>       2 4, 19   
#>  5 Wed   extra <tibble [2 x 1]>       2 5, 20   
#>  6 Fri   Moe   <tibble [2 x 1]>       2 6, 21   
#>  7 Mon   Larry <tibble [2 x 1]>       2 7, 22   
#>  8 Wed   Curly <tibble [2 x 1]>       2 8, 23   
#>  9 Fri   Shemp <tibble [2 x 1]>       2 9, 24   
#> 10 Mon   extra <tibble [2 x 1]>       2 10, 25  
#> 11 Wed   Moe   <tibble [2 x 1]>       2 11, 26  
#> 12 Fri   Larry <tibble [2 x 1]>       2 12, 27  
#> 13 Mon   Curly <tibble [2 x 1]>       2 13, 28  
#> 14 Wed   Shemp <tibble [2 x 1]>       2 14, 29  
#> 15 Fri   extra <tibble [2 x 1]>       2 15, 30

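Created on 2019-09-19 by the reprex package (v0.3.0)

I'm pretty sure the problem is in as_mapper().

The code above is a reprex with representative toy data. The tibble describes some episodes of The Three Stooges, the day of the week each episode aired, and the lead character of the episode.

Thanks!

【Comments】:

    Tags: r tidyverse purrr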


    【Answer 1】:

    It is a warning because the list elements are not atomic, i.e. the column is a list of tibbles, which can be seen if we pull it:

    df %>%
      nest(dups = episode)  %>% 
      pull(dups)
    #<list_of<tbl_df<episode:integer>>[15]>
    #[[1]]
    # A tibble: 2 x 1
    #  episode
    #    <int>
    #1       1
    #2      16
    
    #[[2]]
    # A tibble: 2 x 1
    #  episode
    #    <int>
    #1       2
    #2      17
    # ...
    

    So, it is a list of tibbles. We can either extract the column with pull, or flatten it and apply the function:

    library(purrr)
    df %>%
       nest(dups = episode) %>%
       mutate(dup_num = map_dbl(dups, nrow), 
             dup_rows = map(dups, ~ flatten_int(.x) %>% 
                                         chr_dups))
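
The pull alternative mentioned above can be sketched as follows. This is only an illustration under assumptions: it uses toString() in place of the question's chr_dups, and map_int() for the count, neither of which is in the original post. It first checks that each nested element is a tibble (not atomic), then collapses the pulled integer vector into one string per group.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same toy data as in the question
df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

nested <- df %>%
  nest(dups = episode)

# Each element of `dups` is a tibble, which is a list and therefore not atomic;
# the column pulled out of it is a plain integer vector
first_is_atomic  <- is.atomic(nested$dups[[1]])                  # FALSE
pulled_is_atomic <- is.atomic(pull(nested$dups[[1]], episode))   # TRUE

# Pull the atomic vector out of each tibble before building the string
res <- nested %>%
  mutate(dup_num  = map_int(dups, nrow),
         dup_rows = map_chr(dups, ~ toString(pull(.x, episode))))
```

Because toString() returns a single string, map_chr() can be used here, and no regular expression is needed to strip `c(` and `)`.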
    

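    Note: it is not clear why the function 'chr_dups' is applied to the numeric 'episode' column. The transformations also do not make sense here: once the column is a plain integer vector, its values contain no `c`, `(` or `)` characters to remove.


    If we only need to paste the elements of 'episode' grouped by the other columns, a base R one-liner is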

    aggregate(episode~ day + name, df, toString)
    #   day  name episode
    #1  Fri Curly   3, 18
    #2  Mon Curly  13, 28
    #3  Wed Curly   8, 23
    #4  Fri extra  15, 30
    #5  Mon extra  10, 25
    #6  Wed extra   5, 20
    #7  Fri Larry  12, 27
    #8  Mon Larry   7, 22
    #9  Wed Larry   2, 17
    #10 Fri   Moe   6, 21
    #11 Mon   Moe   1, 16
    #12 Wed   Moe  11, 26
    #13 Fri Shemp   9, 24
    #14 Mon Shemp   4, 19
    #15 Wed Shemp  14, 29
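
The one-liner above returns only the pasted row numbers. If the dup_num count from the question is also wanted, one possible base R sketch is to run aggregate() twice and merge the results. The column renaming and the merge step are assumptions added for illustration, not part of the original answer.

```r
# Same toy data as in the question, built as a base data.frame
df <- data.frame(episode = 1:30,
                 day = rep(c("Mon", "Wed", "Fri"), 10),
                 name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

# One aggregate for the pasted row numbers, one for the counts
rows   <- aggregate(episode ~ day + name, df, toString)
counts <- aggregate(episode ~ day + name, df, length)

names(rows)[3]   <- "dup_rows"
names(counts)[3] <- "dup_num"

# Join the two summaries on the grouping columns
res <- merge(counts, rows, by = c("day", "name"))
```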
    

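    【Comments】:

    • Thanks! I'm using chr_dups to list the values in a cell so that I can display them with a Shiny app or write them to xls later. Elegant base R solution!
    【Answer 2】:

    I think the source of the warning has already been addressed. I'll add that you can do this without mapping, using only vectorised functions.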

    library(tidyverse)
    
    df <- tibble(episode = 1:30,
                 day = rep(c("Mon", "Wed", "Fri"), 10),
                 name = rep(c(
                   "Moe", "Larry", "Curly", "Shemp", "extra"
                 ), 6))
    
    df %>%
      group_by(day, name) %>%
      summarise(
        dup_num = n(),
        dup_rows = str_c(episode, collapse = ", ")
      )
    #> # A tibble: 15 x 4
    #> # Groups:   day [3]
    #>    day   name  dup_num dup_rows
    #>    <chr> <chr>   <int> <chr>   
    #>  1 Fri   Curly       2 3, 18   
    #>  2 Fri   extra       2 15, 30  
    #>  3 Fri   Larry       2 12, 27  
    #>  4 Fri   Moe         2 6, 21   
    #>  5 Fri   Shemp       2 9, 24   
    #>  6 Mon   Curly       2 13, 28  
    #>  7 Mon   extra       2 10, 25  
    #>  8 Mon   Larry       2 7, 22   
    #>  9 Mon   Moe         2 1, 16   
    #> 10 Mon   Shemp       2 4, 19   
    #> 11 Wed   Curly       2 8, 23   
    #> 12 Wed   extra       2 5, 20   
    #> 13 Wed   Larry       2 2, 17   
    #> 14 Wed   Moe         2 11, 26  
    #> 15 Wed   Shemp       2 14, 29
    

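    Created on 2019-09-19 by the reprex package (v0.3.0)

    【Comments】:

    • A simpler approach that doesn't need purrr! My real dataset is about 50k entries of similar length. I can load just dplyr and stringr and save myself some overhead when rendering the table in DT and Shiny later.
    【Answer 3】:

    Just adding to the other posters. You don't have to use purrr to achieve what you want. Base R can do it.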
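
The output above is still grouped by day (the "# Groups: day [3]" line). If an ungrouped tibble is preferred before passing it to DT, a hedged variant is sketched below. It assumes dplyr 1.0.0 or later, where summarise() accepts a `.groups` argument, and it swaps in base toString() for str_c(); both choices are illustrative rather than part of the original answer.

```r
library(dplyr)

# Same toy data as in the question
df <- tibble(episode = 1:30,
             day = rep(c("Mon", "Wed", "Fri"), 10),
             name = rep(c("Moe", "Larry", "Curly", "Shemp", "extra"), 6))

res <- df %>%
  group_by(day, name) %>%
  summarise(
    dup_num  = n(),
    dup_rows = toString(episode),  # same result as str_c(episode, collapse = ", ")
    .groups  = "drop"              # return an ungrouped tibble
  )
```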

    df <- df %>%
      nest(episode, .key = "dups") %>%
      mutate(dup_num = sapply(dups, nrow),
             dup_rows = sapply(dups, function(x) paste0(x$episode, collapse = ",")))
    
