【Question title】: failed to get data in single row separated by comma that is grouped by another column values
【Posted】: 2018-07-21 22:00:28
【Question】:

I have a data frame with many variables. Two of them are shown in the example dataset test below:

test <- data.frame(row_numb = c(1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,  3),
                   words = c('apply','assistance','benefit','compass','medical','online','renew','meet','service','website','center','country','country','develop','highly','home','major','obtain'))

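I am trying to join the words in the words column into a column Dictionary of a new data frame fdata, grouped by row_numb and separated by commas, using the following code: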

fdata <- test %>% 
    select(row_numb, words) %>% 
    group_by(row_numb) %>% 
    unite(Dictionary, words, sep=",")

I cannot get the result I expect:

 row_numb   Dictionary
 1          apply, assistance, benefit, compass, medical, online, renew
 2          meet, service.... and so forth

Can someone help me figure out what I am doing wrong?

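【Comments】:

test %>% group_by(row_numb) %>% summarise(word = toString(words)); unite is for pasting multiple columns together.
Thank you, that works. Could you add some examples of both, with a short explanation for the community?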

Tags: r dplyr tidyr tidytext

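【Solution 1】:

unite is for pasting multiple columns together, not for aggregating a single column. To aggregate, use summarise with paste(..., collapse = ', ') or, for the special case of a comma-separated string, toString: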

library(tidyverse)

test <- data.frame(row_numb = c(1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  3,  3,  3,  3),
                   words = c('apply','assistance','benefit','compass','medical','online','renew','meet','service','website','center','country','country','develop','highly','home','major','obtain'))

test %>% group_by(row_numb) %>% summarise(words = toString(words))
#> # A tibble: 3 x 2
#>   row_numb words                                                         
#>      <dbl> <chr>                                                         
#> 1        1 apply, assistance, benefit, compass, medical, online, renew   
#> 2        2 meet, service, website                                        
#> 3        3 center, country, country, develop, highly, home, major, obtain
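The paste(..., collapse = ', ') variant mentioned above is not shown in the answer, so here is a minimal sketch of it. It assumes a shortened copy of the question's test data (only five rows, chosen for illustration) and names the output column Dictionary as the question asked. For a character vector it should give the same strings as toString.

```r
library(dplyr)

# Shortened, illustrative copy of the question's data (not the full dataset)
test <- data.frame(
  row_numb = c(1, 1, 1, 2, 2),
  words = c("apply", "assistance", "benefit", "meet", "service"),
  stringsAsFactors = FALSE
)

# Collapse all words within each row_numb group into one comma-separated string
fdata <- test %>%
  group_by(row_numb) %>%
  summarise(Dictionary = paste(words, collapse = ", "))

fdata
```

Changing the collapse string (for example to "; ") is the reason to prefer paste over toString, which always separates with a comma and a space.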

To use unite, specify the name of the new column and the columns that should be pasted together, optionally with the sep argument, e.g.

iris %>% unite(sepal_l_w, Sepal.Length, Sepal.Width, sep = ' / ') %>% head()
#>   sepal_l_w Petal.Length Petal.Width Species
#> 1 5.1 / 3.5          1.4         0.2  setosa
#> 2   4.9 / 3          1.4         0.2  setosa
#> 3 4.7 / 3.2          1.3         0.2  setosa
#> 4 4.6 / 3.1          1.5         0.2  setosa
#> 5   5 / 3.6          1.4         0.2  setosa
#> 6 5.4 / 3.9          1.7         0.4  setosa
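To make the difference concrete for the question's case, the following sketch applies unite to a single column. It uses a shortened, illustrative copy of the test data, and leaves out the group_by step for simplicity. Because unite pastes columns together row by row, giving it one column should leave the number of rows unchanged and simply return that column under the new name.

```r
library(dplyr)
library(tidyr)

# Shortened, illustrative copy of the question's data (not the full dataset)
test <- data.frame(
  row_numb = c(1, 1, 1, 2, 2),
  words = c("apply", "assistance", "benefit", "meet", "service"),
  stringsAsFactors = FALSE
)

# unite works across columns within each row, so with one input column
# nothing is collapsed: words is only renamed to Dictionary
out <- test %>%
  unite(Dictionary, words, sep = ",")

nrow(out) == nrow(test)  # the row count is unchanged
```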

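【Comments】:

【Solution 2】:

Another general-purpose pattern for this kind of task is nest() followed by mutate()/map(). It is useful when the next step you need to perform has no ready-made function like toString(). It is still only three lines: first nest() your data, then flatten the list structure, then paste/collapse it together.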

library(tidyverse)

test %>%
  nest(-row_numb) %>%
  mutate(Dictionary = map(data, unlist),
         Dictionary = map_chr(Dictionary, paste, collapse = ", "))

#> # A tibble: 3 x 3
#>   row_numb data           Dictionary                                      
#>      <dbl> <list>         <chr>                                           
#> 1        1 <tibble [7 × … apply, assistance, benefit, compass, medical, o…
#> 2        2 <tibble [3 × … meet, service, website                          
#> 3        3 <tibble [8 × … center, country, country, develop, highly, home…

Created on 2018-08-14 by the reprex package (v0.2.0).
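The code above reflects the tidyr version current in 2018. As a hedged sketch of the same nest-then-collapse pattern for tidyr 1.0.0 and later, where nest() expects a named specification such as data = -row_numb, something like the following should work. The shortened test data and the single map_chr step (which reads the words column directly instead of calling unlist first) are choices made for this illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Shortened, illustrative copy of the question's data (not the full dataset)
test <- data.frame(
  row_numb = c(1, 1, 1, 2, 2),
  words = c("apply", "assistance", "benefit", "meet", "service"),
  stringsAsFactors = FALSE
)

fdata <- test %>%
  # One row per row_numb; the remaining columns go into a list-column "data"
  nest(data = -row_numb) %>%
  # For each nested tibble, collapse its words column into a single string
  mutate(Dictionary = map_chr(data, ~ paste(.x$words, collapse = ", ")))

fdata
```

The anonymous function passed to map_chr is where a custom per-group step would go when no ready-made helper such as toString fits.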

【Comments】:

Awesome. Thanks a lot, Julia.