Create a column grouping strings text extracted from a column based on another column in R: answers

【Question title】: Create a column grouping strings text extracted from a column based on another column in R
【Posted】: 2020-04-23 23:12:48
【Question description】:

This is my dataset:

id   text
 1    "red"
 1    "blue"
 2    "light blue"
 2    "red"
 2    "yellow"
 3    "dark green"

This is the result I want to get:

 id  text2
 1   "red, blue"
 2   "light blue, red, yellow"
 3   "dark green"

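Basically, for each id I need to combine the strings in the "text" column into a single string, with commas separating the individual elements.

【Discussion】:

Tags: r regex string text

【Solution 1】:

We can use dplyr: group the rows by id, then collapse each group's text values into one string with toString.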

library(dplyr)
df1 %>%
    group_by(id) %>%
    summarise(text2 = toString(text))

Data

df1 <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
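
The step doing the work in the answer above is toString(), which joins the elements of a vector with ", ". As a minimal base-R sketch (no dplyr needed), the lines below apply it to a single group and show that it gives the same result as paste() with collapse = ", ". The variable names grp2, joined, same and semi are made up for illustration, and the data frame is rebuilt here so the block runs on its own.

```r
# Same data as in the answer, rebuilt so this block is self-contained.
df1 <- data.frame(
  id   = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = c("red", "blue", "light blue", "red", "yellow", "dark green"),
  stringsAsFactors = FALSE
)

# Pull out the text values for one group (id == 2).
grp2 <- df1$text[df1$id == 2L]

# toString() collapses a vector into one comma-separated string.
joined <- toString(grp2)

# For a character vector this matches paste() with collapse = ", ".
same <- paste(grp2, collapse = ", ")

# paste() is the one to reach for if a different separator is wanted.
semi <- paste(grp2, collapse = "; ")

joined
semi
```

If a separator other than ", " is needed inside summarise(), paste(text, collapse = "; ") can be swapped in for toString(text).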

【Discussion】:

【Solution 2】:

Use aggregate and toString.

aggregate(. ~ id, d, toString)
#   id                    text
# 1  1               red, blue
# 2  2 light blue, red, yellow
# 3  3              dark green

Note: This won't work with factor columns; that is, if is.factor(d$text) returns TRUE, you need a slightly different approach. Demonstration:

d$text <- as.factor(d$text)  # make text a factor column
is.factor(d$text)
#  [1] TRUE

In that case, convert the column to character first:

aggregate(. ~ id, transform(d, text=as.character(text)), toString)

Data:

d <- structure(list(id = c(1L, 1L, 2L, 2L, 2L, 3L), text = c("red", 
"blue", "light blue", "red", "yellow", "dark green")), row.names = c(NA, 
-6L), class = "data.frame")
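
When a data frame has several factor columns, converting them one by one with transform() gets tedious. The sketch below is one possible generalisation, not part of the original answer: it converts every factor column to character in one step and then runs the same aggregate() call. The helper variable is_fct is a made-up name, and the example data is rebuilt with text stored as a factor so the block is self-contained.

```r
# Example data with `text` deliberately stored as a factor.
d <- data.frame(
  id   = c(1L, 1L, 2L, 2L, 2L, 3L),
  text = factor(c("red", "blue", "light blue", "red", "yellow", "dark green"))
)

# Flag the factor columns, then convert only those to character.
is_fct <- vapply(d, is.factor, logical(1))
d[is_fct] <- lapply(d[is_fct], as.character)

# Same call as in the answer, now applied to character data.
res <- aggregate(. ~ id, d, toString)
res
```

Non-factor columns such as id are left untouched, so the conversion is safe to apply to the whole data frame before aggregating.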

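【Discussion】:

I'm not sure how to convert my data frame {it looks like this id
Your column seems to be of class "factor". See the edit to my answer. However, you could also use cbind.data.frame(id, text, stringsAsFactors=FALSE) to prevent factors in the first place. (structure(.) is just the output of dput(d), which is how we share data on Stack Overflow, see stackoverflow.com/questions/5963269/…)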
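
As a rough sketch of the advice in the comment above: the block below builds the data frame from two plain vectors with stringsAsFactors = FALSE, checks that text stayed a character column, and prints a shareable definition with dput(). It uses data.frame() in place of the cbind.data.frame() call mentioned in the comment, which is a substitution made here for illustration; both accept the stringsAsFactors argument. Since R 4.0.0 the default is already FALSE, so the argument mainly matters on older versions.

```r
# Plain vectors holding the example data.
id   <- c(1L, 1L, 2L, 2L, 2L, 3L)
text <- c("red", "blue", "light blue", "red", "yellow", "dark green")

# Build the data frame while explicitly keeping strings as character.
d <- data.frame(id, text, stringsAsFactors = FALSE)

# Confirm that `text` was not converted to a factor.
is.factor(d$text)

# Print a structure(...) definition that others can paste to recreate `d`.
dput(d)
```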