【Posted】: 2020-07-02 16:18:56
【Question】:
I have a data frame with two columns:
- id
- string: a text column containing repeated text elements separated by the symbol /

library(tidyverse)

df_input <- data.frame(stringsAsFactors=FALSE,
  id = c(123, 234, 345, 456),
  string = c("[\"aaa\"] / [\"aaa\"] / [\"aaa\"] / bbb / bbb / bbb",
             "[\"hello hello\"] / [\"hello hello\"] / [\"hello hello\"] / [\"hello hello\"]",
             "my name is tim / my name is tim / my name is tim",
             "[\"hello word\"]")
)
It looks like this:
id string
1 123 ["aaa"] / ["aaa"] / ["aaa"] / bbb / bbb / bbb
2 234 ["hello hello"] / ["hello hello"] / ["hello hello"] / ["hello hello"]
3 345 my name is tim / my name is tim / my name is tim
4 456 ["hello word"]
The pattern I see is that each row holds a group of repeated elements separated by the symbol /:
["aaa"] / ["aaa"] / ["aaa"] / bbb / bbb / bbb
Or:
my name is tim / my name is tim / my name is tim
But there are also rows with a single element:
["hello word"]
I would like a data frame like the following:
df_output <- data.frame(stringsAsFactors=FALSE,
id = c(123, 234, 345, 456),
string = c("[\"aaa\"] / bbb", "[\"hello hello\"]", "my name is tim",
"[\"hello word\"]")
)
Which prints as:
id string
1 123 ["aaa"] / bbb
2 234 ["hello hello"]
3 345 my name is tim
4 456 ["hello word"]
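That is, I keep only the unique elements in each row; if more than one unique element remains, they stay separated by /.
Is there a way to do this with dplyr?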
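For reference, the deduplication described above can be sketched as follows. This is only one possible approach, not a confirmed answer from the thread: it assumes the separator is always exactly " / " (space, slash, space) and that elements must match character for character to count as duplicates. The helper name `dedupe_elements` is made up for illustration.

```r
library(dplyr)

# Hypothetical helper (not from the original post): split each string on
# the literal separator, drop repeated elements while keeping first-seen
# order, and paste the remaining elements back together.
dedupe_elements <- function(x, sep = " / ") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) paste(unique(p), collapse = sep), character(1))
}

# Same input data as in the question.
df_input <- data.frame(
  stringsAsFactors = FALSE,
  id = c(123, 234, 345, 456),
  string = c("[\"aaa\"] / [\"aaa\"] / [\"aaa\"] / bbb / bbb / bbb",
             "[\"hello hello\"] / [\"hello hello\"] / [\"hello hello\"] / [\"hello hello\"]",
             "my name is tim / my name is tim / my name is tim",
             "[\"hello word\"]")
)

# Apply the helper to the string column; id is left untouched.
df_output <- df_input %>%
  mutate(string = dedupe_elements(string))
```

Because `strsplit()` is used with `fixed = TRUE`, the separator is treated as a literal string and not as a regular expression. Note that a missing value in `string` would come back as the text "NA" with this sketch, so rows with `NA` would need separate handling.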
【Comments】: