[Posted]: 2021-09-23 08:10:14
[Question]:
I am trying to extract the artist and title names from a set of strings, but it is a bit complicated. Here is the list:
nlist <- c(
"Lil' SlimLil' Slim feat. PxMxWxPxMxWx Where Your Ward At!!",
"I Like It (Mannie Fresh Style)I Like It (Mannie Fresh Style)Ms. Tee",
"Bella VistaBella Vista Mister Wong",
"Tom WareTom WareChina Town",
"Race 'N RhythmRace 'N Rhythm Teenage Girls",
"Ronald MarquisseRonald MarquisseElectro Link 7",
"PleasurePleasure Thoughts Of Old Flames",
"OM, OM, Dom Um RomaoDom Um Romao Chipero",
"HookfaceHookface4 07 181221"
)
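Here is the pattern in the strings.
Explanation:
- There are three different patterns (items 1, 2-7, and 8).
- Red: artist (repeated),
- Blue: title (not repeated),
- Green: connector between artist names (not repeated).
Items 1 and 8 are very hard and I could not solve them. For items 2 to 7, however, the code below solved my problem.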
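library(stringr)
# title: drop every doubled block, then trim the leftover whitespace
title = str_trim(gsub('(.+?)\\1','', nlist))
# artist: capture group 1 of the first doubled block
artist = str_match(nlist, '(.+?)\\1')[,2]
data = cbind(title,artist);data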
Here is the output of the code above.
title artist
[1,] "feat. PxMxWxPxMxWx Where Your Ward At!!" "Lil' Slim"
[2,] "Ms. Tee" "I Like It (Mannie Fresh Style)"
[3,] "Mister Wong" "Bella Vista"
[4,] "China Town" "Tom Ware"
[5,] "Teenage Girls" "Race 'N Rhythm"
[6,] "Electro Link 7" "Ronald Marquisse"
[7,] "Thoughts Of Old Flames" "Pleasure"
[8,] "Chipero" "OM, "
[9,] "4 07 181221" "Hookeface"
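Problem: when there is a "feat." or a "," in the string, the repeated sequence is cut short. Question: how can I extract the full artist names, as shown below?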
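To see why the artist gets cut short, it can help to look at what the pattern actually matches on item 8. The sketch below uses base R with `perl = TRUE` (an assumption; the code above relies on the default regex engine) and lists every doubled block that the lazy pattern finds in that one string.

```r
# Item 8 from nlist: two doubled blocks sit side by side.
x <- "OM, OM, Dom Um RomaoDom Um Romao Chipero"

# A single match of the lazy (.+?)\\1 stops at the first doubled block.
first_block <- regmatches(x, regexpr("(.+?)\\1", x, perl = TRUE))

# gregexpr() lists every doubled block in the string.
all_blocks <- regmatches(x, gregexpr("(.+?)\\1", x, perl = TRUE))[[1]]

print(first_block)
print(all_blocks)
```

The first block is only "OM, OM, ", so capture group 1 is "OM, ". The second block, "Dom Um RomaoDom Um Romao", is a separate match that a single call never reaches.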
My expected result is here (check rows 1 and 8):
title artist
[1,] "Where Your Ward At!!" "Lil' Slim feat. PxMxWx"
[2,] "Ms. Tee" "I Like It (Mannie Fresh Style)"
[3,] "Mister Wong" "Bella Vista"
[4,] "China Town" "Tom Ware"
[5,] "Teenage Girls" "Race 'N Rhythm"
[6,] "Electro Link 7" "Ronald Marquisse"
[7,] "Thoughts Of Old Flames" "Pleasure"
[8,] "Chipero" "OM, Dom Um Romao"
[9,] "4 07 181221" "Hookeface"
Thanks...
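One possible way to reach the expected table is to peel a doubled block off the front of the string, and then try to peel a second one off what remains. This is only a sketch in base R: the helper names `peel_double` and `split_track` are made up for illustration, and the connector " feat. " is hard-coded on the assumption that it is the only one that appears between two doubled names in this data.

```r
nlist <- c(
  "Lil' SlimLil' Slim feat. PxMxWxPxMxWx Where Your Ward At!!",
  "I Like It (Mannie Fresh Style)I Like It (Mannie Fresh Style)Ms. Tee",
  "Bella VistaBella Vista Mister Wong",
  "Tom WareTom WareChina Town",
  "Race 'N RhythmRace 'N Rhythm Teenage Girls",
  "Ronald MarquisseRonald MarquisseElectro Link 7",
  "PleasurePleasure Thoughts Of Old Flames",
  "OM, OM, Dom Um RomaoDom Um Romao Chipero",
  "HookfaceHookface4 07 181221"
)

# Hypothetical helper: if s starts with a doubled block "XX",
# return X and the remainder; otherwise return NULL.
peel_double <- function(s) {
  m <- regmatches(s, regexec("^(.+?)\\1(.*)$", s, perl = TRUE))[[1]]
  if (length(m) == 0) return(NULL)
  list(head = m[2], rest = m[3])
}

# Hypothetical helper: split one string into title and artist.
split_track <- function(s) {
  first <- peel_double(s)
  if (is.null(first)) return(c(title = trimws(s), artist = NA_character_))
  artist <- first$head
  rest <- first$rest
  # Assumption: " feat. " is the only connector between two doubled names.
  connector <- if (startsWith(rest, " feat. ")) " feat. " else ""
  second <- peel_double(substring(rest, nchar(connector) + 1))
  if (!is.null(second)) {
    artist <- paste0(artist, connector, second$head)
    rest <- second$rest
  }
  c(title = trimws(rest), artist = artist)
}

result <- t(sapply(nlist, split_track, USE.NAMES = FALSE))
print(result)
```

Because both patterns are anchored at the start of the string, doubled letters inside a title (the "ee" in "Teenage", for instance) are left alone, whereas a global `gsub` with the same pattern would also match them. Connectors other than " feat. " would need their own branch.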
[Comments]:
- Shouldn't "4 07 181221" be the title and "Hookeface" the artist?
Tags: r regex gsub text-extraction