【Question title】: R: Combining strings between known strings
【Posted】: 2021-12-21 08:33:23
【Question description】:

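I have a long character vector with a specific structure. I want to group the strings so that this structure becomes explicit. An example will make this clear.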

chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop", "Another Random Title", "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop", "Start", "erg", "vdf", "vfd", "efw", "Stop")

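So there are random titles, and the words between Start and Stop (inclusive) should be grouped together. The random titles should be kept as well, so that I know which title each block belongs to. The result would look like this: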

result <- list("Random Title" = list(c("Start", "dsf", "sdvf", "Stop"), c("Start", "dsf", "sdvf", "Stop")),
               "Another Random Title" = list(c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop"), c("Start", "erg", "vdf", "vfd", "efw", "Stop")))
result
$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

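I don't know in advance how many strings lie between Start and Stop. The titles are random. My data does not have to be a vector. I tried this with a tibble and cumsum, but failed because I need to keep the titles.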

My attempt:

library(dplyr)  # provides %>% and mutate(), and re-exports tibble()

res <- tibble(text = chr_vec) %>% 
  mutate(group = cumsum(text == "Start"))

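This almost works, but the titles break the approach: each title ends up in the group of the block before it instead of being recognised as a title.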

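【Comments】:

  • If the titles are random, how would you identify them? What if a title is "efw"?
  • Well, I expect a title to always appear in the sequence Stop - title - Start, which signals the start of a new second-level list. So if we first group everything between Start and Stop, whatever is left over must be a title, and each title covers all the blocks that follow it up to the next title. Titles never appear inside a Start-Stop sequence.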
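
The rule described in the last comment (anything outside a Start-Stop block is a title) can be sketched in base R roughly as follows. This is only an illustration under the assumption that blocks are well formed, i.e. every "Start" has a matching "Stop" and blocks are not nested. The short vector `chr_vec` and the names `in_block` and `titles` are made up for this example.

```r
# Shortened example data, made up for illustration
chr_vec <- c("Title A", "Start", "x", "Stop", "Start", "y", "z", "Stop",
             "Title B", "Start", "w", "Stop")

is_start <- chr_vec == "Start"
is_stop  <- chr_vec == "Stop"

# cumsum(is_start) - cumsum(is_stop) is 1 from each "Start" up to, but not
# including, its "Stop"; adding is_stop back in marks the "Stop" element
# itself as part of the block.
in_block <- (cumsum(is_start) - cumsum(is_stop) + is_stop) > 0

# Everything that is not inside a block is treated as a title
titles <- chr_vec[!in_block]
titles
```

Because the test uses exact equality with "Start" and "Stop", a title that merely contains one of those words is still treated as a title.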

Tags: r list


【Solution 1】:

A solution in base R:

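# Positions of the exact "Start" and "Stop" markers
# (grep() would also match titles that merely contain these words)
t1 <- which(chr_vec == "Start")
t2 <- which(chr_vec == "Stop")
# One index sequence per Start-Stop block; unlike mapply(), Map() always
# returns a list, even when all blocks have the same length
sek <- Map(seq, t1, t2)

j <- 1
lst <- list()
for (i in seq_along(sek)) {
  if (i == 1) {
    # The first element of the vector is assumed to be the first title
    tit <- chr_vec[1]
  } else if (head(sek[[i]], 1) - tail(sek[[i - 1]], 1) != 1) {
    # A gap between the previous "Stop" and this "Start" holds a new title
    tit <- chr_vec[head(sek[[i]], 1) - 1]
    j <- 1
  }

  # Append the current block under the current title
  lst[[tit]][[j]] <- chr_vec[sek[[i]]]
  j <- j + 1
}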

which results in

$`Random Title`
$`Random Title`[[1]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[2]]
[1] "Start" "dsf"   "sdvf"  "Stop" 

$`Random Title`[[3]]
[1] "Start" "dsf"   "sdvf"  "Stop" 


$`Another Random Title`
$`Another Random Title`[[1]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[2]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop" 

$`Another Random Title`[[3]]
[1] "Start" "erg"   "vdf"   "vfd"   "efw"   "Stop"
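
For comparison, the same grouping can also be written without an explicit loop by combining `cumsum()` with `split()`. The following is a minimal sketch, not part of the original answer. The function name `group_blocks` and its arguments are hypothetical, and it assumes well-formed, non-nested blocks and a title in front of the first block (blocks that precede any title are dropped).

```r
chr_vec <- c("Random Title", "Start", "dsf", "sdvf", "Stop", "Start", "dsf", "sdvf", "Stop",
             "Start", "dsf", "sdvf", "Stop", "Another Random Title",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop",
             "Start", "erg", "vdf", "vfd", "efw", "Stop")

# Hypothetical helper: returns a list of titles, each holding a list of blocks
group_blocks <- function(x, start = "Start", stop = "Stop") {
  is_start <- x == start
  is_stop  <- x == stop
  # TRUE for every element from a start marker through its stop marker
  in_block <- (cumsum(is_start) - cumsum(is_stop) + is_stop) > 0

  titles   <- x[!in_block]                 # everything outside a block
  block_id <- cumsum(is_start)[in_block]   # running block number per element
  blocks   <- unname(split(x[in_block], block_id))

  # For each block, the number of titles seen before its start marker
  block_title <- cumsum(!in_block)[is_start]
  out <- split(blocks, factor(block_title, levels = seq_along(titles)))
  names(out) <- titles
  out
}

res <- group_blocks(chr_vec)
names(res)
lengths(res)
```

Titles are detected by position rather than by content, so a title such as "efw" is handled as long as it sits outside a Start-Stop block.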
