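[Posted]: 2021-03-16 08:19:34
[Question]:
I would like to replace specific tokens in an object of the quanteda package's tokens class. I have not been able to reproduce the standard approach that works with stringr.
The goal is to replace every token of the form "XXXof" with two tokens of the form c("XXX", "of").
Please see the minimal example below: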
suppressPackageStartupMessages(library(quanteda))
library(stringr)
text = "It was a beautiful day down to the coastof California."
# I would solve this with stringr as follows:
text_stringr = str_replace( text, "(^.*?)(of)", "\\1 \\2" )
text_stringr
#> [1] "It was a beautiful day down to the coast of California."
# I fail to find a similar solution with quanteda that works on objects of class tokens
tok = tokens( text )
# I want to replace "coastof" with the two tokens "coast" and "of"
tokens_replace( tok, "(^.*?)(of)", "\\1 \\2", valuetype = "regex" )
#> Tokens consisting of 1 document.
#> text1 :
#> [1] "It" "was" "a" "beautiful" "day"
#> [6] "down" "to" "the" "\\1 \\2" "California"
#> [11] "."
Is there a workaround?
Created on 2021-03-16 by the reprex package (v1.0.0)
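One possible workaround, offered only as a sketch: the output above suggests that tokens_replace() inserts the replacement string literally rather than expanding backreferences such as "\\1". Under that assumption, the splitting can be done on plain character vectors and the result converted back into a tokens object. The helper name split_of is hypothetical, and the pattern "^(.+)of$" is an assumption about what counts as a "XXXof" token; it would also split ordinary words such as "proof", so it may need to be narrowed for real data.

```r
library(quanteda)

text <- "It was a beautiful day down to the coastof California."
tok <- tokens(text)

# Hypothetical helper: split each token that ends in "of" (with at least one
# character before it) into the stem and "of". A bare "of" is left untouched.
# Assumes tokens contain no spaces, which holds for the default tokenizer.
split_of <- function(x) {
  spaced <- sub("^(.+)of$", "\\1 of", x)
  unlist(strsplit(spaced, " ", fixed = TRUE))
}

# Convert the tokens object to a list of character vectors, apply the helper
# to each document, and rebuild a tokens object from the resulting list.
tok_split <- as.tokens(lapply(as.list(tok), split_of))
tok_split
```

Because the tokens object is rebuilt from a plain list, document names are kept through the list names, but any docvars attached to tok may not carry over and might need to be reassigned.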
[Comments]: