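【Posted】: 2020-04-08 00:17:30
【Question】:
I want to take a tibble that represents a dialogue and turn it into a .txt file that can be edited by hand in a text editor, then read it back into a tibble for processing.
The main challenge is separating the blocks of text so that, after editing, they can be re-imported in a similar format while preserving the speaker labels.
Speed matters, because there are many files and each text segment is long.
Here is the input tibble: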
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"are.", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"has", 2L,
"15", 2L
)
Here is the desired output in the .txt file:
###Speaker 1###
been going on and what your goals are.
###Speaker 2###
Yeah, so so John has 15
Here is the desired tibble returned after the errors have been corrected by hand:
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"in", 1L,
"r", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"hates", 2L,
"50", 2L
)
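A minimal sketch of one way the round trip described above could be done, not a definitive or benchmarked solution. The helper names `tibble_to_txt()` and `txt_to_tibble()` are hypothetical and not part of any package. The sketch assumes that header lines match `###Speaker N###` exactly, that the editor leaves those headers untouched, and that words are separated by whitespace.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical helper: write one "###Speaker N###" header per run of
# consecutive words from the same speaker, followed by the joined words.
tibble_to_txt <- function(df, path) {
  blocks <- df %>%
    # Start a new block every time the speaker changes, so a speaker who
    # returns later gets a separate block instead of being merged.
    mutate(block = cumsum(speakerTag != lag(speakerTag, default = first(speakerTag)))) %>%
    group_by(block, speakerTag) %>%
    summarise(text = paste(word, collapse = " "), .groups = "drop")

  # Interleave header and text lines: header 1, text 1, header 2, text 2, ...
  lines <- as.vector(rbind(paste0("###Speaker ", blocks$speakerTag, "###"),
                           blocks$text))
  writeLines(lines, path)
  invisible(lines)
}

# Hypothetical helper: read the edited file back into one row per word.
txt_to_tibble <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(trimws(lines))]  # ignore blank lines added while editing

  header_pattern <- "^###Speaker (\\d+)###$"
  tibble(
    line = lines,
    is_header = str_detect(lines, header_pattern),
    # NA for text lines; filled downward from the preceding header below
    speakerTag = as.integer(str_match(lines, header_pattern)[, 2])
  ) %>%
    fill(speakerTag) %>%
    filter(!is_header) %>%
    mutate(word = str_split(str_squish(line), " ")) %>%
    unnest(word) %>%
    select(word, speakerTag)
}
```

With the input tibble stored in a variable such as `df`, calling `tibble_to_txt(df, "dialogue.txt")`, editing the file, and then calling `txt_to_tibble("dialogue.txt")` should return a tibble with `word` and `speakerTag` columns. If this proves too slow for many large files, the same logic could be ported to base R or data.table.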
【Discussion】:
Tags: r dplyr purrr stringr google-language-api