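【Posted】:2017-09-02 23:44:30
【Question】:
I have a text column containing speech-to-text transcripts of phone calls between customers and agents. After some text manipulation of the raw values, suppose I have a vector like the following: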
text <- " customer:customer text1 agent:agent text 1 customer:customer text2 agent:agent text 2"
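(Note the leading space at the start of the text vector.)
Question: How can I extract the customer text and the agent text from the original source field (here, the text vector) into two separate fields?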
# desired outputs:
# field for customer texts
"customer text1, customer text2"
# field for agent texts
"agent text 1, agent text 2"
What I have managed so far (my experience with regular expressions is limited) is:
customerText <- gsub("^ customer:| agent:(.*)", "", text)
customerText
[1] "customer text1"
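The attempt above returns only the first customer turn because the greedy `(.*)` after ` agent:` consumes the rest of the string. Here is a minimal sketch of one alternative. It assumes that only the two labels `customer` and `agent` occur and that no utterance itself contains a label followed by a colon. Each match starts right after a label and stops lazily at the next label or the end of the string. The helper name `extract_speaker` is hypothetical, not part of any package.

```r
text <- " customer:customer text1 agent:agent text 1 customer:customer text2 agent:agent text 2"

# Hypothetical helper: collect every utterance of one speaker and
# join them with ", ". Assumes the labels listed in `labels` are the
# only speaker markers in the transcript.
extract_speaker <- function(x, speaker, labels = c("customer", "agent")) {
  x <- as.character(x)
  # Lookahead: stop before the next " label:" or at the end of the string.
  stop_at <- paste0("(?= (?:", paste(labels, collapse = "|"), "):|$)")
  # Lookbehind: start right after "speaker:"; ".*?" keeps the match lazy.
  pattern <- paste0("(?<=", speaker, ":).*?", stop_at)
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  vapply(hits, function(h) paste(trimws(h), collapse = ", "), character(1))
}

customerText <- extract_speaker(text, "customer")
agentText <- extract_speaker(text, "agent")
customerText
# [1] "customer text1, customer text2"
agentText
# [1] "agent text 1, agent text 2"
```

Because `gregexpr` and `regmatches` are vectorised, the same helper should also work on a whole column of transcripts, returning one joined string per element.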
Edit:
Please consider the reproducible, data-frame-based code below instead of the vector-based code above.
> callid <- c("1","2")
> conversation <- c(" customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2",
+ " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9")
> conversationCustomer <- c("customer text 1, customer text 2", "customer text 8, customer text 9")
> conversationAgent <- c("agent text 1, agent text 2", "agent text 8, agent text 9")
> df <- data.frame(callid, conversation)
> dfDesired <- data.frame(callid, conversation, conversationCustomer, conversationAgent)
> rm(callid, conversation, conversationCustomer, conversationAgent)
>
> df
callid conversation
1 1 customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2
2 2 agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9
> dfDesired
callid conversation conversationCustomer conversationAgent
1 1 customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2 customer text 1, customer text 2 agent text 1, agent text 2
2 2 agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9 customer text 8, customer text 9 agent text 8, agent text 9
Thanks!
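For the data-frame version, one possible sketch in base R is to split each conversation into turns at the space that precedes a speaker label, then keep the turns belonging to each speaker. It assumes the only labels are `customer` and `agent` and that the column is read as character rather than factor. The helper name `collect_turns` is hypothetical.

```r
df <- data.frame(
  callid = c("1", "2"),
  conversation = c(
    " customer:customer text 1 agent:agent text 1 customer:customer text 2 agent:agent text 2",
    " agent:agent text 8 customer:customer text 8 agent:agent text 9 customer:customer text 9"
  ),
  stringsAsFactors = FALSE
)

# Split every conversation into turns such as "customer:customer text 1".
# The split point is a space that is immediately followed by a label.
turns <- strsplit(
  trimws(as.character(df$conversation)),
  " (?=(?:customer|agent):)",
  perl = TRUE
)

# Hypothetical helper: keep the turns of one speaker, drop the "label:"
# prefix, and join what remains with ", ".
collect_turns <- function(turns, speaker) {
  prefix <- paste0(speaker, ":")
  mine <- turns[startsWith(turns, prefix)]
  paste(substring(mine, nchar(prefix) + 1), collapse = ", ")
}

df$conversationCustomer <- vapply(turns, collect_turns, character(1), speaker = "customer")
df$conversationAgent <- vapply(turns, collect_turns, character(1), speaker = "agent")
```

Splitting first keeps the order of turns available in `turns`, which may be useful if the speaker sequence matters later. The order of speakers within a call does not affect the result, so row 2 (which starts with the agent) is handled the same way as row 1.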
【Comments】:
- R for text parsing? God help you.