【Question title】: Extract actions on objects from a sentence in R
【Posted】: 2018-03-08 13:23:51
【Question】:

I want to extract the actions performed on objects from a list of sentences in R. Here is a small example.

S = “The boy opened the box. He took the chocolates. He ate the chocolates. 
     He went to school”

I am looking for combinations like the following:

Opened box
Took chocolates
Ate chocolates
Went school

I have been able to extract the verbs and the nouns separately, but I cannot figure out a way to combine them to get this kind of insight.

library(openNLP)
library(openNLPmodels.en)
library(NLP)

s = as.String("The boy opened the box. He took the chocolates. He ate the 
               chocolates. He went to school")

tagPOS <- function(x, ...) {
  s <- as.String(x)
  # Treat the whole input as a single sentence, then tokenize and POS-tag it
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  # Build "word/TAG" tokens, joined with commas
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = ",")
  list(POStagged = POStagged, POStags = POStags)
}

nouns = c("/NN", "/NNS","/NNP","/NNPS")
verbs = c("/VB","/VBD","/VBG","/VBN","/VBP","/VBZ")

s = tolower(s)
s = gsub("\n","",s)
s = gsub('"',"",s)

tags = tagPOS(s)
tags = tags$POStagged
tags = unlist(strsplit(tags, split=","))

nouns_present = tags[grepl(paste(nouns, collapse = "|"), tags)]
nouns_present = unique(nouns_present)
verbs_present = tags[grepl(paste(verbs, collapse = "|"), tags)]
verbs_present = unique(verbs_present)
nouns_present<- gsub("^(.*?)/.*", "\\1", nouns_present)
verbs_present = gsub("^(.*?)/.*", "\\1", verbs_present)
nouns_present = paste("'", as.character(nouns_present), "'", collapse = ",", sep = "")
verbs_present = paste("'", as.character(verbs_present), "'", collapse = ",", sep = "")

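The idea is to build a network graph in which clicking a verb node shows all the objects connected to it, and vice versa. Any help with this would be great.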

【Question comments】:

    Tags: r nlp opennlp


    【Solution 1】:

    I assume you also want to capture the words before and after the key action verbs. I was able to do this with the tidytext package. (Reference: https://uc-r.github.io/word_relationships)

    library(tidytext)
    library(tidyverse)
    
    # `comments` is assumed to be a data frame with a text column named `Response`.
    # First, add a column that splits the text into n-grams (here n = 2, i.e. consecutive word pairs)
    mydf <- unnest_tokens(comments, "tokens", Response, token = "ngrams", n = 2, to_lower = TRUE, drop = FALSE)
    
    # Remove pairs that contain a stop word, then count the remaining pairs:
    mydf %>%
      separate(tokens, c("word1", "word2"), sep = " ") %>%
      filter(!word1 %in% stop_words$word,
             !word2 %in% stop_words$word) %>%
      count(word1, word2, sort = TRUE) %>% view()
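
Bigram counting only pairs adjacent words, so a phrase such as "opened the box" does not directly yield "opened box". As a complementary sketch (not part of the original answer), the POS tags from the question can be used to pair each verb with the first noun that follows it in the same sentence. The function name `pair_verbs_with_objects` and the hand-written `tagged` vector are hypothetical; real `word/TAG` tokens would come from something like the question's `tagPOS()`, and the tagger's actual output may differ from this example. "First following noun" is a heuristic and not a true dependency parse.

```r
# Hypothetical input: "word/TAG" tokens in sentence order (hand-written here,
# assumed to resemble what a POS tagger would return for the example text).
tagged <- c("the/DT", "boy/NN", "opened/VBD", "the/DT", "box/NN", "./.",
            "he/PRP", "took/VBD", "the/DT", "chocolates/NNS", "./.",
            "he/PRP", "ate/VBD", "the/DT", "chocolates/NNS", "./.",
            "he/PRP", "went/VBD", "to/TO", "school/NN")

# Pair each verb with the first noun that follows it, stopping at the
# end of the sentence or at the next verb (a heuristic, not a parser).
pair_verbs_with_objects <- function(tagged) {
  word <- sub("/[^/]*$", "", tagged)   # text before the last "/"
  tag  <- sub("^.*/", "", tagged)      # text after the last "/"
  is_verb <- grepl("^VB", tag)
  is_noun <- grepl("^NN", tag)
  is_end  <- tag == "."

  pairs <- list()
  for (i in which(is_verb)) {
    j <- i + 1
    while (j <= length(tag) && !is_end[j] && !is_verb[j]) {
      if (is_noun[j]) {
        pairs[[length(pairs) + 1]] <- data.frame(
          verb = word[i], object = word[j], stringsAsFactors = FALSE)
        break
      }
      j <- j + 1
    }
  }
  if (length(pairs) == 0) {
    return(data.frame(verb = character(0), object = character(0),
                      stringsAsFactors = FALSE))
  }
  do.call(rbind, pairs)
}

# Edge list: one row per (verb, object) pair
edges <- pair_verbs_with_objects(tagged)
print(edges)

# Lookups in both directions, e.g. for a click-to-expand graph
objects_by_verb <- split(edges$object, edges$verb)
verbs_by_object <- split(edges$verb, edges$object)
```

The `edges` data frame is a plain two-column edge list, so it could be handed to a graph package to draw the verb/object network, while the two `split()` lists give the "click a verb, see its objects" lookup and its reverse.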
    

    【Comments】:
