[Question title]: Maintain user-defined metadata with customised functions for tm_map
[Posted]: 2014-01-31 14:56:00
[Question]:

I have a function that translates tokens according to a key/value dictionary.

dictionary <- c("casa", "barco", "carro", "arbol")
names(dictionary) <- c("home", "boat", "car", "tree")

translate2 <- function (text, dictionary) {
  text_out <- character(0)
  for (i in 1:length(text)) {
    text.split <- strsplit(text[i], "\\s")
    translation <- dictionary[unlist(text.split)]
    text_out <- append(text_out, paste(translation, sep="", collapse=" "))
  }
  PlainTextDocument(text_out, id = ID(text), author = Author(text))
}

This function works for the Author metadata:

library(tm)

text <- "My car is on the tree next to my home under the boat"
corpus <- Corpus(VectorSource(text))
meta(corpus, "Author", type="local") <- "Kant"
meta(corpus, "TextID", type="local") <- "121212"
meta(corpus[[1]], "Author")
# [1] "Kant"

corpus <- tm_map(corpus, translate2, dictionary)
meta(corpus[[1]], "Author")
# [1] "Kant" 
corpus[[1]]
# NA carro NA NA NA arbol NA NA NA casa NA NA barco

But when I try to pass user-defined metadata, for example TextID, using a slightly modified version of the function:

translate1 <- function (text, dictionary) {
  text_out <- character(0)
  for (i in 1:length(text)) {
    text.split <- strsplit(text[i], "\\s")
    translation <- dictionary[unlist(text.split)]
    text_out <- append(text_out, paste(translation, sep="", collapse=" "))
  }
  PlainTextDocument(text_out, id = ID(text), author = Author(text), 
                    TextID = TextID(text))
} 

I get:

text <- "My car is on the tree next to my home under the boat"
corpus <- Corpus(VectorSource(text))
meta(corpus, "Author", type="local") <- "Kant"
meta(corpus, "TextID", type="local") <- "121212"
meta(corpus[[1]], "Author")
# [1] "Kant"
meta(corpus[[1]], "TextID")
# [1] "121212"

corpus <- tm_map(corpus, translate1, dictionary)
# Error in PlainTextDocument(text_out, id = ID(text), author = Author(text),  : 
#                              unused argument (TextID = TextID(text)) 

[Comments]:

    Tags: r nlp tm


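    [Answer 1]:

    There are a couple of problems with your approach:

    1. PlainTextDocument has no argument named TextID (this is what causes your error)
    2. There is no function named TextID

    According to ?PlainTextDocument, the argument you are looking for appears to be called localmetadata.

    Here is a version of translate1 that seems to work as expected: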

    translate1 <- function (text, dictionary) {
      text_out <- character(0)
      for (i in 1:length(text)) {
        text.split <- strsplit(text[i], "\\s")
        translation <- dictionary[unlist(text.split)]
        text_out <- append(text_out, paste(translation, sep="", collapse=" "))
      }
      PlainTextDocument(text_out, id = ID(text), author = Author(text), 
                        localmetadata = list(TextID = meta(text, "TextID")))
    } 
    
    text <- "My car is on the tree next to my home under the boat"
    corpus <- Corpus(VectorSource(text))
    meta(corpus, "Author", type="local") <- "Kant"
    meta(corpus, "TextID", type="local") <- "121212"
    meta(corpus[[1]], "Author")
    # [1] "Kant"
    meta(corpus[[1]], "TextID")
    # [1] "121212"
    
    corpus <- tm_map(corpus, translate1, dictionary)
    meta(corpus[[1]], "Author")
    # [1] "Kant"
    meta(corpus[[1]], "TextID")
    # [1] "121212"
    corpus[[1]]
    # NA carro NA NA NA arbol NA NA NA casa NA NA barco
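The NA entries in that output come from how R indexes a named character vector: a token that is not among names(dictionary) yields NA, and paste() then prints it as the string "NA". The following base R sketch (no tm needed) shows the lookup on a shortened sentence, plus one possible way to keep unknown tokens unchanged. The variable names tokens, translation, kept and result are only illustrative.

```r
# Dictionary keyed by the English word, as in the question
dictionary <- c(home = "casa", boat = "barco", car = "carro", tree = "arbol")

# Split a shortened sentence on whitespace
tokens <- unlist(strsplit("My car is on the tree", "\\s"))

# Lookup by name: tokens missing from names(dictionary) give NA
translation <- dictionary[tokens]
unname(translation)
# [1] NA      "carro" NA      NA      NA      "arbol"

# One option: fall back to the original token when there is no translation
kept <- ifelse(is.na(translation), tokens, translation)
result <- paste(kept, collapse = " ")
result
# [1] "My carro is on the arbol"
```

Whether untranslated tokens should be kept, dropped, or left as NA depends on what the translated corpus is used for afterwards.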
    
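The code above relies on ID(), Author() and the localmetadata argument, which belong to the tm API current when this was posted. As far as I can tell, tm 0.6 and later dropped those in favour of meta() and added content_transformer(), which changes only a document's content and leaves its metadata in place. A minimal sketch under that assumption (tm >= 0.6 installed; translate_text is a hypothetical helper name, not part of tm):

```r
library(tm)  # assumes tm >= 0.6, where content_transformer() is available

dictionary <- c(home = "casa", boat = "barco", car = "carro", tree = "arbol")

# Hypothetical helper: works on a plain character vector, not on a document
translate_text <- function(x, dictionary) {
  vapply(x, function(line) {
    tokens <- unlist(strsplit(line, "\\s"))
    paste(dictionary[tokens], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

text <- "My car is on the tree next to my home under the boat"
corpus <- VCorpus(VectorSource(text))

# Document-level metadata, including a user-defined tag
meta(corpus[[1]], "author") <- "Kant"
meta(corpus[[1]], "TextID") <- "121212"

# content_transformer() wraps the helper so that only content(doc) is replaced
corpus <- tm_map(corpus, content_transformer(translate_text), dictionary)

meta(corpus[[1]], "TextID")  # the user-defined tag is still attached
content(corpus[[1]])         # translated text, NA for unknown tokens
```

With this pattern the custom function never has to rebuild a PlainTextDocument, so there is no need to copy each metadata field by hand.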

    [Discussion]:
