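【Question title】: R to_categorical for sequence labeling with Keras
【Posted】: 2018-11-10 11:30:34
【Question】:

I have a data frame defined as shown below, and I am trying to create sequence-labeling input for a deep learning problem. I have a tag for each sentence element. I create WordIndex vectors for the sentence elements and pad them to length 10. I do the same for the tags: I create a TagIndex for each tag and pad it to length 10. Then I need to convert the TagIndex values to categorical variables, and that is when the error appears. Any help would be great. Is this the right approach?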

library(keras)

SentenceID = c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
Tokens = c("I","went","to","school","nobody","can","find","some","people","know","what","they","are","doing","now")
WordIndex = c(3,4,7,8,9,10,12,54,34,66,33,89,87,23,22)
TagIndex = c(1,3,2,4,1,3,4,1,2,4,3,4,2,3,4)

df = data.frame(SentenceID, Tokens, WordIndex, TagIndex)

lst <- split(df$WordIndex, f = df$SentenceID)

lstWord2 <- lapply(lst, function(x){
  x2 <- head(x, 10)  # truncate sentences longer than 10 tokens
  if (length(x2) < 10){
    x2 <- c(x2, rep(0, 10 - length(x2)))  # right-pad with 0 up to length 10
  }
  return(x2)
})

lstTag <- split(df$TagIndex, f = df$SentenceID)

lstTag2 <- lapply(lstTag, function(x){
  x2 <- head(x, 10)  # truncate sentences longer than 10 tags
  if (length(x2) < 10){
    x2 <- c(x2, rep(0, 10 - length(x2)))  # right-pad with 0 up to length 10
  }
  return(x2)
})

is.vector(lstTag2)  # TRUE, but is.vector() is also TRUE for lists; is.list(lstTag2) is TRUE as well

y <- to_categorical(lstTag2, num_classes = NULL)

This is the error I get:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: int() argument must be a string, a bytes-like object or a number, not 'dict'

Detailed traceback: 
  File "C:\Users\balak\AppData\Local\conda\conda\envs\R-TENS~1\lib\site-packages\keras\utils\np_utils.py", line 22, in to_categorical
    y = np.array(y, dtype='int')
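The 'dict' in the error message points to the cause. split() returns a named list (names "1", "2", "3"), and reticulate converts a named R list into a Python dict, which NumPy cannot cast to int.

The base R sketch below needs no keras. It re-creates the padded tag list to confirm that the list is named, then stacks it into an ordinary numeric matrix with do.call(rbind, ...). The helper pad_to() is hypothetical: it is a compact version of the question's padding function that also truncates sentences longer than 10.

```r
# Re-create the padded tag list from the question (base R only)
SentenceID <- c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
TagIndex   <- c(1,3,2,4,1,3,4,1,2,4,3,4,2,3,4)

# Hypothetical helper: truncate or right-pad with 0 to exactly `len` entries
pad_to <- function(x, len = 10) {
  x <- head(x, len)
  c(x, rep(0, len - length(x)))
}

lstTag2 <- lapply(split(TagIndex, f = SentenceID), pad_to)

# split() keeps the group labels as names, so this is a *named* list;
# reticulate converts a named list to a Python dict
print(names(lstTag2))   # "1" "2" "3"

# Stack the padded vectors row-wise into a plain 3 x 10 numeric matrix,
# one row per sentence
tagMatrix <- do.call(rbind, lstTag2)
print(dim(tagMatrix))   # 3 10
```

as.matrix() on a list gives a 3 x 1 matrix whose cells are list elements. By contrast, do.call(rbind, ...) yields a plain 3 x 10 numeric matrix, which should be a safer thing to hand to to_categorical().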

【Question comments】:

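  • I think this is because lstTag2 is a list, since it is returned by the lapply call.
  • If you use the sparse_categorical_crossentropy loss, you can skip this step entirely: it expects integer labels and handles the conversion for you.
  • Nuric and Relasta, thanks a lot. Nuric, I'll give that a try and report back on whether it solves the problem. Have you come across any R code that implements a BiLSTM-CRF for sequence modeling? As you can imagine, I want Keras to predict the tags for me, based on the input sentences and their associated tags.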
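The point about sparse_categorical_crossentropy can be checked by hand. Per time step, categorical cross-entropy is -sum(onehot * log(p)). The sparse variant reads off -log(p[true class]) directly from the integer label. Both therefore give the same number, and the sparse variant never has to build the one-hot array.

The base R sketch below uses made-up softmax outputs for one padded sentence. The probabilities are random and purely illustrative, not the output of a real model.

```r
# Integer tags for sentence 2 after padding (0 = padding), classes 0..4
tags <- c(1, 3, 4, 0, 0, 0, 0, 0, 0, 0)
num_classes <- 5

# Made-up softmax outputs: one row per time step, one column per class
set.seed(1)
logits <- matrix(rnorm(length(tags) * num_classes), nrow = length(tags))
probs  <- exp(logits) / rowSums(exp(logits))   # each row sums to 1

# Categorical cross-entropy: needs the one-hot matrix
onehot <- matrix(0, nrow = length(tags), ncol = num_classes)
onehot[cbind(seq_along(tags), tags + 1)] <- 1   # +1: R columns are 1-based
cce <- -rowSums(onehot * log(probs))

# Sparse categorical cross-entropy: indexes the true class directly
scce <- -log(probs[cbind(seq_along(tags), tags + 1)])

print(all.equal(cce, scce))   # TRUE
```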

Tags: r keras sequence labeling


【Solution 1】:

I guess the to_categorical function expects a matrix rather than a list as input. Converting with as.matrix() makes it work:

library(keras)

SentenceID = c(1,1,1,1,2,2,2,3,3,3,3,3,3,3,3)
Tokens = c("I","went","to","school","nobody","can","find","some","people","know","what","they","are","doing","now")
WordIndex = c(3,4,7,8,9,10,12,54,34,66,33,89,87,23,22)
TagIndex = c(1,3,2,4,1,3,4,1,2,4,3,4,2,3,4)

df = data.frame(SentenceID, Tokens, WordIndex, TagIndex)

lst <- split(df$WordIndex, f = df$SentenceID)

lstWord2 <- lapply(lst, function(x){
  x2 <- head(x, 10)  # truncate sentences longer than 10 tokens
  if (length(x2) < 10){
    x2 <- c(x2, rep(0, 10 - length(x2)))  # right-pad with 0 up to length 10
  }
  return(x2)
})

lstTag <- split(df$TagIndex, f = df$SentenceID)

lstTag2 <- lapply(lstTag, function(x){
  x2 <- head(x, 10)  # truncate sentences longer than 10 tags
  if (length(x2) < 10){
    x2 <- c(x2, rep(0, 10 - length(x2)))  # right-pad with 0 up to length 10
  }
  return(x2)
})


y <- to_categorical(as.matrix(lstTag2), num_classes = NULL)

I got:

> y
, , 1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    1    1    1    1    1     1
[2,]    0    0    0    1    1    1    1    1    1     1
[3,]    0    0    0    0    0    0    0    0    1     1

, , 2

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    0    0    0    0    0    0    0    0     0
[2,]    1    0    0    0    0    0    0    0    0     0
[3,]    1    0    0    0    0    0    0    0    0     0

, , 3

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    1    0    0    0    0    0    0     0
[2,]    0    0    0    0    0    0    0    0    0     0
[3,]    0    1    0    0    0    1    0    0    0     0

, , 4

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    1    0    0    0    0    0    0    0     0
[2,]    0    1    0    0    0    0    0    0    0     0
[3,]    0    0    0    1    0    0    1    0    0     0

, , 5

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    1    0    0    0    0    0     0
[2,]    0    0    1    0    0    0    0    0    0     0
[3,]    0    0    1    0    1    0    0    1    0     0
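The result has shape 3 x 10 x 5 for two reasons. Each of the 3 sentences has 10 padded positions. With num_classes = NULL, the number of classes is inferred as max(label) + 1 = 5, so the padding value 0 becomes its own class and fills the first slice.

The base R sketch below reproduces the same one-hot encoding, assuming 0-based integer labels as to_categorical expects. The helper one_hot() is hypothetical, and the tag matrix is written out by hand for the example data.

```r
# Padded tag matrix: one row per sentence, 0 = padding
tagMatrix <- rbind(
  c(1, 3, 2, 4, 0, 0, 0, 0, 0, 0),
  c(1, 3, 4, 0, 0, 0, 0, 0, 0, 0),
  c(1, 2, 4, 3, 4, 2, 3, 4, 0, 0)
)

# Hypothetical helper: one-hot encode into rows x cols x num_classes;
# slice k marks the positions whose label equals k - 1
one_hot <- function(m, num_classes = max(m) + 1) {
  out <- array(0, dim = c(dim(m), num_classes))
  for (k in seq_len(num_classes)) {
    out[, , k] <- (m == k - 1) * 1
  }
  out
}

y <- one_hot(tagMatrix)
print(dim(y))    # 3 10 5
print(y[, , 1])  # padding positions (label 0), matching slice ", , 1" above
```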

【Discussion】:
