【Question Title】: ValueError: Data cardinality is ambiguous with tf.keras
【Posted】: 2021-06-29 22:51:21
【Question】:

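I have a DataFrame with two columns. The first contains a sentence and the second the target label (9 labels in total; a sentence can be assigned to more than one label).

I vectorized the text with word2vec, which produces an array of length 64 for each sentence.

The problem I originally ran into was: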

Tensorflow - ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)

To get around it, I converted the np.array with:

train_inputs = tf.convert_to_tensor([df_train_title_train])

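But now I'm getting a new error; see below.

I've spent days going through Stack Overflow and other resources and am still struggling to get my simple neural network to work.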
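A likely cause of the first error is that the feature column holds one NumPy array per row, so pulling it out with `.values` gives a 1-D array of Python objects rather than a 2-D float matrix. The sketch below assumes a hypothetical column `vec` whose cells are length-64 vectors (random placeholders, not the real word2vec output). It shows that `np.stack` builds a proper `(n_samples, 64)` float array, and that wrapping the result in an extra list, as in the `convert_to_tensor([...])` line above, adds a leading axis of size 1.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: each cell holds a length-64 vector.
rng = np.random.default_rng(0)
df = pd.DataFrame({"vec": [rng.random(64, dtype=np.float32) for _ in range(5)]})

# .values on such a column is a 1-D array of objects, which TensorFlow cannot convert.
as_objects = df["vec"].values
print(as_objects.dtype, as_objects.shape)

# np.stack joins the equal-length vectors into one (n_samples, 64) float array.
features = np.stack(df["vec"].values)
print(features.dtype, features.shape)

# An extra surrounding list adds a leading axis of size 1: (1, n_samples, 64).
wrapped = np.asarray([features])
print(wrapped.shape)
```

With `features` shaped `(n_samples, 64)`, the first axis already counts samples, so no extra wrapping is needed before calling `fit`.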

print(train_inputs.shape)
print(train_targets.shape)
print(validation_inputs.shape)
print(validation_targets.shape)
print(train_inputs[0].shape)
print(train_targets[0].shape)

(1, 63586, 64)
(63586, 9)
(1, 7066, 64)
(7066, 9)
(63586, 64)
(9,)



# Set the input and output sizes
input_size = 64
output_size = 9
# Use the same size for all hidden layers. Not a necessity.
hidden_layer_size = 64

# define how the model will look like
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), # 3rd hidden layer
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])


# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


### Training
# That's where we train the model we have built.

# set the batch size
batch_size = 10

# set a maximum number of training epochs
max_epochs = 10


# fit the model
# note that this time the train, validation and test data are not iterable
model.fit(train_inputs, # train inputs
          train_targets, # train targets
          batch_size=batch_size, # batch size
          epochs=max_epochs, # epochs that we will train for (assuming early stopping doesn't kick in)
          validation_data=(validation_inputs, validation_targets), # validation data
          verbose = 2 # making sure we get enough information about the training process
          )  

Error message:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/data_adapter.py in _check_data_cardinality(data)
   1527           label, ", ".join(str(i.shape[0]) for i in nest.flatten(single_data)))
   1528     msg += "Make sure all arrays contain the same number of samples."
-> 1529     raise ValueError(msg)
   1530 
   1531 

ValueError: Data cardinality is ambiguous:
  x sizes: 1
  y sizes: 63586
Make sure all arrays contain the same number of samples.
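The message compares the length of the first axis (the sample count) of `x` and `y`. The sketch below uses a hypothetical helper, `check_cardinality`, that mimics the idea of that check (it is not Keras' actual code) with zero-filled placeholders in the shapes printed above. Indexing away the size-1 leading axis makes the sample counts agree; the same would apply to `validation_inputs`.

```python
import numpy as np

def check_cardinality(x, y):
    """Hypothetical helper: raise if x and y disagree on the number of samples (axis 0)."""
    n_x, n_y = np.shape(x)[0], np.shape(y)[0]
    if n_x != n_y:
        raise ValueError(f"x sizes: {n_x}, y sizes: {n_y}")
    return n_x

# Placeholders with the shapes printed in the question.
train_inputs = np.zeros((1, 63586, 64), dtype=np.float32)
train_targets = np.zeros((63586, 9), dtype=np.float32)

try:
    check_cardinality(train_inputs, train_targets)
except ValueError as err:
    print("mismatch:", err)  # x sees 1 "sample", y sees 63586

# Drop the size-1 leading axis (same as np.squeeze(train_inputs, axis=0)).
fixed_inputs = train_inputs[0]
print(fixed_inputs.shape, check_cardinality(fixed_inputs, train_targets))
```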

【Comments】:

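  • Without an example it's hard to give a definitive answer. First, I would add a Flatten layer as the input layer. Second, I would reshape train_targets to match the shape of train_inputs.
  • train_inputs.shape is (1, 63586, 64) but train_targets.shape is (63586, 9); it should be (1, 63586, 9).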

Tags: python tensorflow keras neural-network tf.keras


【Answer 1】:

You don't set the input shape anywhere; you should do that with an explicit Input layer at the start of the model (see the example in the docs):

# before the first Dense layer:
tf.keras.Input(shape=(64,))

or by including the input_shape argument in the first layer:

tf.keras.layers.Dense(hidden_layer_size, activation='relu', input_shape=(64,)), # 1st hidden layer

Most likely you don't even need convert_to_tensor (though I'm not entirely sure).
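A minimal sketch of this suggestion, assuming TensorFlow 2.x: the question's model with an explicit `tf.keras.Input(shape=(64,))`, trained directly on float32 NumPy arrays of shape `(n_samples, 64)` without `convert_to_tensor`. The random inputs and one-hot targets are placeholders, not the question's data, and the loss follows the recommendation in the next paragraph.

```python
import numpy as np
import tensorflow as tf

input_size, hidden_layer_size, output_size = 64, 64, 9

model = tf.keras.Sequential([
    tf.keras.Input(shape=(input_size,)),  # one sample = one 64-dim vector
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Placeholder data: (n_samples, 64) features and (n_samples, 9) one-hot targets.
rng = np.random.default_rng(0)
train_inputs = rng.random((100, input_size), dtype=np.float32)
labels = rng.integers(0, output_size, size=100)
train_targets = np.eye(output_size, dtype=np.float32)[labels]

# Plain NumPy arrays are accepted by fit; the first axis is the sample axis for both.
history = model.fit(train_inputs, train_targets, batch_size=10, epochs=1, verbose=0)
print(model.output_shape)
```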

Also, unrelated to your issue, but since you are in a multi-class setting, you should use loss='categorical_crossentropy' instead of binary_crossentropy; see Why binary_crossentropy and categorical_crossentropy give different performances for the same problem?
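To see why the two losses behave differently, here is a simplified NumPy sketch of their per-sample formulas (not Keras' exact implementation). `categorical_crossentropy` scores only the true class and assumes exactly one class per sample, typically paired with `softmax`; `binary_crossentropy` averages an independent yes/no term over every output, typically paired with `sigmoid`. The three-class example values are made up for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # -sum_k y_k * log(p_k): only the true class contributes for one-hot targets.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over outputs of an independent binary log-loss per label.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    per_label = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return np.mean(per_label, axis=-1)

y_true = np.array([[0.0, 0.0, 1.0]])
y_pred = np.array([[0.2, 0.3, 0.5]])
cce = categorical_crossentropy(y_true, y_pred)  # -log(0.5)
bce = binary_crossentropy(y_true, y_pred)       # mean of -log(0.8), -log(0.7), -log(0.5)
print(cce, bce)
```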
