Error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 0 target samples

【Question title】: Error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 0 target samples
【Posted】: 2019-07-15 08:10:06
【Question】:

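I am trying to build a system call classifier. The code below is adapted from a text classification project. My system calls are represented as sequences of integers between 1 and 340. The error I get is:

**ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 0 target samples**. I don't know what to do, since this is my first time working on this. Thanks in advance.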

    # Imports assumed; the original post omitted them.
    import keras
    import pandas as pd
    from keras.layers import Dense, Embedding, GlobalAveragePooling1D, Input
    from keras.models import Model
    from keras.preprocessing.sequence import pad_sequences
    from keras.preprocessing.text import Tokenizer
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("data.txt")
    df_test = pd.read_csv("validation.txt")
    # split arrays into train and test data (cross validation)
    train_text, test_text, train_y, test_y = train_test_split(df, df, test_size=0.2)
    MAX_NB_WORDS = 5700
    # get the raw text data
    texts_train = train_text.astype(str)
    texts_test = test_text.astype(str)
    # finally, vectorize the text samples into a 2D integer tensor
    tokenizer = Tokenizer(nb_words=MAX_NB_WORDS, char_level=False)
    tokenizer.fit_on_texts(texts_train)
    sequences = tokenizer.texts_to_sequences(texts_train)
    sequences_test = tokenizer.texts_to_sequences(texts_test)

    word_index = tokenizer.word_index
    type(tokenizer.word_index), len(tokenizer.word_index)
    index_to_word = dict((i, w) for w, i in tokenizer.word_index.items())
    " ".join([index_to_word[i] for i in sequences[0]])
    seq_lens = [len(s) for s in sequences]

    MAX_SEQUENCE_LENGTH = 100
    # pad sequences with 0s
    x_train = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    x_test = pad_sequences(sequences_test, maxlen=MAX_SEQUENCE_LENGTH)
    # print('Shape of data train:', x_train.shape)  # this gave (1, 100)
    # print('Shape of data test tensor:', x_test.shape)
    y_train = train_y
    y_test = test_y
    print('Shape of label tensor:', y_train.shape)
    EMBEDDING_DIM = 32
    N_CLASSES = 2

    y_train = keras.utils.to_categorical(y_train, N_CLASSES)
    sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='float32')

    embedding_layer = Embedding(MAX_NB_WORDS, EMBEDDING_DIM,
                                input_length=MAX_SEQUENCE_LENGTH,
                                trainable=True)
    embedded_sequences = embedding_layer(sequence_input)

    average = GlobalAveragePooling1D()(embedded_sequences)
    predictions = Dense(N_CLASSES, activation='softmax')(average)

    model = Model(sequence_input, predictions)
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam', metrics=['acc'])
    model.fit(x_train, y_train, validation_split=0.1,
              nb_epoch=10, batch_size=1)
    output_test = model.predict(x_test)
    print("test auc:", roc_auc_score(y_test, output_test[:, 1]))

【Comments】:

Tags: python keras neural-network nlp

【Solution 1】:

The error is telling you that:

x_train.shape[0] != y_train.shape[0]

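You need to review your data preparation and make sure the arrays you pass to the fit function have the same first dimension. In other words, the input array must contain the same number of samples as the target array.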
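To make that check concrete, here is a minimal sketch of a guard you could run just before calling fit. The helper name check_sample_counts is hypothetical (it is not part of Keras), and the array shapes are made up to mirror the ones reported in the question.

```python
import numpy as np


def check_sample_counts(x, y):
    """Raise ValueError if x and y disagree on the number of samples (axis 0)."""
    n_x, n_y = np.shape(x)[0], np.shape(y)[0]
    if n_x != n_y:
        raise ValueError(f"x has {n_x} samples but y has {n_y}")
    return n_x


# Shapes mirroring the report: one padded sequence, no labels.
x_bad = np.zeros((1, 100), dtype="int32")
y_bad = np.zeros((0, 2), dtype="float32")

try:
    check_sample_counts(x_bad, y_bad)
except ValueError as err:
    message = str(err)
    print(message)  # x has 1 samples but y has 0

# A consistent pair passes and returns the shared sample count.
n = check_sample_counts(np.zeros((8, 100)), np.zeros((8, 2)))
```

Running a check like this right after padding and label encoding usually shows which of the two arrays lost its samples, which narrows down where the preparation went wrong.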

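【Discussion】:

I can't work out where the problem is, but when I remove this line: #y_train = keras.utils.to_categorical(y_train, N_CLASSES), the error changes to: ValueError: Error when checking target: expected dense_1 to have shape (2,) but got array with shape (1,). So there is still a problem with the shapes.
Yes, when I print them I find y_train.shape = 0 and x_train.shape = 1, which means they have different shapes. But how do I fix this? They are both defined the same way, on this line: train_text, test_text, train_y, test_y = train_test_split(df,df,test_size = 0.2). Thank you.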
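One possible explanation, offered as a guess since the contents of data.txt are not shown: the whole DataFrame is passed as both features and labels, and iterating a DataFrame yields its column labels rather than its rows, so anything that loops over it sees one "sample" per column. The sketch below assumes a hypothetical layout with a text column named sequence and an integer column named label; neither name comes from the question. It uses np.eye as a stand-in for one-hot encoding so that it runs without Keras.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.txt: one text column and one label column.
df = pd.DataFrame({
    "sequence": ["3 17 240 5", "12 12 88", "340 1 1 9", "7 45 45 2", "19 3 3 3"],
    "label": [0, 1, 0, 1, 0],
})

# Iterating a DataFrame yields its column labels, not its rows.
samples_seen_by_iteration = list(df.astype(str))
print(samples_seen_by_iteration)  # ['sequence', 'label']

# Passing the feature column and the label column separately keeps them aligned.
train_text, test_text, train_y, test_y = train_test_split(
    df["sequence"], df["label"], test_size=0.2, random_state=0
)

# A plain list of strings, one entry per sample.
texts_train = train_text.astype(str).tolist()

N_CLASSES = 2
# One-hot targets of shape (n_samples, N_CLASSES), the layout that a 2-unit
# softmax trained with categorical_crossentropy expects.
y_train_onehot = np.eye(N_CLASSES, dtype="float32")[train_y.to_numpy()]

print(len(texts_train), y_train_onehot.shape)  # 4 (4, 2)
```

With the features and labels split from separate columns, both arrays keep one row per sample, and the one-hot step gives the (2,) per-sample target shape mentioned in the second error message.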