【Question title】:when checking target: expected dense_2 to have shape (1,) but got array with shape (2,)
【Posted】:2019-06-13 20:11:28
【Question description】:

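I am doing sentiment analysis on a CSV with 45k rows and two columns [text, sentiment]. I am trying to use a sigmoid output with binary_crossentropy, but it returns this error:

Error when checking target: expected dense_2 to have shape (1,) but got array with shape (2,)

I tried using LabelEncoder, but it returns a "bad input shape" error. How can I make the encoded labels acceptable to a sigmoid Dense layer with 1 unit?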

#I do aspire here to have balanced classes
num_of_categories = 45247
shuffled = data.reindex(np.random.permutation(data.index))
e = shuffled[shuffled['sentiment'] == 'POS'][:num_of_categories]
b = shuffled[shuffled['sentiment'] == 'NEG'][:num_of_categories]
concated = pd.concat([e,b], ignore_index=True)
for idx,row in data.iterrows():
    row[0] = row[0].replace('rt',' ')
#Shuffle the dataset
concated = concated.reindex(np.random.permutation(concated.index))
concated['LABEL'] = 0

#encode the lab
encoder = LabelEncoder()
concated.loc[concated['sentiment'] == 'POS', 'LABEL'] = 0
concated.loc[concated['sentiment'] == 'NEG', 'LABEL'] = 1
print(concated['LABEL'][:10])
labels = encoder.fit_transform(concated)
print(labels[:10])
if 'sentiment' in concated.keys():
    concated.drop(['sentiment'], axis=1)

n_most_common_words = 8000
max_len = 130
tokenizer = Tokenizer(num_words=n_most_common_words, filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~', lower=True)
tokenizer.fit_on_texts(concated['text'].values)
sequences = tokenizer.texts_to_sequences(concated['text'].values)
word_index = tokenizer.word_index

【Comments】:

  • Which line does the error occur on?

Tags: python-3.x encoding deep-learning sentiment-analysis text-classification


【Solution 1】:

LabelEncoder's output is also 1-dimensional, and I guess your network's output has two dimensions, so you need to one-hot encode your y_true.

Use

labels = keras.utils.to_categorical(concated['LABEL'], num_classes=2)

instead of

labels = encoder.fit_transform(concated)
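
Whichever encoding you choose, the label array's shape has to match the output layer. Here is a minimal sketch of the two shapes, using NumPy only. The `sentiment` array is a hypothetical stand-in for `concated['sentiment']`, and `np.eye(2)[...]` is used here as an equivalent of `keras.utils.to_categorical(..., num_classes=2)` so the snippet runs without Keras:

```python
import numpy as np

# Hypothetical stand-in for concated['sentiment'] (the real column has ~45k rows).
sentiment = np.array(['POS', 'NEG', 'NEG', 'POS'])

# Same mapping as in the question: POS -> 0, NEG -> 1.
labels_1d = np.where(sentiment == 'NEG', 1, 0)

# One-hot version; indexing the identity matrix by class id gives the same
# values as keras.utils.to_categorical(labels_1d, num_classes=2).
labels_onehot = np.eye(2, dtype='float32')[labels_1d]

print(labels_1d.shape)      # (4,)
print(labels_onehot.shape)  # (4, 2)
```

The 1-D array of shape (n,) is what a `Dense(1, activation='sigmoid')` output trained with `binary_crossentropy` expects. The one-hot array of shape (n, 2) instead fits a `Dense(2, activation='softmax')` output trained with `categorical_crossentropy`. Feeding the one-hot array to a one-unit output is the kind of mismatch that produces the "expected dense_2 to have shape (1,) but got array with shape (2,)" message, so it is worth checking which of the two pairings your model uses.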

