TypeError: Unable to build `Dense` layer with non-floating point dtype <dtype: 'string'> : My data has no String type data in it

【Question title】: TypeError: Unable to build `Dense` layer with non-floating point dtype <dtype: 'string'> : My data has no String type data in it
【Posted】: 2020-12-16 01:13:01
【Question description】:

I am trying to build a neural network for classification. I have preprocessed all my data, and a row looks like this: 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 2 0.12436986167881312 -0.426405420419126 1

Everything looks fine and every value is either an int or a float, yet I still get the following error:

  File "C:\Users\spark\anaconda3\lib\site-packages\tensorflow\python\keras\layers\core.py", line 1002, in build
    'dtype %s' % (dtype,))

TypeError: Unable to build `Dense` layer with non-floating point dtype <dtype: 'string'>

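Most of the features are dummy variables or were scaled with a standard scaler, so they are floats. To make sure, I also checked the data types of the last column and the fourth-from-last column (the two integers in the row above, originally shown in bold), and they are integers. So..

Why does this error occur, and how can I fix it?

Below is the code I am using: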

X = dataset.iloc[:, 1:].values
y = dataset.iloc[:, 0].values
pred_set = prediction_set.values
temp_dataset = np.concatenate([X, pred_set], axis=0)

'''Encoding Features'''
index_list = [1,2,4,6,7]
reverse_index = []
for i in range(len(index_list)):
    reverse_index.append(index_list[i]-temp_dataset.shape[1])
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
for i in range(len(reverse_index)):
    index = reverse_index[i]
    ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [index])], remainder = 'passthrough')
    temp_dataset = ct.fit_transform(temp_dataset)

X = temp_dataset[:891]
pred_set = temp_dataset[891:]

'''Train - Test split'''
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size = 0.2, random_state = 0)

'''Feature scaling'''
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
Xtrain[:, (-2,-3)] = sc.fit_transform(Xtrain[:, (-2,-3)])
Xtest[:, (-2,-3)] = sc.transform(Xtest[:, (-2,-3)])
pred_set[:, (-2,-3)] = sc.transform(pred_set[:, (-2,-3)])

Xtrain.astype(float)
Xtest.astype(float)
pred_set.astype(float)

import tensorflow as tf
classifier = tf.keras.models.Sequential()
classifier.add(tf.keras.layers.Dense(units= 20, activation='relu'))
classifier.add(tf.keras.layers.Dense(units= 20, activation='relu'))
classifier.add(tf.keras.layers.Dense(units= 1, activation='sigmoid'))
classifier.compile(optimizer='adam', loss = 'binary_crossentropy', metrics=['accuracy'])
classifier.fit(Xtrain, ytrain)

【Question comments】:

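No sir! That question deals with image data; I have numeric data. I actually tried converting the int column to float with df[-1].astype(float), but it gives the error: int obj has no attribute astype
@AustinSpark Just a side note: why are you scaling after the split? Because the train/test split can differ, this might cause problems; the scaler could end up scaling the two parts of the dataset slightly differently, so you should do it before the split. The same goes for any other operation you might repeat on both parts of the dataset: do it before splitting. If nothing else, you save time and lines of code :)
I learned in a machine learning course that scaling should be done after splitting the data, because otherwise you can get what is called information leakage. It makes sense, you know: if you scale together with Xtest, the scaler is different. Whereas if you fit the scaler on the training set first and then apply the transform method, Xtest is not taken into account; it is just scaled according to the scaler fitted on Xtrain.
Oh, I didn't notice it was fit_transform followed by transform. My bad, that absolutely makes sense. Any updates on the original question?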
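The fit-on-train, transform-on-test pattern discussed in the comments above can be sketched as follows. This is only an illustration with made-up data: it uses plain NumPy in place of scikit-learn's StandardScaler (which standardizes with the per-column mean and population standard deviation), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(10, 2))  # made-up data, 10 rows
X_train, X_test = X[:8], X[8:]

# "fit": the statistics come from the training rows only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)  # population std (ddof=0)

# "transform": the same training statistics are applied to both parts.
X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 in each column
print(X_test_scaled.mean(axis=0))   # generally not 0: test rows did not influence the fit
```

Because the test rows never enter the computation of mean and std, nothing about them leaks into the preprocessing of the training data.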

Tags: python machine-learning scikit-learn data-science

【Solution 1】:

Try converting your data to float; this works with both pandas and numpy:

df.astype(float)

As per the comments, don't try to convert just one column; simply convert the entire dataset.
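One possible reason a bare astype(float) appears to do nothing: both NumPy's ndarray.astype and pandas' DataFrame.astype return a converted copy rather than changing the object in place, so the result has to be assigned back. The sketch below uses a made-up object-dtype array as a stand-in for Xtrain; whether the asker's array actually had object dtype is an assumption, not something the thread confirms.

```python
import numpy as np

# Hypothetical stand-in for Xtrain: an object-dtype array that mixes
# Python floats and ints.
Xtrain = np.array([[0.0, 1.0, 2, 0.124], [1.0, 0.0, 3, -0.426]], dtype=object)

# astype returns a new array; on a line of its own the result is discarded.
Xtrain.astype(float)
dtype_before = Xtrain.dtype
print(dtype_before)  # still object

# Assigning the result back replaces the array with a float64 copy.
Xtrain = Xtrain.astype(float)
print(Xtrain.dtype)  # float64
```

Printing the dtype of the array right before it is passed to fit is a cheap way to check what the model will actually receive.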

If some of the values contain non-numeric characters, the code below should help:

df[-1] = pd.to_numeric(df[-1], errors='coerce')
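A small sketch of what errors='coerce' does, using a made-up DataFrame (the column names and values are hypothetical). Note that on a DataFrame, df[-1] looks up a column labelled -1 rather than the last column, so the sketch selects the last column by position with iloc instead.

```python
import pandas as pd

# Hypothetical frame: the last column holds numbers stored as strings,
# plus one entry that is not numeric at all.
df = pd.DataFrame({"age": [22.0, 38.0, 26.0], "pclass": ["3", "1", "n/a"]})

# Select the last column by position, convert it, and write it back.
last = df.columns[-1]
df[last] = pd.to_numeric(df.iloc[:, -1], errors="coerce")

print(df.dtypes)        # both columns are float64 now
print(df.isna().sum())  # one NaN in the last column, where "n/a" was
```

Values that cannot be parsed become NaN, so they still need to be dropped or imputed before training.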

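【Comments】:

I tried that.. it seems to have no effect.. the error is the same. After df.astype(float) the integers are still integers; they did not become floats. In case it helps, I will edit the question and try to post part of my code.
@AustinSpark Did you also try this: df[-1] = pd.to_numeric(df[-1], errors='coerce')? It may set the non-numeric characters to NaN.
Just tried it, still not working, same error. I guess that is because there are no non-numeric values in the data. I tried an SVM model and it worked fine. But I want to use an ANN, and that is where this problem comes up.
One more thing you could try: instead of df[-1], use df.iloc[:,-1]. Also, x_train.head might help pinpoint the problem.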