TypeError: Expected binary or unicode string, got 618.0

【Question title】: TypeError: Expected binary or unicode string, got 618.0
【Posted】: 2021-04-04 19:18:52
【Question description】:

I have been trying to apply this ML linear model to my own dataset (https://www.tensorflow.org/tutorials/estimator/linear).
Language: Python 3.8.3
Libraries: TensorFlow 2.4.0
NumPy: 1.19.3
Pandas
Matplotlib
and others:

import os
import sys

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf

ss1517 is the name of my dataset. It is a CSV file with 4116 rows and 20 columns, and it contains many NaN values (every column has at least one NaN).

traindata = ss1517.iloc[0:2470, :].copy()  # first 60% of the rows form the training set
evaldata = ss1517.iloc[2470:4116, :].copy()  # remaining 40% of the rows form the eval set
ytrain = traindata.pop("AvgOfMajor N")
yeval = evaldata.pop("AvgOfMajor N")

CATEGORICAL_COLUMNS holds the categorical columns in my dataset.
NUMERIC_COLUMNS holds the numeric columns in my dataset.

CATEGORICAL_COLUMNS = ['Location Name', 'Location Code', 'Borough', 'Register', 'Building Name', 'Schools in Building', 'ENGroupA', 'RangeA']
NUMERIC_COLUMNS = ['Geographical District Code', '# Schools', 'Major N', 'Oth N', 'NoCrim N', 'Prop N', 'Vio N', 'AvgOfOth N', 'AvgOfNoCrim N', 'AvgOfProp N', 'AvgOfVio N']

feature_columns = []  # used only to train the linear model
for feature_name in CATEGORICAL_COLUMNS:
  vocabulary = traindata[feature_name].unique()
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  def input_function():# inner function, this will be returned.
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)) # Create tf.data.Dataset object with data and its label
    if shuffle:
      ds = ds.shuffle(1000) # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)
    return ds # return the batched, repeated dataset
  return input_function # return the input_function

train_input_fn = make_input_fn(traindata, ytrain) 
eval_input_fn = make_input_fn(evaldata, yeval, num_epochs=1, shuffle=False)

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn) #train
result = linear_est.evaluate(eval_input_fn) #get model metrics/stats by testing on testing data

clear_output() #clears console output
print(result["accuracy"]) #the result variable is simply dict of stats about our model

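Every time I fill the NaN values with df.fillna(method="ffill"), df.fillna(method="bfill"), df.fillna(value=0), or df.fillna(value="randomstringvalues"), I get this error (TypeError: Expected binary or unicode string, got 618.0). I also tried removing the NaN values with df.dropna().
Needless to say, the code does not work when I run it with the NaN values left in either.
I have two questions.
First, how should I handle my NaN values so that I no longer see this error (TypeError: Expected binary or unicode string, got 618.0)?
Second, how can I get rid of this error and quickly fit this model to my dataset?
P.S.: I am fairly sure I have not made a typo.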

【Question discussion】:

Tags: python tensorflow typeerror

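【Solution 1】:

My guess is that your data contains some non-ASCII characters, such as ä.

That is, anything other than a plain letter, digit, or symbol. You have two options: find all of these characters and either replace them with something else or remove them.

Alternatively, specify the correct encoding when reading the CSV file with pandas.read_csv: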

data = pandas.read_csv(myfile, encoding='utf-8', quotechar='"', delimiter=',')
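To check whether this guess applies at all, the sketch below lists the string values in each column that contain non-ASCII characters. The helper name find_non_ascii and the sample frame are hypothetical and only illustrate the idea; they are not taken from the question's data.

```python
import pandas as pd


def find_non_ascii(df: pd.DataFrame) -> dict:
    """Return {column: [string values that contain non-ASCII characters]}."""
    hits = {}
    for col in df.columns:
        # Only inspect real str values; NaN/None and numbers are skipped.
        values = [
            v for v in df[col].unique()
            if isinstance(v, str) and not v.isascii()
        ]
        if values:
            hits[col] = values
    return hits


# Hypothetical sample data, standing in for the real CSV contents.
sample = pd.DataFrame(
    {"Borough": ["Bronx", "Käfer", None], "Major N": [1.0, 2.0, 3.0]}
)
print(find_non_ascii(sample))
```

If the helper returns an empty dict for the real dataset, encoding is probably not the cause, and the column dtypes are the next thing to inspect.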

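【Discussion】:

【Solution 2】:

I cannot see your data, so this is a guess. Open your .csv file and search for 618.0. Perhaps some rows do not contain all the expected values, so the parser is loading a numeric value where a categorical value is expected. Another way to check for a formatting problem is to open the CSV in Excel and see whether every row is laid out correctly.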
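The same check can be done in pandas instead of by hand. The sketch below is a minimal example, assuming the error comes from a categorical column that holds floats alongside strings. The helper names non_string_values and clean_for_estimator, the "missing" placeholder, and the choice to fill numeric gaps with 0.0 are all assumptions made for illustration, not part of the original post.

```python
import pandas as pd


def non_string_values(df, categorical_columns):
    """Map each categorical column to the non-string, non-missing values it holds."""
    report = {}
    for col in categorical_columns:
        bad = [v for v in df[col].dropna().unique() if not isinstance(v, str)]
        if bad:
            report[col] = bad
    return report


def clean_for_estimator(df, categorical_columns, numeric_columns):
    """Return a copy with string-only categorical columns and float32 numeric columns."""
    out = df.copy()
    for col in categorical_columns:
        # Fill gaps with a placeholder category, then force every value to str.
        out[col] = out[col].fillna("missing").astype(str)
    for col in numeric_columns:
        # Unparseable entries become NaN, then NaN becomes 0.0 (an assumed default).
        out[col] = (
            pd.to_numeric(out[col], errors="coerce").fillna(0.0).astype("float32")
        )
    return out


# Hypothetical sample with a float hiding in a categorical column.
sample = pd.DataFrame(
    {"Location Code": ["X618", 618.0, None], "Major N": [1, None, 3]}
)
print(non_string_values(sample, ["Location Code"]))
cleaned = clean_for_estimator(sample, ["Location Code"], ["Major N"])
print(cleaned.dtypes)
```

With the question's own names, this would be applied to ss1517 with CATEGORICAL_COLUMNS and NUMERIC_COLUMNS before splitting into traindata and evaldata, so that the vocabulary lists are built from the cleaned string values.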

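【Discussion】:

My CSV file is a combination of these two: data.cityofnewyork.us/api/views/rear-wh5i/…, data.cityofnewyork.us/api/views/44t3-dj6x/…. I combined them like this: ss1517 = pd.concat([ss1516,ss1617],axis=0,join="inner"). I searched and could not find anything for 618.0. As you can see, the rows are well formed for a CSV file (not as strictly formatted as an XLSX file, but comma-separated).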
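One way the combined frame can end up with a float in a categorical column, even when each CSV looks well formed on its own, is a dtype mismatch between the two source files. The sketch below uses small made-up frames in place of ss1516 and ss1617 (the values are hypothetical) to show that pd.concat keeps both strings and floats in the same column when the two inputs disagree on its type.

```python
import pandas as pd

# Hypothetical stand-ins for the two source files: the same column is
# text in one file and numeric in the other.
ss1516 = pd.DataFrame({"Register": ["K", "M"], "Major N": [1.0, 2.0]})
ss1617 = pd.DataFrame({"Register": [618.0, 19.0], "Major N": [3.0, 4.0]})

ss1517 = pd.concat([ss1516, ss1617], axis=0, join="inner")

# Count how many values in the column are strings and how many are not.
is_string = ss1517["Register"].map(lambda v: isinstance(v, str))
print(ss1517.dtypes)
print("string values:", int(is_string.sum()))
print("non-string values:", int((~is_string).sum()))
```

If the real data shows the same pattern, reading the affected columns as text in both files (for example with the dtype argument of pd.read_csv) or casting them with astype(str) after the concat should give the feature columns a consistent type.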