How to read csv in TensorFlow? Answers

【Question title】: How to read csv in TensorFlow?
【Posted】: 2019-03-22 20:42:32
【Question】:

I have just started using TensorFlow, and I am trying to read a csv file with it. This is an example I found online:

filename_queue = tf.train.string_input_producer(["d:/Feng/LP/tensorflowtrydata.csv"])
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
record_defaults = [[1.0], [1.0], [1.0], [1.0], ["Null"]] 
col1, col2, col3, col4, col5 = tf.decode_csv(value,record_defaults=record_defaults) 
features = tf.stack([col1, col2, col3, col4])
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(200):
        example, label = sess.run([features, col5])
        print(example, label)
    coord.request_stop()
    coord.join(threads)

But I get this error:

InvalidArgumentError (see above for traceback): Field 0 in record 0 is not a valid float: Sepal.Length
 [[Node: DecodeCSV_5 = DecodeCSV[OUT_TYPE=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING], field_delim=",", na_value="", use_quote_delim=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ReaderReadV2_3:1, DecodeCSV_5/record_defaults_0, DecodeCSV_5/record_defaults_0, DecodeCSV_5/record_defaults_0, DecodeCSV_5/record_defaults_0, DecodeCSV_5/record_defaults_4)]]

The data is the iris dataset. It looks like this:

iris.head()
   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
iris.dtypes
Sepal.Length    float64
Sepal.Width     float64
Petal.Length    float64
Petal.Width     float64
Species          object

You can see that the error message says the value is not a valid float, but all of the data is float64.

I don't even know where to start.

【Comments】:

Tags: python csv tensorflow input

【Solution 1】:

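The confusing part is that iris.head() shows the first 5 rows of the DataFrame after the header has already been processed. TensorFlow, on the other hand, does not handle the header automatically, as the error line shows: Field 0 in record 0 is not a valid float: Sepal.Length. The Sepal.Length string in the header row is what causes the problem.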
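To make the cause concrete, here is a minimal sketch in plain Python (no TensorFlow) of what happens when record 0 of the file is parsed as four floats and a string. The sample text and the helper name parse_record are assumptions made for illustration; they only mirror the shape of the iris file and the record_defaults from the question.

```python
import csv
import io

# Hypothetical sample mirroring the first lines of the iris CSV in the question.
RAW_CSV = (
    "Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
)


def parse_record(fields):
    """Parse one record as four floats followed by one string label."""
    return [float(x) for x in fields[:4]], fields[4]


rows = list(csv.reader(io.StringIO(RAW_CSV)))

# Record 0 is the header line, so converting its first field to float fails.
try:
    parse_record(rows[0])
    header_error = None
except ValueError as exc:
    header_error = str(exc)

# Skipping the first line leaves only numeric records.
parsed = [parse_record(r) for r in rows[1:]]

print(header_error)
print(parsed)
```

The failure comes from the header text itself, not from the dtypes of the data rows, which is why pandas reports float64 while the line-by-line reader complains.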

You can either import the file with pandas.read_csv first and then convert it to whatever TensorFlow needs, or use the following option to skip the header line:

reader = tf.TextLineReader(skip_header_lines=1)
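
For the pandas route, a rough sketch might look like the following. The inline sample data stands in for the real file (with the actual file you would pass its path to pandas.read_csv instead), and the variable names are only illustrative.

```python
import io

import pandas as pd

# Hypothetical inline sample; replace io.StringIO(RAW_CSV) with the real file path.
RAW_CSV = (
    "Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
    "4.7,3.2,1.3,0.2,setosa\n"
)

# read_csv consumes the first line as column names, so no header text
# ends up in the data rows.
df = pd.read_csv(io.StringIO(RAW_CSV))

feature_columns = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"]

# Numeric features as a float32 array, labels as an array of strings.
features = df[feature_columns].to_numpy(dtype="float32")
labels = df["Species"].to_numpy()

print(features.shape)
print(labels)
```

The resulting arrays can then be handed to TensorFlow, for example through tf.data.Dataset.from_tensor_slices((features, labels)) or a feed_dict, depending on the TensorFlow version in use.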

【Comments】: