TensorFlow CSV import: adding features and labels to Summary for TensorBoard reads double the lines

【Question title】: TensorFlow CSV import: adding features and labels to Summary for TensorBoard reads double the lines
【Posted】: 2017-02-07 08:31:16
【Question】:

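I have a very basic TensorFlow app to test loading data line by line from a CSV file and adding various summaries and visualizations to TensorBoard. My input CSV file has 18 lines and a bunch of columns: the first XX columns are "features", and the subsequent YY columns are 0s and 1s representing labels.

I noticed that when I create summaries for the variables holding the features and labels, TensorFlow reads twice as many lines from the CSV, so given my 18 lines I loop only 9 times instead of 18. As soon as I remove the code that adds the features and labels to the summary, reading goes back to normal. Adding other variables to the summary, such as the cost, which are outputs of functions rather than values read from the CSV, does not cause this problem.

I don't know yet whether I actually need this information in TensorBoard, so I could live without it, but I'd prefer to put as much information as possible into TensorBoard first and then decide what to keep.

Is this expected behavior, or am I doing something wrong?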

    import tensorflow as tf  # TensorFlow 1.x queue-based input API

    fileName = 'inputFile.csv'
    logs_path = 'log_path'
    try_epochs = 1
    sampling_size = 3
    TS = 479
    TL = 6

    rDefaults = [[0.02] for row in range((TS+TL))]

    def read_from_csv(filename_queue):
        reader = tf.TextLineReader(skip_header_lines=0)  # expects a line count, not a bool
        _, csv_row = reader.read(filename_queue)
        data = tf.decode_csv(csv_row, record_defaults=rDefaults)
        features = tf.slice(data, [0], [TS])
        label = tf.slice(data, [TS], [TL])  
        return features, label

    def input_pipeline(batch_size, num_epochs=None):
        filename_queue = tf.train.string_input_producer([fileName], num_epochs=num_epochs, shuffle=False)  
        example, label = read_from_csv(filename_queue)
        example_batch, label_batch = tf.train.batch(
            [example, label], 
            batch_size=batch_size)
        return example_batch, label_batch

    x = tf.placeholder(tf.float32, [None, TS], name='pl_one')
    W = tf.Variable(tf.random_normal([TS, TL], stddev=1), name='weights')
    b = tf.Variable(tf.random_normal([TL], stddev=1), name='biaes')
    y = tf.matmul(x, W) + b
    y_ = tf.placeholder(tf.float32, [None, TL], name='pl_two')

    examples, labels = input_pipeline(sampling_size, try_epochs)

    # this one causes the issue
    with tf.name_scope('Features'):
        features = examples
    # this one also causes the issue
    with tf.name_scope('Labels'):
        labDisp = labels    
    with tf.name_scope('Model'):
        myModel = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
    with tf.name_scope('Loss'):
        lossFn = tf.reduce_mean(myModel)
    with tf.name_scope('Optimizer'):
        train_step = tf.train.AdamOptimizer(.05).minimize(lossFn)

    a1 = tf.summary.histogram("Features", features)
    a2 = tf.summary.histogram("Labels", labDisp)
    a3 = tf.summary.histogram("Model", myModel)
    a4 = tf.summary.scalar("Loss", lossFn)

    merged_summary_op = tf.summary.merge([a1, a2, a3, a4])

    # Build the argmax ops once, outside the loop, so the graph does not grow on every step
    act = tf.argmax(y_, 1)
    fit = tf.argmax(y, 1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # local variables hold the epoch counter used by num_epochs
        sess.run(tf.local_variables_initializer())

        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        i = 0  # step index passed to the summary writer
        try:
            while not coord.should_stop():
                # fetch one batch from the input pipeline
                example_batch, label_batch = sess.run([examples, labels])
                # train on that batch through the placeholders and collect the summaries
                _, pAct, pFit, l, summary = sess.run([train_step, act, fit, lossFn, merged_summary_op], feed_dict={x: example_batch, y_: label_batch})
                summary_writer.add_summary(summary, i)
                i += 1
                print(pAct)
                print(pFit)

        except tf.errors.OutOfRangeError:
            print('Finished')
        finally:
            coord.request_stop()
        coord.join(threads)

Thanks for your input!

【Comments】:

Tags: python tensorflow

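【Solution 1】:

The issue is that each session.run call pulls from the queue (the first call does so explicitly, the second because the summary ops depend on the queue data). If the summaries and the actual use of the queue data are in the same session.run call, rather than using a feed_dict to feed previously extracted data, then no data is discarded. Something like: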

    # Reuses input_pipeline, TS, TL, sampling_size, try_epochs and logs_path from the question.
    examples, labels = input_pipeline(sampling_size, try_epochs)

    # Wire the model straight to the queue output instead of placeholders.
    x, y_ = examples, labels

    W = tf.Variable(tf.random_normal([TS, TL], stddev=1), name='weights')
    b = tf.Variable(tf.random_normal([TL], stddev=1), name='biaes')
    y = tf.matmul(x, W) + b

    # These summaries no longer trigger an extra read, because they are
    # evaluated in the same sess.run call as the training step.
    with tf.name_scope('Features'):
        features = examples
    with tf.name_scope('Labels'):
        labDisp = labels
    with tf.name_scope('Model'):
        myModel = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
    with tf.name_scope('Loss'):
        lossFn = tf.reduce_mean(myModel)
    with tf.name_scope('Optimizer'):
        train_step = tf.train.AdamOptimizer(.05).minimize(lossFn)

    a1 = tf.summary.histogram("Features", features)
    a2 = tf.summary.histogram("Labels", labDisp)
    a3 = tf.summary.histogram("Model", myModel)
    a4 = tf.summary.scalar("Loss", lossFn)

    merged_summary_op = tf.summary.merge([a1, a2, a3, a4])

    # Build the argmax ops once, outside the loop.
    act = tf.argmax(labels, 1)
    fit = tf.argmax(y, 1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())

        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        i = 0  # step index passed to the summary writer
        try:
            while not coord.should_stop():
                # One sess.run per step: every op sees the same dequeued batch.
                _, pAct, pFit, l, summary = sess.run([train_step, act, fit, lossFn,
                                                      merged_summary_op])
                summary_writer.add_summary(summary, i)
                i += 1
                print(pAct)
                print(pFit)

        except tf.errors.OutOfRangeError:
            print('Finished')
        finally:
            coord.request_stop()
        coord.join(threads)
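
The row arithmetic behind this can be illustrated with a rough analogy in plain Python. This is not TensorFlow code: `FakeQueue` and `count_steps` are hypothetical names invented for this sketch, and the only assumption carried over is that every fetch touching the queue consumes a fresh batch. It counts how many loop iterations 18 rows allow when each iteration dequeues twice (as in the question) versus once (as in the code above).

```python
from collections import deque


class FakeQueue:
    """Hypothetical stand-in for the input queue: each dequeue consumes rows for good."""

    def __init__(self, rows):
        self._rows = deque(rows)

    def dequeue(self, batch_size):
        if len(self._rows) < batch_size:
            # Plays the role of tf.errors.OutOfRangeError in this analogy.
            raise IndexError("queue exhausted")
        return [self._rows.popleft() for _ in range(batch_size)]


def count_steps(num_rows, batch_size, dequeues_per_step):
    """Count loop iterations completed before the queue runs dry."""
    queue = FakeQueue(range(num_rows))
    steps = 0
    try:
        while True:
            for _ in range(dequeues_per_step):
                queue.dequeue(batch_size)
            steps += 1
    except IndexError:
        return steps


# Question's loop: one explicit fetch plus one hidden fetch for the summaries.
steps_two_runs = count_steps(num_rows=18, batch_size=1, dequeues_per_step=2)
# Loop with a single fetch per step: training and summaries share one batch.
steps_one_run = count_steps(num_rows=18, batch_size=1, dequeues_per_step=1)
print(steps_two_runs, steps_one_run)  # prints "9 18"
```

With a batch size of 3, as set by `sampling_size` in the question's code, the same function gives 3 iterations against 6, so half of the rows are still consumed by fetches whose data never reaches the training step.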

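【Discussion】:

Your explanation makes sense, but I don't understand how you actually feed TF variables to the placeholders. If I run the code as you suggested, I get the error "You must feed a value for placeholder tensor 'pl_one_4' with dtype float", which makes sense: x is a placeholder, so we need to feed it, right? But of course, if I feed it, then I'm back to square one, reading the CSV file twice?
I'd just get rid of the placeholders. The model can work directly off the queue.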