【Question title】:TensorFlow CSV import: adding features and labels to Summary for TensorBoard reads double the lines
【Posted】:2017-02-07 08:31:16
【Question】:

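I have a very basic TensorFlow application for testing line-by-line loading of data from a CSV file and adding various summaries and visualizations to TensorBoard. My input CSV file has 18 lines and a number of columns: the first XX columns are "features", and the following YY columns are 0s and 1s representing the labels.

I noticed that when I create summaries for the variables holding the features and labels, TensorFlow reads twice as many lines from the CSV, so with my 18 lines the loop runs only 9 times instead of 18. As soon as I remove the code that adds the features and labels to the summary, reading goes back to normal. Adding other variables to the summary, such as the cost, which are outputs of functions rather than CSV data, does not cause this problem.

I don't yet know whether I actually need this information in TensorBoard, so I could do without it, but I would rather put as much information as possible into TensorBoard first and then decide what to keep.

Is this expected behavior, or am I doing something wrong?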

    import tensorflow as tf

    fileName = 'inputFile.csv'
    logs_path = 'log_path'
    try_epochs = 1
    sampling_size = 3
    TS = 479
    TL = 6

    rDefaults = [[0.02] for row in range((TS+TL))]

    def read_from_csv(filename_queue):
        reader = tf.TextLineReader(skip_header_lines=0)
        _, csv_row = reader.read(filename_queue)
        data = tf.decode_csv(csv_row, record_defaults=rDefaults)
        features = tf.slice(data, [0], [TS])
        label = tf.slice(data, [TS], [TL])  
        return features, label

    def input_pipeline(batch_size, num_epochs=None):
        filename_queue = tf.train.string_input_producer([fileName], num_epochs=num_epochs, shuffle=False)  
        example, label = read_from_csv(filename_queue)
        example_batch, label_batch = tf.train.batch(
            [example, label], 
            batch_size=batch_size)
        return example_batch, label_batch

    x = tf.placeholder(tf.float32, [None, TS], name='pl_one')
    W = tf.Variable(tf.random_normal([TS, TL], stddev=1), name='weights')
    b = tf.Variable(tf.random_normal([TL], stddev=1), name='biaes')
    y = tf.matmul(x, W) + b
    y_ = tf.placeholder(tf.float32, [None, TL], name='pl_two')

    examples, labels = input_pipeline(sampling_size, try_epochs)

    # this one causes the issue
    with tf.name_scope('Features'):
        features = examples
    # this one also causes the issue
    with tf.name_scope('Labels'):
        labDisp = labels    
    with tf.name_scope('Model'):
        myModel = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
    with tf.name_scope('Loss'):
        lossFn = tf.reduce_mean(myModel)
    with tf.name_scope('Optimizer'):
        train_step = tf.train.AdamOptimizer(.05).minimize(lossFn)

    a1 = tf.summary.histogram("Features", features)
    a2 = tf.summary.histogram("Labels", labDisp)
    a3 = tf.summary.histogram("Model", myModel)
    a4 = tf.summary.scalar("Loss", lossFn)

    merged_summary_op = tf.summary.merge([a1, a2, a3, a4])

    with tf.Session() as sess:
        gInit = tf.global_variables_initializer().run()
        lInit = tf.local_variables_initializer().run()

        summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())

        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coord)

        i = 0  # step index passed to add_summary
        try:
            while not coord.should_stop():
                example_batch, label_batch = sess.run([examples, labels])  
                act = tf.argmax(label_batch, 1)
                fit = tf.argmax(y, 1)
                _, pAct, pFit, l, summary = sess.run([train_step, act, fit, lossFn, merged_summary_op], feed_dict={x: example_batch, y_: label_batch})
                summary_writer.add_summary(summary, i)
                i += 1
                print(pAct)
                print(pFit)

        except tf.errors.OutOfRangeError:
            print('Finished')
        finally:
            coord.request_stop()
        coord.join(threads)

Thanks for your input!

【Question discussion】:

    Tags: python tensorflow


    【Solution 1】:

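    The issue is that each session.run call pulls from the queue: the first call does so explicitly, and the second does so because the summary ops depend on the queue data. If the summaries and the actual use of the queue data happen in the same session.run call, rather than using a feed_dict to feed previously extracted data, then no data is discarded. Something like: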

    examples, labels = input_pipeline(sampling_size, try_epochs)
    
    x, y_ = examples, labels
    
    W = tf.Variable(tf.random_normal([TS, TL], stddev=1), name='weights')
    b = tf.Variable(tf.random_normal([TL], stddev=1), name='biaes')
    y = tf.matmul(x, W) + b
    
    
    # no longer an issue: fetched in the same sess.run call as the model
    with tf.name_scope('Features'):
      features = examples
    # no longer an issue either, for the same reason
    with tf.name_scope('Labels'):
      labDisp = labels
    with tf.name_scope('Model'):
      myModel = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
    with tf.name_scope('Loss'):
      lossFn = tf.reduce_mean(myModel)
    with tf.name_scope('Optimizer'):
      train_step = tf.train.AdamOptimizer(.05).minimize(lossFn)
    
    a1 = tf.summary.histogram("Features", features)
    a2 = tf.summary.histogram("Labels", labDisp)
    a3 = tf.summary.histogram("Model", myModel)
    a4 = tf.summary.scalar("Loss", lossFn)
    
    merged_summary_op = tf.summary.merge([a1, a2, a3, a4])
    
    with tf.Session() as sess:
      gInit = tf.global_variables_initializer().run()
      lInit = tf.local_variables_initializer().run()
    
      summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())
    
      coord = tf.train.Coordinator()
      threads = tf.train.start_queue_runners(coord=coord)
    
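      # Build the argmax ops once, outside the loop, so the graph does not grow.
      act = tf.argmax(labels, 1)
      fit = tf.argmax(y, 1)
      i = 0  # step index passed to add_summary
    
      try:
        while not coord.should_stop():
          # One run fetches training, predictions, and summaries from the same batch.
          _, pAct, pFit, l, summary = sess.run([train_step, act, fit, lossFn,
                                                merged_summary_op])
          summary_writer.add_summary(summary, i)
          i += 1
          print(pAct)
          print(pFit)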
    
      except tf.errors.OutOfRangeError:
        print('Finished')
      finally:
        coord.request_stop()
      coord.join(threads)
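
The effect described in this answer can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not TensorFlow code: `FakeQueue`, `run`, and `count_iterations` are hypothetical names invented for this illustration, and each dequeue is assumed to consume a single row so that the numbers match the 18 versus 9 reported in the question. It models every `sess.run` call whose fetches depend on the queue as one dequeue.

```python
from collections import deque


class FakeQueue:
    """Hypothetical stand-in for an input queue: each dequeue consumes one row."""

    def __init__(self, rows):
        self._rows = deque(rows)

    def dequeue(self):
        if not self._rows:
            # Plays the role of tf.errors.OutOfRangeError in this sketch.
            raise IndexError("queue exhausted")
        return self._rows.popleft()


def run(queue):
    """Model one sess.run call whose fetches depend on the queue: one dequeue."""
    return queue.dequeue()


def count_iterations(num_rows, runs_per_iteration):
    """Count the loop iterations completed before the queue runs dry."""
    queue = FakeQueue(range(num_rows))
    iterations = 0
    try:
        while True:
            for _ in range(runs_per_iteration):
                run(queue)
            iterations += 1
    except IndexError:
        pass
    return iterations


# Two runs per iteration: one to fetch the batch, one for training plus summaries.
two_runs = count_iterations(18, runs_per_iteration=2)
# One run per iteration: everything fetched together, as in the code above.
one_run = count_iterations(18, runs_per_iteration=1)
print(two_runs, one_run)  # 9 18
```

With two queue-dependent runs per iteration, half of the rows are consumed by the run that feeds the summaries, which is why the loop finishes after 9 iterations instead of 18.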
    

    【Discussion】:

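    • Your explanation makes sense, but I don't understand how you actually feed a TF tensor to a placeholder. If I run the code as you suggest, I get the error "You must feed a value for placeholder tensor 'pl_one_4' with dtype float", which makes sense because x is a placeholder, so we need to feed it, right? But of course, if I feed it, we are back to square one, reading the CSV file twice?
    • I just got rid of the placeholders. The model can work directly off the queue.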