How can I use TensorFlow to read/write files from HDFS? Answers

【Question title】: How can I use TensorFlow to read/write files from HDFS?
【Posted】: 2021-01-28 20:18:17
【Question description】:

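I want to use TensorFlow to write files to and read files from HDFS. I installed TensorFlow with 'pip install ten......'. When I try to access files on HDFS, it does not work: the program just hangs there and reports no error. Do I need to build TensorFlow from source with ./configure and bazel build instead? Does it have to be installed that way to support HDFS?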

Here is my code that writes files to the local file system:

with tf.Session(graph=graph,config=config) as sess:
    sess.run(init)
    summary_writer = tf.summary.FileWriter('./mnist_logs2/', graph_def=sess.graph_def)
    for i in range(2000000):
        batch=mnist.train.next_batch(10000)
        train_step.run(feed_dict={x:batch[0],y_:batch[1],keep_prob:0.8})

        if i%100==0:
            acc_test=sess.run(accuracy,feed_dict={x:mnist.test.images,y_:mnist.test.labels,keep_prob:1.0})
            print("step %d,test accuracy %g"%(i,acc_test))
            if acc_test>0.993:
                break

    saver_path=saver.save(sess,'/home/test/mnist/model.ckpt')

    print("test accuracy %g"%accuracy.eval(feed_dict={x:mnist.test.images,y_:mnist.test.labels,keep_prob:1.0}))

Here is my code that writes files to HDFS; only the paths are changed:

with tf.Session(graph=graph,config=config) as sess:
    sess.run(init)
    summary_writer = tf.summary.FileWriter('hdfs://user/mlp/zpc/mnist_logs2/', graph_def=sess.graph_def)
    for i in range(2000000):
        batch=mnist.train.next_batch(10000)
        train_step.run(feed_dict={x:batch[0],y_:batch[1],keep_prob:0.8})

        if i%100==0:
            acc_test=sess.run(accuracy,feed_dict={x:mnist.test.images,y_:mnist.test.labels,keep_prob:1.0})
            print("step %d,test accuracy %g"%(i,acc_test))
            if acc_test>0.993:
                break

    saver_path=saver.save(sess,'hdfs://user/mlp/zpc/mnist_logs2/model.ckpt')

    print("test accuracy %g"%accuracy.eval(feed_dict={x:mnist.test.images,y_:mnist.test.labels,keep_prob:1.0}))
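One thing worth checking in the HDFS paths above is how the URI is split. In a standard URI, the part right after `hdfs://` is the authority (the namenode host), not the first directory. The sketch below uses Python's standard `urllib.parse` to show that split; it does not touch TensorFlow or HDFS. The `namenode:8020` address is a hypothetical placeholder, and `hdfs://default/` is the form TensorFlow's Hadoop guide describes for using the configured default file system. Whether this is what causes the hang in the question is an assumption, not something the post confirms.

```python
from urllib.parse import urlparse


def split_hdfs_uri(uri):
    """Return (authority, path) for an hdfs:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError("not an hdfs:// URI: %r" % uri)
    return parsed.netloc, parsed.path


# Path from the question: "user" ends up in the authority (host) slot.
print(split_hdfs_uri("hdfs://user/mlp/zpc/mnist_logs2/"))
# ('user', '/mlp/zpc/mnist_logs2/')

# Hypothetical alternatives with an explicit authority (placeholders only).
print(split_hdfs_uri("hdfs://namenode:8020/user/mlp/zpc/mnist_logs2/"))
# ('namenode:8020', '/user/mlp/zpc/mnist_logs2/')
print(split_hdfs_uri("hdfs://default/user/mlp/zpc/mnist_logs2/"))
# ('default', '/user/mlp/zpc/mnist_logs2/')
```

If the intended directory is /user/mlp/zpc/mnist_logs2/, the question's URI would instead point at a host named "user" and the directory /mlp/zpc/mnist_logs2/.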

I run the HDFS version of the code like this:

CLASSPATH=$($HADOOP_HDFS_HOME/bin/hadoop classpath --glob) python mnist_linux.py
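The command above only sets CLASSPATH. TensorFlow's Hadoop guide also asks for JAVA_HOME, HADOOP_HDFS_HOME, and an LD_LIBRARY_PATH that includes the directory containing libjvm.so. A small pre-flight check, sketched below, can rule out a missing variable before the training script starts. The helper name `missing_hdfs_env` is made up for this illustration, and the list of variables is an assumption based on that guide; the check only tests that each variable is set, not that its value is correct.

```python
import os

# Variables named in TensorFlow's Hadoop guide (assumed list, see lead-in).
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH")


def missing_hdfs_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Check the current process environment and report the result.
missing = missing_hdfs_env()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All HDFS-related environment variables are set.")
```

Calling this at the top of mnist_linux.py, before the session is created, would surface a configuration gap immediately rather than leaving it to show up later as a silent stall.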

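【Question comments】:

Could you provide a minimal code example that reads from a local file and, with a single-line change, uses HDFS instead?
I have updated the question with some code; please take a look. But I would still like to know: if I install TensorFlow with pip install tensorflow, does it support HDFS?
I don't see any code that tries to read from HDFS and fails. Am I missing something?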
'summary_writer = tf.summary.FileWriter('hdfs://user/mlp/zpc/mnist_logs2/', graph_def=sess.graph_def)', and 'saver_path=saver.save(sess, 'hdfs://user/mlp/zpc/mnist_logs2/model.ckpt')'

Tags: tensorflow deep-learning

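【Solution 1】:

In TensorFlow 2.x, you can use the model.save() method to save a model to an HDF5 file. (HDF5 is a file format; it is distinct from HDFS, the Hadoop file system the question asks about.)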

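import tensorflow as tf

# Create a model (create_model() is assumed to return a compiled tf.keras model)
model = create_model()
# Train the model (train_images and train_labels are assumed to be loaded already)
model.fit(train_images, train_labels, epochs=10)

# Save the entire model to an HDF5 file.
# The '.h5' extension indicates that the model should be saved to HDF5.
model.save('my_model.h5')
# Restore the entire model
new_model = tf.keras.models.load_model('my_model.h5')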

【Comments】: