When to use an iterator in a Tensorflow Estimator: answers

【Question title】: When to use an iterator in Tensorflow Estimator
【Posted】: 2018-10-31 03:30:54
【Question】:

The Tensorflow guides describe the input function for the Iris data example in two different places. One input function returns only the dataset itself, while the other returns the dataset with an iterator.

From the premade Estimators guide: https://www.tensorflow.org/guide/premade_estimators

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    return dataset.shuffle(1000).repeat().batch(batch_size)

From the custom Estimators guide: https://www.tensorflow.org/guide/custom_estimators

def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)

    # Return the read end of the pipeline.
    return dataset.make_one_shot_iterator().get_next()

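I'm confused about which one is correct. If each is meant for a different case, when is it correct to return the dataset with an iterator?

【Question comments】:

Tags: tensorflow tensorflow-datasets tensorflow-estimator

【Solution 1】: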

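If your input function returns a tf.data.Dataset, an iterator is created under the hood and its get_next() function is used to feed inputs to the model. This is somewhat hidden in the source code; see parse_input_fn_result in the Estimator source.

I believe this was only implemented in a fairly recent update, so older tutorials still explicitly return get_next() from their input functions, since that was the only option at the time. There should be no difference between the two, but returning the dataset instead of the result of get_next() saves a little code.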
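
The dispatch described above can be sketched in plain Python. This is not the TensorFlow implementation: FakeDataset, FakeIterator, and parse_input_fn_result_sketch are hypothetical stand-ins invented for illustration, and details such as iterator initialization are left out. It only shows why both styles of input function end up handing the model the same (features, labels) pair.

```python
class FakeIterator:
    """Hypothetical stand-in for a one-shot iterator over batches."""

    def __init__(self, batches):
        self._batches = iter(batches)

    def get_next(self):
        # Real TensorFlow returns symbolic tensors here; this sketch returns plain values.
        return next(self._batches)


class FakeDataset:
    """Hypothetical stand-in for tf.data.Dataset holding (features, labels) batches."""

    def __init__(self, batches):
        self._batches = list(batches)

    def make_one_shot_iterator(self):
        return FakeIterator(self._batches)


def parse_input_fn_result_sketch(result):
    """Normalize either kind of input_fn return value to (features, labels)."""
    if isinstance(result, FakeDataset):
        # A dataset was returned: the framework builds the iterator on the caller's behalf.
        result = result.make_one_shot_iterator().get_next()
    if isinstance(result, tuple):
        features, labels = result
        return features, labels
    # A bare features value with no labels (for example, at prediction time).
    return result, None


# One made-up batch of Iris-like data, used by both input functions.
batches = [({"SepalLength": [5.1, 6.4]}, [0, 2])]


def input_fn_returning_dataset():
    return FakeDataset(batches)


def input_fn_returning_get_next():
    return FakeDataset(batches).make_one_shot_iterator().get_next()


from_dataset = parse_input_fn_result_sketch(input_fn_returning_dataset())
from_get_next = parse_input_fn_result_sketch(input_fn_returning_get_next())
print(from_dataset == from_get_next)
```

In this sketch the only difference between the two input functions is who calls make_one_shot_iterator().get_next(): the input function itself, or the code that consumes its result.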

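【Comments】:

Yes, thank you. This confused me too until I read your reply. I don't think the documentation mentions this explicitly. The only oblique reference I found is a code snippet in the performance guide: tensorflow.org/guide/performance/datasets. You're right, it looks like you can now return the dataset at the end of input_fn and the estimator will know what to do.
If used with a DistributionStrategy, the input function should return a tf.data.Dataset. Otherwise you get: ValueError: dataset_fn() must return a tf.data.Dataset when using a DistributionStrategy.
Another thing to note is that some operations on the dataset need to be done inside the input function, as described in github.com/tensorflow/tensorflow/issues/8042 and github.com/tensorflow/tensorflow/issues/4026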