Apache Spark: In PairFlatMapFunction, how to add tuples back to the Iterable<Tuple2<Integer, String>> return type

【Question title】: Apache Spark: In PairFlatMapFunction, how to add tuples back to the Iterable<Tuple2<Integer, String>> return type
【Posted】: 2017-04-26 20:20:19
【Question】:

I am new to Spark. I have been working on code that involves two datasets, so I started with a PairFlatMapFunction, in which I do the mapper-side processing.

JavaPairRDD<Integer, String> trainingArray = trainingData.flatMapToPair(new PairFlatMapFunction<String, Integer, String>() {
    public Iterable<Tuple2<Integer, String>> call(String s) {
        // code to form the tuples of type Tuple2<Integer, String>
        // new Tuple2<Integer, String>(...)
        // how do I return them from here?
    }
});

How do I add the tuples to the returned Iterable so that the reducer (reduceByKey) can process them?

Any pointers would be appreciated.

【Comments】:

Tags: java hadoop apache-spark rdd bigdata

【Solution 1】:

Thanks!!

I have found the answer to this question.

We need to define an ArrayList like this:

List<Tuple2<Integer, String>> result = new ArrayList<Tuple2<Integer, String>>();

Then add each tuple to it like this:

result.add(new Tuple2<Integer, String>(keyValue, concat));

Finally, return result from call. A List is an Iterable, so it matches the declared return type.
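The three steps above (create the list, add tuples, return it) can be sketched as a single method. This is a minimal sketch, not the poster's actual code: to keep it runnable without Spark on the classpath, `Pair` is a hypothetical stand-in for `scala.Tuple2`, the static `call` method stands in for `PairFlatMapFunction.call`, and the `"key,value1,value2,..."` line format is an assumption made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TupleCollector {
    // Hypothetical stand-in for scala.Tuple2 so the sketch runs without Spark.
    static final class Pair<K, V> {
        final K key;
        final V value;
        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Plays the role of call(String s): one input line in, zero or more pairs out.
    // Assumed line format (not from the original post): "key,value1,value2,..."
    static List<Pair<Integer, String>> call(String s) {
        List<Pair<Integer, String>> result = new ArrayList<Pair<Integer, String>>();
        String[] fields = s.split(",");
        int keyValue = Integer.parseInt(fields[0].trim());
        // Emit one (key, value) pair for every value on the line.
        for (int i = 1; i < fields.length; i++) {
            result.add(new Pair<Integer, String>(keyValue, fields[i].trim()));
        }
        // A List is an Iterable, so it can be returned directly.
        return result;
    }

    public static void main(String[] args) {
        for (Pair<Integer, String> p : call("7, a, b")) {
            System.out.println(p.key + " -> " + p.value);
        }
    }
}
```

In a real job the same body would sit inside the anonymous PairFlatMapFunction from the question, with Tuple2 in place of Pair.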

【Comments】:

【Solution 2】:

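If your result contains only one tuple, this may be a better option. Note that the line below returns an Iterator, which matches the Spark 2.x signature of call; with the Iterable return type shown in the question (Spark 1.x), drop the trailing .iterator().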

return Collections.singletonList(new Tuple2<Integer, String>(keyValue, concat)).iterator();
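As a rough illustration of the single-tuple case, the sketch below shows both return shapes side by side. It is written under stated assumptions: `Pair` is again a hypothetical stand-in for `scala.Tuple2` so the code runs without Spark, and the method names `asIterable` and `asIterator` are invented for this example.

```java
import java.util.Collections;
import java.util.Iterator;

public class SingleTuple {
    // Hypothetical stand-in for scala.Tuple2 so the sketch runs without Spark.
    static final class Pair<K, V> {
        final K key;
        final V value;
        Pair(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Shape for a call method declared to return Iterable (as in the question).
    static Iterable<Pair<Integer, String>> asIterable(int keyValue, String concat) {
        return Collections.singletonList(new Pair<Integer, String>(keyValue, concat));
    }

    // Shape for a call method declared to return Iterator (as in Solution 2).
    static Iterator<Pair<Integer, String>> asIterator(int keyValue, String concat) {
        return Collections.singletonList(new Pair<Integer, String>(keyValue, concat)).iterator();
    }

    public static void main(String[] args) {
        Pair<Integer, String> p = asIterator(1, "x").next();
        System.out.println(p.key + " -> " + p.value);
    }
}
```

Collections.singletonList returns an immutable one-element list, so it avoids allocating a growable ArrayList when exactly one tuple is emitted per input record.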

【Comments】: