【Posted】: 2016-08-17 18:40:14
【Question】:
I am trying to save a large text file of about 1.5 GB:
sc.parallelize(cfile.toString()
.split("\n"), 1)
.saveAsTextFile(new Path(path+".cs", "data").toUri.toString)
but I keep getting:
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
...
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 6
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:542)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:538)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
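I have been stuck on this for a long time. Can anyone help and explain how to save cfile as a text file?
Standalone / local / YARN cluster?
- YARN cluster
Memory / core settings?
- 1.8 TB
- 285 cores
Number of partitions?
- I am currently setting the number of partitions to 1.
Relevant line of code that sets the number of partitions: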
val model = word2vec
.setMinCount(minCount.asInstanceOf[Int])
.setVectorSize(arguments.getVectorSize)
.setWindowSize(arguments.getContextWindowSize)
.setNumPartitions(numW2vPartitions)
.setLearningRate(learningRate)
.setNumIterations(arguments.getNumIterations)
.fit(wordSequence)
spark-submit parameters:
spark-submit --master yarn \
--deploy-mode cluster \
--driver-memory 20G \
--num-executors 5 \
--executor-cores 8 \
--driver-java-options "-Dspark.akka.frameSize=2000" \
--executor-memory 20G --class
【Comments】:
Tags: apache-spark word2vec