【Posted】:2018-03-14 09:39:33
【Question】:
How can I fix the "GC overhead limit exceeded" error that occurs in PySpark 2.2.1? It is installed on Ubuntu 16.04.4.
In my Python 3.5.2 script, I configure Spark as follows:
spark = SparkSession.builder.appName('achats_fusion_files').getOrCreate()
spark.conf.set("spark.sql.pivotMaxValues", "1000000")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")
spark.conf.set("spark.executor.memory", "1g")
spark.conf.set("spark.driver.memory", "1g")
Which settings should I use in the Python script to solve this problem?
The error message is below:
18/03/14 09:57:25 ERROR Executor: Exception in task 34.0 in stage 36.0 (TID 2076)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Pattern.compile(Pattern.java:1667)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
at org.apache.spark.network.util.JavaUtils.byteStringAs(JavaUtils.java:266)
at org.apache.spark.network.util.JavaUtils.byteStringAsBytes(JavaUtils.java:302)
at org.apache.spark.util.Utils$.byteStringAsBytes(Utils.scala:1087)
at org.apache.spark.SparkConf.getSizeAsBytes(SparkConf.scala:310)
at org.apache.spark.io.LZ4CompressionCodec.compressedOutputStream(CompressionCodec.scala:114)
at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:156)
at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:131)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:120)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:237)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
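One thing worth checking, offered as a sketch rather than a confirmed fix: `spark.driver.memory` and `spark.executor.memory` are read when the JVM starts, so setting them with `spark.conf.set` after `getOrCreate()` does not resize the heap. The example below passes every setting to the builder before the session is created. The `4g` values and the `build_session` helper are assumptions for illustration only, and the sketch assumes the script is run directly with `python`, so that no JVM is running yet.

```python
# Settings from the question, with the memory values raised.
# "4g" is a hypothetical value; pick one that fits the machine.
SETTINGS = {
    "spark.sql.pivotMaxValues": "1000000",
    "spark.sql.autoBroadcastJoinThreshold": "-1",
    "spark.executor.memory": "4g",
    "spark.driver.memory": "4g",
}


def build_session(app_name, settings):
    """Create a SparkSession with all settings applied before startup.

    Hypothetical helper: memory options must reach the builder before
    getOrCreate() launches the JVM, otherwise they are ignored.
    """
    # Imported here so the settings above can be inspected without Spark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Usage (requires a working PySpark installation):
# spark = build_session("achats_fusion_files", SETTINGS)
```

If the script is launched through `spark-submit` instead, the driver JVM already exists when the script runs, and driver memory is normally given on the command line with `--driver-memory`.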
【Comments】:
Tags: pyspark