[Posted]: 2019-01-29 21:27:11
[Question]:
I get an error when running a Python job in AWS Glue:
Reason: Container killed by YARN for exceeding memory limits. 5.6 GB of 5.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
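The 5.5 GB ceiling in that message fits Spark's default sizing rule for executors on YARN: each container is requested as executor memory plus an overhead that defaults to max(384 MiB, 10% of executor memory). The sketch below works through that arithmetic. It assumes a 5 GiB executor heap, which is an assumption about the Glue worker and is not stated in the error. The helper name `container_limit_mib` is hypothetical.

```python
# Spark's default executor overhead on YARN: max(384 MiB, 10% of executor memory).
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def container_limit_mib(executor_memory_mib, overhead_mib=None):
    """Return the container size (MiB) requested for one executor.

    Hypothetical helper. If overhead_mib is None, the default rule is applied;
    otherwise the explicit overhead (e.g. from spark.executor.memoryOverhead) is used.
    """
    if overhead_mib is None:
        # Spark truncates the 10% figure to a whole number of MiB.
        overhead_mib = max(MIN_OVERHEAD_MIB, int(OVERHEAD_FACTOR * executor_memory_mib))
    return executor_memory_mib + overhead_mib


# Assumption: a 5 GiB executor heap, which is not stated in the error message.
limit = container_limit_mib(5 * 1024)
print(limit, "MiB =", limit / 1024, "GiB")  # 5632 MiB = 5.5 GiB
```

Under that assumption, 5120 MiB of heap plus 512 MiB of default overhead gives exactly the 5.5 GB limit that YARN enforced.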
When I run this at the beginning of the script:
print '--- Before Conf --'
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print "spark.yarn.executor.memoryOverhead", sc._conf.get("spark.yarn.executor.memoryOverhead")
print '--- Conf --'
sc._conf.setAll([('spark.yarn.executor.memory', '15G'),
                 ('spark.yarn.executor.memoryOverhead', '10G'),
                 ('spark.yarn.driver.cores', '5'),
                 ('spark.yarn.executor.cores', '5'),
                 ('spark.yarn.cores.max', '5'),
                 ('spark.yarn.driver.memory', '15G')])
print '--- After Conf ---'
print 'spark.yarn.driver.memory', sc._conf.get('spark.yarn.driver.memory')
print 'spark.yarn.driver.cores', sc._conf.get('spark.yarn.driver.cores')
print 'spark.yarn.executor.memory', sc._conf.get('spark.yarn.executor.memory')
print 'spark.yarn.executor.cores', sc._conf.get('spark.yarn.executor.cores')
print "spark.yarn.executor.memoryOverhead", sc._conf.get("spark.yarn.executor.memoryOverhead")
I get the following output:
--- Before Conf --
spark.yarn.driver.memory None
spark.yarn.driver.cores None
spark.yarn.executor.memory None
spark.yarn.executor.cores None
spark.yarn.executor.memoryOverhead None
--- Conf --
--- After Conf ---
spark.yarn.driver.memory 15G
spark.yarn.driver.cores 5
spark.yarn.executor.memory 15G
spark.yarn.executor.cores 5
spark.yarn.executor.memoryOverhead 10G
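`SparkConf.get` returns whatever string is stored under a key, whether or not Spark actually reads that key. The output above therefore only shows that the values were stored. The sketch below checks the keys from the `setAll` call against a small, hand-picked subset of documented Spark property names. The tables and the `classify_keys` helper are hypothetical and incomplete, so "unknown" only means "not in this table". Consult the Spark configuration docs for your version.

```python
# Hypothetical, incomplete subset of documented Spark property names.
KNOWN_PROPERTIES = {
    "spark.driver.memory",
    "spark.driver.cores",
    "spark.executor.memory",
    "spark.executor.cores",
    "spark.executor.memoryOverhead",
    "spark.cores.max",
}

# Older names that Spark 2.3+ accepts as deprecated aliases
# (before 2.3, the spark.yarn.* memoryOverhead names were the primary ones).
DEPRECATED_ALIASES = {
    "spark.yarn.executor.memoryOverhead": "spark.executor.memoryOverhead",
    "spark.yarn.driver.memoryOverhead": "spark.driver.memoryOverhead",
}


def classify_keys(pairs):
    """Label each (key, value) pair as known, a deprecated alias, or unknown."""
    report = {}
    for key, _value in pairs:
        if key in KNOWN_PROPERTIES:
            report[key] = "known"
        elif key in DEPRECATED_ALIASES:
            report[key] = "alias of " + DEPRECATED_ALIASES[key]
        else:
            report[key] = "unknown"  # not in this small table
    return report


# The same pairs passed to sc._conf.setAll in the question.
settings = [('spark.yarn.executor.memory', '15G'),
            ('spark.yarn.executor.memoryOverhead', '10G'),
            ('spark.yarn.driver.cores', '5'),
            ('spark.yarn.executor.cores', '5'),
            ('spark.yarn.cores.max', '5'),
            ('spark.yarn.driver.memory', '15G')]

for key, status in classify_keys(settings).items():
    print(f"{key} -> {status}")
```

Only the memoryOverhead key maps to a documented property here. Even with recognized names, executor container sizes are normally fixed when executors are requested at startup, so changing them on an already running SparkContext is not expected to resize existing containers.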
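spark.yarn.executor.memoryOverhead appears to be set, so why isn't it taking effect? I still get the same error.
I've seen other posts about problems setting spark.yarn.executor.memoryOverhead, but none where it appears to be set and still doesn't work.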
[Discussion]:
Tags: apache-spark pyspark aws-glue