【Posted】: 2017-03-15 18:29:38
【Question】:
We have long-running EMR clusters to which we submit Spark jobs. Over time, HDFS fills up with Spark application logs, which sometimes causes EMR/YARN (?) to see a host as unhealthy.
Running hadoop fs -ls -R -h / shows [1], which clearly indicates that no application logs have ever been deleted.
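We have set spark.history.fs.cleaner.enabled to true (verified in the Spark UI) and expected the other defaults, such as the cleaner interval (1 day) and the cleaner max age (7 days), as described at http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options, to take care of cleaning up these logs. That is not happening.
Any ideas?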
[1]
-rwxrwx--- 2 hadoop spark 543.1 M 2017-01-11 13:13 /var/log/spark/apps/application_1484079613665_0001
-rwxrwx--- 2 hadoop spark 7.8 G 2017-01-17 10:51 /var/log/spark/apps/application_1484079613665_0002.inprogress
-rwxrwx--- 2 hadoop spark 1.4 G 2017-01-18 08:11 /var/log/spark/apps/application_1484079613665_0003
-rwxrwx--- 2 hadoop spark 2.9 G 2017-01-20 07:41 /var/log/spark/apps/application_1484079613665_0004
-rwxrwx--- 2 hadoop spark 125.9 M 2017-01-20 09:57 /var/log/spark/apps/application_1484079613665_0005
-rwxrwx--- 2 hadoop spark 4.4 G 2017-01-23 10:19 /var/log/spark/apps/application_1484079613665_0006
-rwxrwx--- 2 hadoop spark 6.6 M 2017-01-23 10:31 /var/log/spark/apps/application_1484079613665_0007
-rwxrwx--- 2 hadoop spark 26.4 M 2017-01-23 11:09 /var/log/spark/apps/application_1484079613665_0008
-rwxrwx--- 2 hadoop spark 37.4 M 2017-01-23 11:53 /var/log/spark/apps/application_1484079613665_0009
-rwxrwx--- 2 hadoop spark 111.9 M 2017-01-23 13:57 /var/log/spark/apps/application_1484079613665_0010
-rwxrwx--- 2 hadoop spark 1.3 G 2017-01-24 10:26 /var/log/spark/apps/application_1484079613665_0011
-rwxrwx--- 2 hadoop spark 7.0 M 2017-01-24 10:37 /var/log/spark/apps/application_1484079613665_0012
-rwxrwx--- 2 hadoop spark 50.7 M 2017-01-24 11:40 /var/log/spark/apps/application_1484079613665_0013
-rwxrwx--- 2 hadoop spark 96.2 M 2017-01-24 13:27 /var/log/spark/apps/application_1484079613665_0014
-rwxrwx--- 2 hadoop spark 293.7 M 2017-01-24 17:58 /var/log/spark/apps/application_1484079613665_0015
-rwxrwx--- 2 hadoop spark 7.6 G 2017-01-30 07:01 /var/log/spark/apps/application_1484079613665_0016
-rwxrwx--- 2 hadoop spark 1.3 G 2017-01-31 02:59 /var/log/spark/apps/application_1484079613665_0017
-rwxrwx--- 2 hadoop spark 2.1 G 2017-02-01 12:04 /var/log/spark/apps/application_1484079613665_0018
-rwxrwx--- 2 hadoop spark 2.8 G 2017-02-03 08:32 /var/log/spark/apps/application_1484079613665_0019
-rwxrwx--- 2 hadoop spark 5.4 G 2017-02-07 02:03 /var/log/spark/apps/application_1484079613665_0020
-rwxrwx--- 2 hadoop spark 9.3 G 2017-02-13 03:58 /var/log/spark/apps/application_1484079613665_0021
-rwxrwx--- 2 hadoop spark 2.0 G 2017-02-14 11:13 /var/log/spark/apps/application_1484079613665_0022
-rwxrwx--- 2 hadoop spark 1.1 G 2017-02-15 03:49 /var/log/spark/apps/application_1484079613665_0023
-rwxrwx--- 2 hadoop spark 8.8 G 2017-02-21 05:42 /var/log/spark/apps/application_1484079613665_0024
-rwxrwx--- 2 hadoop spark 371.2 M 2017-02-21 11:54 /var/log/spark/apps/application_1484079613665_0025
-rwxrwx--- 2 hadoop spark 1.4 G 2017-02-22 09:17 /var/log/spark/apps/application_1484079613665_0026
-rwxrwx--- 2 hadoop spark 3.2 G 2017-02-24 12:36 /var/log/spark/apps/application_1484079613665_0027
-rwxrwx--- 2 hadoop spark 9.5 M 2017-02-24 12:48 /var/log/spark/apps/application_1484079613665_0028
-rwxrwx--- 2 hadoop spark 20.5 G 2017-03-10 04:00 /var/log/spark/apps/application_1484079613665_0029
-rwxrwx--- 2 hadoop spark 7.3 G 2017-03-10 04:04 /var/log/spark/apps/application_1484079613665_0030.inprogress
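To see which of the files in [1] a 7-day maximum age would be expected to cover, the listing can be compared against that threshold. The following is a minimal Python sketch, not the History Server's actual logic: it assumes the listing layout shown in [1], treats the listed modification time as the file's age, and uses the post time as an illustrative "now". The names parse_listing_line and expired_logs are hypothetical helpers.

```python
from datetime import datetime, timedelta

# Assumed value for spark.history.fs.cleaner.maxAge (7 days), as stated in the question.
MAX_AGE = timedelta(days=7)


def parse_listing_line(line):
    """Return (modified, path) from one line of the listing in [1].

    Assumes the layout shown in the question: permissions, replication,
    owner, group, size, unit, date, time, path.
    """
    fields = line.split()
    modified = datetime.strptime(fields[-3] + " " + fields[-2], "%Y-%m-%d %H:%M")
    return modified, fields[-1]


def expired_logs(lines, now, max_age=MAX_AGE):
    """Return the paths whose modification time is older than max_age."""
    expired = []
    for line in lines:
        modified, path = parse_listing_line(line)
        if now - modified > max_age:
            expired.append(path)
    return expired


# Two sample lines copied from [1].
listing = [
    "-rwxrwx--- 2 hadoop spark 543.1 M 2017-01-11 13:13 /var/log/spark/apps/application_1484079613665_0001",
    "-rwxrwx--- 2 hadoop spark 20.5 G 2017-03-10 04:00 /var/log/spark/apps/application_1484079613665_0029",
]
now = datetime(2017, 3, 15, 18, 29)  # the post time, used only for illustration
for path in expired_logs(listing, now):
    print(path)
```

With these two sample lines, only application_1484079613665_0001 is reported: it was last modified more than two months before the post, while application_1484079613665_0029 is under six days old.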
【Comments】:
-
Which EMR AMI version are you using? Are those container/executor logs? Are you running in YARN mode?
-
@swaranga-sarma Were you able to solve this? We have run into a similar situation, where one of our long-running applications never has its logs cleaned up.
-
@Interfector I think ferris-tseng is right. Going to try it out. We are running into a similar issue.
-
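@GauravShah We tried that solution, but it did not seem to work. The reason is that our application is long-running. For cleanup to happen, the application has to finish; logs are not rotated for running applications. We had to disable the Spark History Server entirely.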
-
@Interfector I think we will hit the same issue. I'll see if I can find something else.
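The limitation described in the comments (logs of applications that are still running are not cleaned up) can be illustrated with a small Python sketch. It assumes, based on the listing in the question, that event logs of unfinished applications carry the .inprogress suffix; split_by_state is a hypothetical helper and not part of Spark.

```python
IN_PROGRESS_SUFFIX = ".inprogress"  # suffix seen in the question's listing


def split_by_state(paths):
    """Split event-log paths into (completed, in_progress) lists."""
    completed, in_progress = [], []
    for path in paths:
        if path.endswith(IN_PROGRESS_SUFFIX):
            in_progress.append(path)
        else:
            completed.append(path)
    return completed, in_progress


# Sample paths copied from the question's listing.
paths = [
    "/var/log/spark/apps/application_1484079613665_0001",
    "/var/log/spark/apps/application_1484079613665_0002.inprogress",
    "/var/log/spark/apps/application_1484079613665_0030.inprogress",
]
completed, in_progress = split_by_state(paths)
print("candidates for cleanup:", completed)
print("skipped while unfinished:", in_progress)
```

Under that assumption, the two .inprogress files would stay on HDFS regardless of their age, which matches what the commenters report for long-running applications.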
Tags: apache-spark