【问题标题】:Spark Metrics: how to access executor and worker data?Spark Metrics:如何访问 executor 和 worker 数据?
【发布时间】:2016-08-12 18:45:13
【问题描述】:

注意:我在 YARN 上使用 Spark

我一直在尝试在 Spark 中实现的 Metric System。我启用了 ConsoleSink 和 CsvSink,并为所有四个实例(驱动程序、主服务器、执行程序、工作程序)启用了 JvmSource。但是我只有驱动输出,在控制台和 csv 目标目录中没有 worker/executor/master 数据。

读完this question后,我想知道在提交工作时我是否必须向执行者发送一些东西。

我的提交命令: ./bin/spark-submit --class org.apache.spark.examples.SparkPi lib/spark-examples-1.5.0-hadoop2.6.0.jar 10

下面是我的 metric.properties 文件:

# Enable JmxSink for all instances by class name
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

# Enable ConsoleSink for all instances by class name
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink

# Polling period for ConsoleSink
*.sink.console.period=10

*.sink.console.unit=seconds

#######################################
# worker instance overlap polling period
worker.sink.console.period=5

worker.sink.console.unit=seconds
#######################################

# Master instance overlap polling period
master.sink.console.period=15

master.sink.console.unit=seconds

# Enable CsvSink for all instances
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
#driver.sink.csv.class=org.apache.spark.metrics.sink.CsvSink

# Polling period for CsvSink
*.sink.csv.period=10

*.sink.csv.unit=seconds

# Polling directory for CsvSink
*.sink.csv.directory=/opt/spark-1.5.0-bin-hadoop2.6/csvSink/

# Worker instance overlap polling period
worker.sink.csv.period=10

worker.sink.csv.unit=second

# Enable Slf4jSink for all instances by class name
#*.sink.slf4j.class=org.apache.spark.metrics.sink.Slf4jSink

# Polling period for Slf4JSink
#*.sink.slf4j.period=1

#*.sink.slf4j.unit=minutes


# Enable jvm source for instance master, worker, driver and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource

worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource

driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource

executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

这里是 Spark 创建的 csv 文件的列表。我期待为 Spark 执行器(也是 JVM)访问相同的数据。

app-20160812135008-0013.driver.BlockManager.disk.diskSpaceUsed_MB.csv
app-20160812135008-0013.driver.BlockManager.memory.maxMem_MB.csv
app-20160812135008-0013.driver.BlockManager.memory.memUsed_MB.csv
app-20160812135008-0013.driver.BlockManager.memory.remainingMem_MB.csv
app-20160812135008-0013.driver.jvm.heap.committed.csv
app-20160812135008-0013.driver.jvm.heap.init.csv
app-20160812135008-0013.driver.jvm.heap.max.csv
app-20160812135008-0013.driver.jvm.heap.usage.csv
app-20160812135008-0013.driver.jvm.heap.used.csv
app-20160812135008-0013.driver.jvm.non-heap.committed.csv
app-20160812135008-0013.driver.jvm.non-heap.init.csv
app-20160812135008-0013.driver.jvm.non-heap.max.csv
app-20160812135008-0013.driver.jvm.non-heap.usage.csv
app-20160812135008-0013.driver.jvm.non-heap.used.csv
app-20160812135008-0013.driver.jvm.pools.Code-Cache.committed.csv
app-20160812135008-0013.driver.jvm.pools.Code-Cache.init.csv
app-20160812135008-0013.driver.jvm.pools.Code-Cache.max.csv
app-20160812135008-0013.driver.jvm.pools.Code-Cache.usage.csv
app-20160812135008-0013.driver.jvm.pools.Code-Cache.used.csv
app-20160812135008-0013.driver.jvm.pools.Compressed-Class-Space.committed.csv
app-20160812135008-0013.driver.jvm.pools.Compressed-Class-Space.init.csv
app-20160812135008-0013.driver.jvm.pools.Compressed-Class-Space.max.csv
app-20160812135008-0013.driver.jvm.pools.Compressed-Class-Space.usage.csv
app-20160812135008-0013.driver.jvm.pools.Compressed-Class-Space.used.csv
app-20160812135008-0013.driver.jvm.pools.Metaspace.committed.csv
app-20160812135008-0013.driver.jvm.pools.Metaspace.init.csv
app-20160812135008-0013.driver.jvm.pools.Metaspace.max.csv
app-20160812135008-0013.driver.jvm.pools.Metaspace.usage.csv
app-20160812135008-0013.driver.jvm.pools.Metaspace.used.csv
app-20160812135008-0013.driver.jvm.pools.PS-Eden-Space.committed.csv
app-20160812135008-0013.driver.jvm.pools.PS-Eden-Space.init.csv
app-20160812135008-0013.driver.jvm.pools.PS-Eden-Space.max.csv
app-20160812135008-0013.driver.jvm.pools.PS-Eden-Space.usage.csv
app-20160812135008-0013.driver.jvm.pools.PS-Eden-Space.used.csv
app-20160812135008-0013.driver.jvm.pools.PS-Old-Gen.committed.csv
app-20160812135008-0013.driver.jvm.pools.PS-Old-Gen.init.csv
app-20160812135008-0013.driver.jvm.pools.PS-Old-Gen.max.csv
app-20160812135008-0013.driver.jvm.pools.PS-Old-Gen.usage.csv
app-20160812135008-0013.driver.jvm.pools.PS-Old-Gen.used.csv
app-20160812135008-0013.driver.jvm.pools.PS-Survivor-Space.committed.csv
app-20160812135008-0013.driver.jvm.pools.PS-Survivor-Space.init.csv
app-20160812135008-0013.driver.jvm.pools.PS-Survivor-Space.max.csv
app-20160812135008-0013.driver.jvm.pools.PS-Survivor-Space.usage.csv
app-20160812135008-0013.driver.jvm.pools.PS-Survivor-Space.used.csv
app-20160812135008-0013.driver.jvm.PS-MarkSweep.count.csv
app-20160812135008-0013.driver.jvm.PS-MarkSweep.time.csv
app-20160812135008-0013.driver.jvm.PS-Scavenge.count.csv
app-20160812135008-0013.driver.jvm.PS-Scavenge.time.csv
app-20160812135008-0013.driver.jvm.total.committed.csv
app-20160812135008-0013.driver.jvm.total.init.csv
app-20160812135008-0013.driver.jvm.total.max.csv
app-20160812135008-0013.driver.jvm.total.used.csv
DAGScheduler.job.activeJobs.csv
DAGScheduler.job.allJobs.csv
DAGScheduler.messageProcessingTime.csv
DAGScheduler.stage.failedStages.csv
DAGScheduler.stage.runningStages.csv
DAGScheduler.stage.waitingStages.csv

【问题讨论】:

  • 我遇到了这个问题,您找到解决方案了吗?

标签: apache-spark monitoring hadoop-yarn metrics


【解决方案1】:

由于您没有给出您尝试过的命令,我假设您没有传递 metrics.properties。要传递 metrics.propertis 命令应该是

spark-submit <other parameters> --files metrics.properties 
--conf spark.metrics.conf=metrics.properties

注意metrics.properties 必须在--files & --conf 中指定,--files 会将metrics.properties 文件传输到执行器。由于您可以在驱动程序而不是执行程序上看到输出,因此我认为您缺少 --files 选项。

【讨论】:

  • 有趣的是,如果我在执行程序上没有 metric.properties 文件,即使使用 --files 开关也不起作用。我认为这与人们向执行者发送一个“大罐子”的“错误”有关,其中一次包含所有必需的文件。所以现在它适用于执行者,当我每个人都有metric.properties 文件时。我什至不必指定任何开关,即./bin/spark-submit --class org.apache.spark.examples.SparkPi lib/spark-examples-1.5.0-hadoop2.6.0.jar 10 可以正常工作。对 master 和 worker 指标有任何想法吗?
猜你喜欢
  • 2018-08-05
  • 1970-01-01
  • 1970-01-01
  • 2015-02-11
  • 1970-01-01
  • 2017-06-14
  • 1970-01-01
  • 1970-01-01
  • 1970-01-01
相关资源
最近更新 更多