dr-elephant (Dr. Elephant) analyzes how Hadoop and Spark jobs ran and gives tuning suggestions.

1. Clone the source code:

git clone https://github.com/linkedin/dr-elephant.git
cd dr-elephant

Java 1.8 is required.

2. Download Typesafe Activator from https://downloads.typesafe.com/typesafe-activator/1.3.12/typesafe-activator-1.3.12.zip, unzip it, and configure the environment:

export ACTIVATOR_HOME=/path/to/unzipped/activator
export PATH=$ACTIVATOR_HOME/bin:$PATH

3. Install npm and Bower, then install the front-end dependencies in the web directory:

sudo yum install npm
sudo npm install -g bower
cd web; bower install; cd ..
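Before compiling, it can save time to confirm that the tools from the steps above are on PATH and that Java is version 8. The following is a minimal sketch; the helper names require_cmds, java_major_version and installed_java_version are hypothetical, not part of dr-elephant.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if all are found, 1 otherwise.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: print the major version from a Java version string.
# Legacy versions use the "1.x" scheme, so "1.8.0_292" means Java 8.
java_major_version() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # strip the legacy "1." prefix
  esac
  echo "${v%%[._-]*}"     # keep everything before the first '.', '_' or '-'
}

# Extract the quoted version from `java -version` (which writes to stderr),
# e.g. openjdk version "1.8.0_292" -> 1.8.0_292
installed_java_version() {
  java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}'
}
```

Usage: run `require_cmds java activator npm bower` and then `[ "$(java_major_version "$(installed_java_version)")" = 8 ] && echo "Java 8 OK"`.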

4. Compile:

./compile.sh [./compile.conf]

compile.conf lets you specify the Hadoop and Spark versions. In my tests, every non-default version failed to compile, so keep the defaults.

5. After compilation, the zip package is generated under dist:

ls dist/dr-elephant*.zip
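If dist contains packages from several builds, a small helper can pick the newest one. This is a sketch; latest_dist_zip is a hypothetical name, and the /opt target in the usage line is only an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the newest dr-elephant*.zip in a dist directory (default: ./dist).
# Prints nothing if no package has been built yet.
latest_dist_zip() {
  local dir="${1:-dist}"
  # Package names contain no spaces, so reading ls -t output is safe here.
  ls -t "$dir"/dr-elephant*.zip 2>/dev/null | head -n 1
}
```

Usage: `unzip "$(latest_dist_zip)" -d /opt`.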

 

Installation

1. Unzip the package:

unzip dr-elephant-2.1.7.zip

2. Set the environment variables (vi ~/.bash_profile):

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop

export HADOOP_CONF_DIR=/etc/hadoop/conf

export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark

export SPARK_CONF_DIR=/etc/spark/conf

export PATH=$HADOOP_HOME/bin:$PATH
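After reloading the profile (source ~/.bash_profile), a quick check that each variable is exported and points to an existing directory catches typos in the CDH paths. A minimal sketch with a hypothetical helper name, check_dir_vars:

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that each named, exported variable points to an existing directory.
# Returns 0 if all are fine, 1 otherwise.
check_dir_vars() {
  local status=0 name val
  for name in "$@"; do
    val=$(printenv "$name" || true)   # printenv only sees exported variables
    if [ -z "$val" ]; then
      echo "$name is not set" >&2; status=1
    elif [ ! -d "$val" ]; then
      echo "$name=$val is not a directory" >&2; status=1
    fi
  done
  return "$status"
}
```

Usage: `check_dir_vars HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME SPARK_CONF_DIR`.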

 

3. Configure dr-elephant

cd dr-elephant-2.1.7/app-conf

Edit elephant.conf to configure the database connection (the database must be created beforehand):

vi elephant.conf

db_url=

db_name=drelephant2

db_user=

db_password=
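The database named in db_name has to exist before the first start. Assuming a MySQL backend and the mysql command-line client, the statement can be generated like this. create_db_sql is a hypothetical helper, and the utf8 character set is my assumption, not a documented requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SQL that creates the Dr. Elephant database named in $1.
# The utf8 character set is an assumption.
create_db_sql() {
  printf 'CREATE DATABASE IF NOT EXISTS `%s` DEFAULT CHARACTER SET utf8;\n' "$1"
}

# Example (host and user are placeholders; -p prompts for the password):
# mysql -h localhost -u root -p -e "$(create_db_sql drelephant2)"
```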

 

# Enable web analytics for the application.

# By default analytics is not turned on. Set this property

# to true and paste the javascript snippet into 'public/analytics/track.js' for

# enabling web analytics for the application. You may configure an analytics application

# like piwik. More information on piwik at piwik.org

enable_analytics=false

 

# Set the keytab user and the path to the keytab file if security is enabled.

# keytab_user=""

# keytab_location=""

 

# Additional Configuration

# Check https://www.playframework.com/documentation/2.2.x/ProductionConfiguration

# Adding the below line for Heap Tuning and Java OPTS

# Use mem for tuning Heap Memory

jvm_args="-Devolutionplugin=enabled -DapplyEvolutions.default=true -mem 1024 -J-Xloggc:$project_root/../logs/elephant/dr-gc.`date +'%Y%m%d%H%M'` -J-XX:+PrintGCDetails"

The options -Devolutionplugin=enabled -DapplyEvolutions.default=true are needed only on the first start, to create the database tables. Remove them for subsequent starts.
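Once the tables exist, the two first-run flags can be stripped from jvm_args with sed. This is a sketch; disable_evolutions is a hypothetical helper, and it keeps a backup copy as elephant.conf.bak.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the first-run evolution flags from jvm_args in elephant.conf.
# -i.bak edits in place and keeps a backup (works with both GNU and BSD sed).
disable_evolutions() {
  local conf="${1:-app-conf/elephant.conf}"
  sed -i.bak \
    -e 's/-Devolutionplugin=enabled *//' \
    -e 's/-DapplyEvolutions\.default=true *//' \
    "$conf"
}
```

Usage, from the dr-elephant-2.1.7 directory after the first successful start: `disable_evolutions app-conf/elephant.conf`.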

Disable Tez: edit FetcherConf.xml and comment out the Tez fetcher.

vi FetcherConf.xml


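Comment out the default Spark fetcher. Our Spark version is above 1.5, so uncomment the Spark fetcher below it; for Spark 2, use the bottom one, which takes parameters.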
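To double-check which fetchers remain active after these edits, the sketch below strips XML comments line by line with awk and prints the remaining classname values. active_classnames is a hypothetical helper, not a real XML parser; it assumes each classname element sits on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the <classname> values in a FetcherConf.xml-style file
# that are NOT inside <!-- ... --> comments. Line-based; assumes one element per line.
active_classnames() {
  awk '
    {
      line = $0; out = ""
      # Walk the line, dropping any text that falls inside a comment.
      while (length(line) > 0) {
        if (in_comment) {
          i = index(line, "-->")
          if (i == 0) break                 # rest of the line is commented out
          line = substr(line, i + 3); in_comment = 0
        } else {
          i = index(line, "<!--")
          if (i == 0) { out = out line; break }
          out = out substr(line, 1, i - 1)
          line = substr(line, i + 4); in_comment = 1
        }
      }
      # 11 = length("<classname>"), 23 = length("<classname></classname>")
      if (match(out, /<classname>[^<]*<\/classname>/))
        print substr(out, RSTART + 11, RLENGTH - 23)
    }' "$1"
}
```

Usage, from the app-conf directory: `active_classnames FetcherConf.xml`. The Tez fetcher should no longer appear in the output.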


vi SchedulerConf.xml

Here you can configure job monitoring for schedulers such as Oozie and Airflow.

Start

./bin/start.sh

By default, logs go to three places: dr-elephant-2.1.7/dr.log, dr-elephant-2.1.7/logs/application.log, and logs/elephant/dr_elephant.log in the directory that contains dr-elephant-2.1.7.
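To watch all three logs during startup, a helper can derive their paths from the install directory. This is a sketch; dr_log_paths is a hypothetical name, and /opt in the usage line is only an example location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the three default Dr. Elephant log paths
# for an unpacked install directory such as /opt/dr-elephant-2.1.7.
dr_log_paths() {
  local home="${1%/}"              # drop one trailing slash, if any
  local parent
  parent=$(dirname "$home")        # the directory that contains the install
  printf '%s\n' \
    "$home/dr.log" \
    "$home/logs/application.log" \
    "$parent/logs/elephant/dr_elephant.log"
}
```

Usage: `tail -F $(dr_log_paths /opt/dr-elephant-2.1.7)`. The -F option keeps retrying files that do not exist yet, which helps right after start.sh runs.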

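The user who starts Dr. Elephant needs permission to read the Hadoop and Spark history log files; adding that user to the spark and hadoop groups is sufficient.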
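Group membership can be added with `sudo usermod -a -G hadoop,spark <user>` (the change takes effect in a new login session). The sketch below checks the result; in_groups is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if user $1 belongs to every group listed after it.
in_groups() {
  local user="$1" g groups
  shift
  groups=" $(id -nG "$user" 2>/dev/null) "   # pad with spaces for whole-word matching
  for g in "$@"; do
    case "$groups" in
      *" $g "*) ;;
      *) echo "$user is not in group $g" >&2; return 1 ;;
    esac
  done
  return 0
}
```

Usage: `in_groups <user> hadoop spark`.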
