Dr. Elephant (dr-elephant) analyzes how Hadoop and Spark jobs run and gives tuning suggestions.
1. Clone the source and enter the directory (Java 1.8 is required):
git clone https://github.com/linkedin/dr-elephant.git
cd dr-elephant*
2. Download https://downloads.typesafe.com/typesafe-activator/1.3.12/typesafe-activator-1.3.12.zip
Set up the environment:
export ACTIVATOR_HOME=/path/to/unzipped/activator
export PATH=$ACTIVATOR_HOME/bin:$PATH
3. Install npm and bower, then fetch the web dependencies:
sudo yum install npm
sudo npm install -g bower
cd web; bower install; cd ..
4. Compile
./compile.sh [./compile.conf]
compile.conf can set the Hadoop and Spark versions. In testing, every other version specified there failed to compile, so keep the defaults.
5. After compilation, a zip file is generated in dist:
ls dist
dr-elephant*.zip
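The build steps above depend on several command-line tools being on PATH. A minimal sketch of a pre-build check follows; the function name missing_tools is hypothetical and not part of dr-elephant, and the tool list in the usage comment is simply the set of commands mentioned in steps 1 to 3.

```shell
#!/bin/sh
# Hypothetical helper: print every named tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command resolves.
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage (run before ./compile.sh):
#   missing_tools git java activator npm bower
```

If the call prints nothing, all listed tools were found; any name it prints still needs to be installed or added to PATH.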
Installation
1. Unzip the archive
unzip dr-elephant-2.1.7.zip
2. Set the environment variables: vi .bash_profile
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_CONF_DIR=/etc/spark/conf
export PATH=$HADOOP_HOME/bin:$PATH
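The paths above are for a Cloudera parcel layout and may differ on other clusters. As a rough sketch, the following check reports which of the variables do not point at an existing directory; check_env_dirs is a hypothetical helper name, not something shipped with dr-elephant.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed variable that is
# unset, empty, or does not point at an existing directory.
check_env_dirs() {
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    [ -d "$value" ] || echo "$name"
  done
}

# Usage (after sourcing .bash_profile):
#   check_env_dirs HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME SPARK_CONF_DIR
```

An empty result means all four directories exist for the current user.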
3. Configure dr-elephant
cd dr-elephant-2.1.7/app-conf
vi elephant.conf   -- configure the database connection; the database must be created beforehand
db_url=
db_name=drelephant2
db_user=
db_password=
# Enable web analytics for the application.
# By default analytics is not turned on. Set this property
# to true and paste the javascript snippet into 'public/analytics/track.js' for
# enabling web analytics for the application. You may configure an analytics application
# like piwik. More information on piwik at piwik.org
enable_analytics=false
# Set the keytab user and the path to the keytab file if security is enabled.
# keytab_user=""
# keytab_location=""
# Additional Configuration
# Check https://www.playframework.com/documentation/2.2.x/ProductionConfiguration
# Adding the below line for Heap Tuning and Java OPTS
# Use mem for tuning Heap Memory
jvm_args="-Devolutionplugin=enabled -DapplyEvolutions.default=true -mem 1024 -J-Xloggc:$project_root../logs/elephant/dr-gc.`date +'%Y%m%d%H%M'` -J-XX:+PrintGCDetails"
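--- These two flags are only needed on the first start, to initialize the database tables; disable them on later starts by removing -Devolutionplugin=enabled -DapplyEvolutions.default=true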
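One way to apply that note after the first successful start is to strip the two evolution flags from jvm_args with sed. This is only a sketch: disable_evolutions is a hypothetical helper name, and it assumes the flags appear in elephant.conf exactly as written above. It keeps a .bak copy so the change can be undone.

```shell
#!/bin/sh
# Hypothetical helper: remove the two evolution flags from a config file.
# Writes through a temp file instead of sed -i, whose syntax differs
# between GNU and BSD sed.
disable_evolutions() {
  conf="$1"
  cp "$conf" "$conf.bak" || return 1
  tmp="$(mktemp)" || return 1
  if sed -e 's/-Devolutionplugin=enabled *//' \
         -e 's/-DapplyEvolutions\.default=true *//' \
         "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"
  fi
  rm -f "$tmp"
}

# Usage (from dr-elephant-2.1.7, after the tables have been created):
#   disable_evolutions app-conf/elephant.conf
```

Editing the line by hand in vi gives the same result; the helper is just easier to repeat across hosts.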
Remove Tez
vi FetcherConf.xml
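Comment out the Tez fetcher.
Comment out the default Spark fetcher; we run Spark 1.5 or later, so uncomment the Spark fetcher below it. For Spark 2, use the last one, which takes parameters.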
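After editing, it can be useful to confirm which fetchers are still active. The sketch below strips XML comments and then lists the remaining application types. It assumes each fetcher in FetcherConf.xml declares its type in an `<applicationtype>` element on a single line, which should be checked against the actual file; active_fetchers is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list application types of fetchers that are NOT
# inside an XML comment. Assumes <applicationtype>...</applicationtype>
# sits on one line.
active_fetchers() {
  awk '
    {
      line = $0
      out = ""
      while (length(line) > 0) {
        if (in_comment) {
          end = index(line, "-->")
          if (end == 0) {
            line = ""                      # whole rest of line is comment
          } else {
            line = substr(line, end + 3)   # resume after the comment close
            in_comment = 0
          }
        } else {
          start = index(line, "<!--")
          if (start == 0) {
            out = out line
            line = ""
          } else {
            out = out substr(line, 1, start - 1)
            line = substr(line, start + 4)
            in_comment = 1
          }
        }
      }
      print out
    }
  ' "$1" | sed -n 's:.*<applicationtype>\(.*\)</applicationtype>.*:\1:p'
}

# Usage (from dr-elephant-2.1.7/app-conf):
#   active_fetchers FetcherConf.xml
```

With Tez commented out, tez should no longer appear in the output, and spark should appear once.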
vi SchedulerConf.xml
Job monitoring for schedulers such as Oozie and Airflow can be configured here.
Startup
./bin/start.sh
By default, logs are written to three places: dr-elephant-2.1.7/dr.log, dr-elephant-2.1.7/logs/application.log,
and logs/elephant/dr_elephant.log in the directory that contains dr-elephant-2.1.7
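When startup fails, it helps to see which of these three files actually exist. A small sketch, using a hypothetical helper named existing_logs that takes the install directory as its argument:

```shell
#!/bin/sh
# Hypothetical helper: print the default log files that currently exist
# for the given install directory.
existing_logs() {
  base="${1%/}"                      # drop a trailing slash, if any
  parent="$(dirname "$base")"        # directory that contains the install
  for f in "$base/dr.log" \
           "$base/logs/application.log" \
           "$parent/logs/elephant/dr_elephant.log"; do
    [ -f "$f" ] && echo "$f"
  done
  return 0
}

# Usage:
#   tail -n 50 $(existing_logs dr-elephant-2.1.7)
```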
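The user that starts dr-elephant needs permission to read the Hadoop and Spark history log files; adding that user to the spark and hadoop groups is enough.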
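To see whether the start user is already in those groups, something like the following sketch could be used. missing_groups is a hypothetical helper name; the group names hadoop and spark are taken from the note above and may be named differently on a given cluster.

```shell
#!/bin/sh
# Hypothetical helper: print each requested group the user does not belong to.
missing_groups() {
  user="$1"
  shift
  # id -nG prints the user's group names separated by spaces.
  have=" $(id -nG "$user") "
  for group in "$@"; do
    case "$have" in
      *" $group "*) ;;               # already a member
      *) echo "$group" ;;
    esac
  done
}

# Usage:
#   missing_groups "$(id -un)" hadoop spark
# For each group printed, an administrator can add the user with:
#   sudo usermod -aG <group> <user>
```

Group changes normally take effect at the next login, so the start user should log in again before running ./bin/start.sh.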