Using Zabbix for Aggregate Monitoring

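Zabbix is an increasingly popular monitoring tool. Compared with tools such as Nagios and Cacti, its most distinctive feature is that it stores its data in a relational database, which makes later querying and processing of that data much easier. For example, to find out what fraction of a day a machine's ioutil stayed above 80%, a single SQL query against the Zabbix database is enough, whereas in Cacti this is far less convenient. There is also no need to worry about older data being diluted as time goes on.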
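The ioutil question can be sketched as one aggregate query. The snippet below is a minimal sketch that uses Python's built-in sqlite3 with a simplified, hypothetical history table (only itemid, clock and value; the real Zabbix tables live in MySQL or PostgreSQL and have more columns), and the itemid 1001 is made up. It measures the share of samples above the threshold, which approximates the share of time only if the item is polled at a fixed interval.

```python
import sqlite3

# Hypothetical, trimmed stand-in for a Zabbix history table.
SCHEMA = "create table history (itemid integer, clock integer, value real)"

# (value > ?) evaluates to 1 or 0, so avg() gives the share of samples
# above the threshold within the window [start, end).
RATIO_SQL = ("select avg(value > ?) from history "
             "where itemid = ? and clock >= ? and clock < ?")


def busy_ratio(conn, itemid, start, end, threshold=80.0):
    """Share of samples for itemid in [start, end) whose value exceeds threshold."""
    row = conn.execute(RATIO_SQL, (threshold, itemid, start, end)).fetchone()
    # avg() returns NULL (None) when no samples fall in the window
    return row[0] if row[0] is not None else 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # 24 hourly ioutil samples for the hypothetical item: 6 of them above 80
    samples = [(1001, h * 3600, 95.0 if h < 6 else 40.0) for h in range(24)]
    conn.executemany("insert into history values (?, ?, ?)", samples)
    print(busy_ratio(conn, 1001, 0, 86400))  # 0.25
```

Against a real MySQL-backed Zabbix database the same query shape would apply, with the driver's own placeholder style and the itemid looked up through the hosts and items tables.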

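When analyzing Zabbix data, the tables used most often are hosts, items, interface, history* and trend*. For example, to monitor MapReduce (mapred) usage across an entire Hadoop cluster with Zabbix, all you need to do is aggregate the lastvalue of the relevant item across every machine.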

A simple way to do this is the following script:


#!/usr/bin/python
# -*- coding: utf-8 -*-
# edit by ericni
# Get Hadoop cluster-wide totals by summing each datanode's lastvalue in Zabbix
import sys

import MySQLdb

# Command-line metric name -> Zabbix item key
ITEM_KEYS = {
    "all_mapTaskSlots": "hadoop_metrics[mrmetrics.log,mapred.tasktracker,mapTaskSlots]",
    "all_maps_running": "hadoop_metrics[mrmetrics.log,mapred.tasktracker,maps_running]",
    "all_reduceTaskSlots": "hadoop_metrics[mrmetrics.log,mapred.tasktracker,reduceTaskSlots]",
    "all_reduces_running": "hadoop_metrics[mrmetrics.log,mapred.tasktracker,reduces_running]",
    "all_ThreadsBlocked": "hadoop_stats[datanode,ThreadsBlocked]",
    "all_ThreadsRunnable": "hadoop_stats[datanode,ThreadsRunnable]",
    "all_ThreadsWaiting": "hadoop_stats[datanode,ThreadsWaiting]",
}

# Sum one item key over every host whose name contains "-hadoop-datanode".
# The LIKE pattern is passed as a parameter so its % signs are not treated
# as format placeholders by the driver.
SQL = ("select sum(b.lastvalue) from hosts a, items b "
       "where a.hostid = b.hostid and b.key_ = %s and lower(a.host) like %s")


def get_total_value(item_key):
    db = MySQLdb.connect(host='xxx', user='xxxx', passwd='xxx', db='xxx')
    try:
        cursor = db.cursor()
        cursor.execute(SQL, (item_key, '%-hadoop-datanode%'))
        row = cursor.fetchone()
        cursor.close()
    finally:
        db.close()
    # sum() returns NULL (None) when no rows match; report 0 instead
    if row is None or row[0] is None:
        return 0
    return row[0]


if __name__ == '__main__':
    # Unknown or missing metric name: exit quietly, as the original did
    if len(sys.argv) < 2 or sys.argv[1] not in ITEM_KEYS:
        sys.exit(0)
    print(get_total_value(ITEM_KEYS[sys.argv[1]]))
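The join-and-sum query used by the script can be tried out without a Zabbix server. This is a minimal sketch that rebuilds heavily trimmed, hypothetical versions of the hosts and items tables in an in-memory SQLite database (lastvalue is stored as a number here for simplicity) and runs the same query with SQLite's ? placeholders; the host names, item IDs and values are made up.

```python
import sqlite3

# Hypothetical, trimmed stand-ins for Zabbix's hosts and items tables.
SCHEMA = """
create table hosts (hostid integer primary key, host text);
create table items (itemid integer primary key, hostid integer,
                    key_ text, lastvalue real);
"""

# Same shape as the script's query: sum one item key over all datanodes.
TOTAL_SQL = ("select sum(b.lastvalue) from hosts a, items b "
             "where a.hostid = b.hostid and b.key_ = ? and lower(a.host) like ?")

SLOTS_KEY = "hadoop_metrics[mrmetrics.log,mapred.tasktracker,mapTaskSlots]"


def cluster_total(conn, item_key, host_pattern="%-hadoop-datanode%"):
    """Sum lastvalue of item_key over hosts matching host_pattern (0 if none)."""
    row = conn.execute(TOTAL_SQL, (item_key, host_pattern)).fetchone()
    return row[0] if row[0] is not None else 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into hosts values (?, ?)",
                     [(1, "bj-hadoop-datanode01"), (2, "BJ-HADOOP-DATANODE02"),
                      (3, "bj-hadoop-namenode01")])
    conn.executemany("insert into items values (?, ?, ?, ?)",
                     [(11, 1, SLOTS_KEY, 8.0), (12, 2, SLOTS_KEY, 6.0),
                      (13, 3, SLOTS_KEY, 4.0)])
    # The namenode is excluded by the host-name pattern
    print(cluster_total(conn, SLOTS_KEY))  # 14.0
```

Because the host filter is applied to lower(host), upper-case host names are still counted, while hosts that are not datanodes drop out of the total.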

Then plot the total available map slots and the total running maps on the same graph to see how the map slots are being used.
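If a single utilization number is more convenient than two lines on a graph, it is just the running total divided by the slot total. A minimal sketch; slot_utilization is a hypothetical helper, and its inputs are assumed to be the values the script prints for all_maps_running and all_mapTaskSlots (or all_reduces_running and all_reduceTaskSlots).

```python
def slot_utilization(running, slots):
    """Percentage of task slots in use; 0.0 when no slots are reported."""
    if not slots:
        return 0.0
    return 100.0 * float(running) / float(slots)


if __name__ == "__main__":
    # e.g. 120 running maps out of 480 map slots across the cluster
    print(slot_utilization(120, 480))  # 25.0
```

Guarding against a zero slot count avoids a division error while tasktrackers are down or not yet reporting.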

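Of course, Zabbix also has its own aggregation feature that is configured through the frontend, but this approach is comparatively more flexible.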

This article is reposted from 菜菜光's 51CTO blog. Original link: http://blog.51cto.com/caiguangguang/1369808. Please contact the original author for permission before reposting.