Learning Flume:
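Introduction: Flume is a highly reliable, highly available, distributed system for collecting, aggregating and transporting large volumes of log data. It was originally developed at Cloudera and is now an Apache Software Foundation project (Apache Flume).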
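How Flume works: data moves through Flume as events. An event is Flume's basic unit of data; it carries a body (a byte array) plus header information. A Source inside an Agent receives data from an external producer and turns it into events, formatting them as needed, and then pushes each event into one or more Channels. A Channel acts as a buffer that holds an event until a Sink has processed it. The Sink then either persists the log data or forwards the event to the Source of another agent.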
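The flow described above can be sketched as a small plain-Java model. This is only a conceptual sketch, not the Flume API (Flume's real event type is the `org.apache.flume.Event` interface); the names `FlowSketch`, `SimpleEvent`, `fanOut` and `drain`, and the `source` header, are hypothetical, and channels are modeled as ordinary in-memory queues.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Conceptual model of the Source -> Channel -> Sink flow.
// These classes are hypothetical and are NOT part of the Flume API.
final class FlowSketch {

    // An event: string headers plus a body stored as a byte array.
    static final class SimpleEvent {
        final Map<String, String> headers;
        final byte[] body;

        SimpleEvent(Map<String, String> headers, byte[] body) {
            this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
            this.body = body;
        }
    }

    // Source side: wrap each raw line as an event and push it into every
    // channel the source is attached to (one or several).
    static void fanOut(List<String> lines, List<? extends Queue<SimpleEvent>> channels) {
        for (String line : lines) {
            SimpleEvent event = new SimpleEvent(Map.of("source", "r1"),
                    line.getBytes(StandardCharsets.UTF_8));
            for (Queue<SimpleEvent> channel : channels) {
                channel.add(event);
            }
        }
    }

    // Sink side: take events from one channel until it is empty and "persist"
    // them (here the body is simply decoded into a list of strings).
    static List<String> drain(Queue<SimpleEvent> channel) {
        List<String> persisted = new ArrayList<>();
        SimpleEvent event;
        while ((event = channel.poll()) != null) {
            persisted.add(new String(event.body, StandardCharsets.UTF_8));
        }
        return persisted;
    }

    public static void main(String[] args) {
        Queue<SimpleEvent> c1 = new ArrayDeque<>();
        Queue<SimpleEvent> c2 = new ArrayDeque<>();
        fanOut(List.of("login u1", "logout u1"), List.of(c1, c2));
        System.out.println(drain(c1)); // [login u1, logout u1]
        System.out.println(drain(c2)); // [login u1, logout u1]
    }
}
```

Each channel receives its own copy of the stream, so draining one channel leaves the other untouched, which mirrors a Source feeding several Channels.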
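Core Flume concepts:
Event: a unit of data with an optional header; the payload can be a log record, an Avro object, etc.
Agent: an independent JVM process that contains the Source, Channel and Sink components.
Client: runs independently (in its own thread) and produces data that it sends to an Agent.
Source: consumes the events delivered to it, i.e. collects data from the Client and passes it to a Channel.
Channel: temporary storage for events in transit; it holds the events passed in by the Source until a Sink takes them, and is in effect the link between Source and Sink.
Sink: takes events from the Channel and delivers them onward; it runs in its own thread.
The Agent is the smallest independently running unit in Flume: one Agent is one JVM process, made up of a Source, a Channel and a Sink.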
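A Source writes to a Channel, and a Sink reads from it, in transactions. For the memory channel used in the configs of this guide, `capacity` bounds how many events the channel holds in total and `transactionCapacity` bounds how many events a single transaction may put or take. The sketch below is a simplified model of those two limits, not Flume's `MemoryChannel` implementation; the class `ChannelSketch` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Simplified model of a bounded, transactional channel.
// Hypothetical class; NOT Flume's MemoryChannel.
final class ChannelSketch {
    private final int capacity;            // max events stored in the channel
    private final int transactionCapacity; // max events per put/take batch
    private final ArrayDeque<String> queue = new ArrayDeque<>();

    ChannelSketch(int capacity, int transactionCapacity) {
        this.capacity = capacity;
        this.transactionCapacity = transactionCapacity;
    }

    // Source side: commit a whole batch at once, or reject it entirely.
    boolean putBatch(List<String> batch) {
        if (batch.size() > transactionCapacity) return false;     // batch too large
        if (queue.size() + batch.size() > capacity) return false; // channel full
        queue.addAll(batch);
        return true;
    }

    // Sink side: take up to transactionCapacity events in one batch.
    List<String> takeBatch() {
        List<String> taken = new ArrayList<>();
        while (taken.size() < transactionCapacity && !queue.isEmpty()) {
            taken.add(queue.poll());
        }
        return taken;
    }

    // Sink side: delivery failed, so put the taken events back at the head
    // in their original order (a "rollback").
    void rollback(List<String> taken) {
        for (int i = taken.size() - 1; i >= 0; i--) {
            queue.addFirst(taken.get(i));
        }
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        ChannelSketch c1 = new ChannelSketch(1000, 100); // values used in this guide's configs
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 100; i++) batch.add("event-" + i);
        System.out.println(c1.putBatch(batch)); // true: 100 <= transactionCapacity
        batch.add("event-100");
        System.out.println(c1.putBatch(batch)); // false: 101 > transactionCapacity
        List<String> taken = c1.takeBatch();    // takes 100 events
        c1.rollback(taken);                     // pretend delivery failed
        System.out.println(c1.size());          // 100
    }
}
```

The point of the model is that a failed delivery does not lose events: they stay in the channel until a later take succeeds, while a full channel pushes back on the source.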
Installing Flume:
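1. Create the Flume directory and change into it: sudo mkdir -p /usr/local/src/flume && cd /usr/local/src/flume
2. Download Flume: sudo wget https://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.8.0/apache-flume-1.8.0-bin.tar.gz
3. Extract the archive: sudo tar -zxvf apache-flume-1.8.0-bin.tar.gz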
4. Change to the configuration directory: cd /usr/local/src/flume/apache-flume-1.8.0-bin/conf
5. Create flume-env.sh by copying the template: sudo cp flume-env.sh.template flume-env.sh
6. Open the file: sudo gedit flume-env.sh
7. Set the Java environment variable (the name must be upper-case JAVA_HOME, and it must be exported):
export JAVA_HOME=/usr/local/src/java
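A wrong JAVA_HOME is a common reason for the agent not starting, since the launcher is expected to run the `java` binary under that directory. The small check below is a hypothetical helper (`JavaHomeCheck` is not part of Flume) and assumes a Linux/Unix layout where the launcher is `bin/java`.

```java
import java.io.File;

// Hypothetical helper: does a JAVA_HOME candidate contain an executable bin/java?
// Assumes a Unix-like layout (on Windows the launcher would be bin\java.exe).
final class JavaHomeCheck {
    static boolean looksLikeJavaHome(String path) {
        if (path == null || path.isEmpty()) return false;
        File launcher = new File(new File(path, "bin"), "java");
        return launcher.isFile() && launcher.canExecute();
    }

    public static void main(String[] args) {
        // Check the path given on the command line, else the current JAVA_HOME.
        String candidate = args.length > 0 ? args[0] : System.getenv("JAVA_HOME");
        System.out.println(candidate + " -> " + looksLikeJavaHome(candidate));
    }
}
```

For this guide one would run it with `/usr/local/src/java` as the argument.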
8. Create the client.conf configuration file (the client-side agent): sudo gedit client.conf
9. Edit client.conf as follows:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /data/logs/quizzes/user.log
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 4141
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
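The property names in client.conf follow the pattern `<agent>.sources|sinks|channels.<name>.<property>`, and a source may list several channels while a sink is bound to exactly one. A quick pre-flight check of those bindings can be sketched with `java.util.Properties`. The class `ConfCheck` is hypothetical and only inspects property names; Flume does its own validation when the agent starts.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check of the component bindings in an agent config.
final class ConfCheck {

    // Split a space-separated component list such as "r1 r2"; blank -> empty list.
    private static List<String> names(String value) {
        if (value == null || value.trim().isEmpty()) return List.of();
        return Arrays.asList(value.trim().split("\\s+"));
    }

    static List<String> check(String confText, String agent) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(confText)); // '#' lines are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> declared = names(p.getProperty(agent + ".channels"));
        List<String> problems = new ArrayList<>();
        // A source may write to one or more declared channels.
        for (String src : names(p.getProperty(agent + ".sources"))) {
            List<String> bound = names(p.getProperty(agent + ".sources." + src + ".channels"));
            if (bound.isEmpty()) problems.add("source " + src + " has no channels");
            for (String ch : bound) {
                if (!declared.contains(ch)) problems.add("source " + src + " uses undeclared channel " + ch);
            }
        }
        // A sink reads from exactly one declared channel.
        for (String sink : names(p.getProperty(agent + ".sinks"))) {
            List<String> bound = names(p.getProperty(agent + ".sinks." + sink + ".channel"));
            if (bound.size() != 1) {
                problems.add("sink " + sink + " must be bound to exactly one channel");
            } else if (!declared.contains(bound.get(0))) {
                problems.add("sink " + sink + " uses undeclared channel " + bound.get(0));
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ConfCheck.java /path/to/client.conf a1
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> problems = check(text, args.length > 1 ? args[1] : "a1");
        System.out.println(problems.isEmpty() ? "bindings look consistent" : String.join("\n", problems));
    }
}
```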
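10. Create the server.conf configuration file (the server-side agent): sudo gedit server.conf
11. Edit server.conf. Its Avro source listens on all interfaces (0.0.0.0) on port 4141, the port the client's Avro sink sends to: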
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
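When the client's Avro sink cannot deliver, the first thing to rule out is whether anything is listening on port 4141. A plain TCP connect test is enough for that; the helper `PortCheck` below is hypothetical and only checks that the port accepts connections, not that the listener is actually a Flume Avro source.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: can a TCP connection to host:port be opened within the timeout?
final class PortCheck {
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        }
    }

    public static void main(String[] args) {
        // 4141 matches a1.sources.r1.port (server) and a1.sinks.k1.port (client).
        System.out.println("avro source reachable: " + isListening("localhost", 4141, 2000));
    }
}
```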
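12. Create the sink.conf configuration file: sudo gedit sink.conf
13. Edit it with the server agent's sink and channel settings. An agent is started with a single configuration file (-f), so these lines must end up in the same file as the server.conf settings (for example, append them to server.conf). light.flume.RollingFileFlumeSink is a custom sink class, not one shipped with Flume, so its classes must be on the server agent's classpath: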
a1.sinks.k1.type = light.flume.RollingFileFlumeSink
# Target directory where the files are stored
a1.sinks.k1.sink.directory = /data/logs/quizzes/all
a1.sinks.k1.sink.id = user
# Active log file currently being written
a1.sinks.k1.sink.filename = /data/logs/quizzes/all/api.log
# Name pattern for rolled files: rolled over and gzip-compressed daily at 00:00
a1.sinks.k1.sink.filepattern = /data/logs/quizzes/all/api-%d{yyyy-MM-dd}.log.gz
a1.channels.c1.type = memory
# Configurable: maximum number of events held in the channel's memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
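Because the sink is a custom class, the exact meaning of `sink.filepattern` depends on its implementation. Assuming it follows the log4j-style `%d{<date format>}` convention the pattern suggests, the name a rolled file receives for a given day can be computed as below. `RollPattern.expand` is a hypothetical helper for checking where to look for archived logs, not code from the sink itself.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumption: sink.filepattern uses a log4j-style %d{<date format>} token.
// Hypothetical helper; not taken from light.flume.RollingFileFlumeSink.
final class RollPattern {
    private static final Pattern DATE_TOKEN = Pattern.compile("%d\\{([^}]*)\\}");

    // Replace every %d{...} token with the given day formatted by that pattern.
    static String expand(String filePattern, LocalDate day) {
        Matcher m = DATE_TOKEN.matcher(filePattern);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String formatted = day.format(DateTimeFormatter.ofPattern(m.group(1)));
            m.appendReplacement(out, Matcher.quoteReplacement(formatted));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String pattern = "/data/logs/quizzes/all/api-%d{yyyy-MM-dd}.log.gz";
        // Name yesterday's rolled file would get under this assumption.
        System.out.println(expand(pattern, LocalDate.now().minusDays(1)));
    }
}
```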
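14. Start the client agent (run it from the conf directory of step 4, because -c . makes Flume use the current directory as its configuration directory, where flume-env.sh lives):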
/usr/local/src/flume/apache-flume-1.8.0-bin/bin/flume-ng agent -c . -f /usr/local/src/flume/apache-flume-1.8.0-bin/conf/client.conf -n a1 -Dflume.root.logger=INFO,console
15. Start the server agent (here by invoking java directly on org.apache.flume.node.Application, with -f pointing at the server configuration):
/usr/local/src/java/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/usr/local/src/flume:/usr/local/src/flume/apache-flume-1.8.0-bin/lib/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application -f /usr/local/src/flume/apache-flume-1.8.0-bin/conf/server.conf -n a1
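Once both agents are running, the pipeline can be exercised by appending lines to the file the client's exec source tails (`tail -F /data/logs/quizzes/user.log`); if everything is wired correctly they are expected to show up in the server sink's output under /data/logs/quizzes/all. `LogFeeder` is a hypothetical test-data generator, and writing under /data may require suitable permissions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical test-data generator: appends lines to the file tailed by the exec source.
final class LogFeeder {
    static void appendLines(Path file, int count) throws IOException {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            lines.add(LocalDateTime.now() + " test-event " + i);
        }
        Files.createDirectories(file.toAbsolutePath().getParent());
        // CREATE + APPEND: never truncate the log that tail -F is following.
        Files.write(file, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "/data/logs/quizzes/user.log");
        appendLines(log, 10);
        System.out.println("appended 10 lines to " + log);
    }
}
```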