1. Download Flume 1.6
https://mirrors.tuna.tsinghua.edu.cn/apache/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
2. Install the JDK and Hadoop
Refer to my earlier articles for the details.
3. Edit the Flume configuration
Edit the flume-env.sh file in the conf directory:
export JAVA_HOME=/etc/java/jdk/jdk1.8/
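The edit above can be sketched as shell. Note that Flume ships only flume-env.sh.template, so the real file is created from it first; the install path used here is an assumption, not from the original article.

```shell
# Sketch of step 3, assuming Flume was unpacked to ./apache-flume-1.6.0-bin
# (the mkdir/touch fallbacks just make this sketch runnable standalone).
FLUME_HOME="${FLUME_HOME:-./apache-flume-1.6.0-bin}"
mkdir -p "$FLUME_HOME/conf"
# flume-env.sh ships only as a template, so create the real file from it,
# or from scratch when the template is absent:
cp "$FLUME_HOME/conf/flume-env.sh.template" "$FLUME_HOME/conf/flume-env.sh" \
    2>/dev/null || touch "$FLUME_HOME/conf/flume-env.sh"
echo 'export JAVA_HOME=/etc/java/jdk/jdk1.8/' >> "$FLUME_HOME/conf/flume-env.sh"
grep JAVA_HOME "$FLUME_HOME/conf/flume-env.sh"
```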
4. Write a collection config file that stores the collected data in HDFS
In the conf directory, create a flume-conf-hdfs.properties file as follows:
#####################################################################
## Watch a directory for newly added files.
## This agent consists of one source (r1), one sink (k1),
## and one channel (c1).
##
## Here a1 is the name of this Flume agent instance.
#####################################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Source type: watch a directory for newly added files (spooling directory)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/flume/test
a1.sources.r1.fileSuffix = .ok
# a1.sources.r1.deletePolicy = immediate
a1.sources.r1.deletePolicy = never
a1.sources.r1.fileHeader = true
# Sink: where the collected data lands (the logger sink below is handy for debugging)
#a1.sinks.k1.type = logger
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.path=hdfs://localhost:9000/flume-dir/%Y%m%d%H%M%S
a1.sinks.k1.hdfs.filePrefix=log
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=TEXT
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollCount=0
# rollSize=0 disables size-based rolling, so files roll on the 10 s interval only
a1.sinks.k1.hdfs.rollSize=0
# Channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
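One side effect of the %Y%m%d%H%M%S path above is that each flush can land in a new second-granularity directory (as the log output in step 6 shows). If coarser buckets are preferred, the HDFS sink's rounding options can group events, e.g. into one directory per 10 minutes. This fragment is a sketch of an alternative, not part of the original setup:

```properties
a1.sinks.k1.hdfs.path = hdfs://localhost:9000/flume-dir/%Y%m%d%H%M
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
```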
5. Start Flume agent a1 (run from Flume's bin directory)
./flume-ng agent -c conf -n a1 -f ../conf/flume-conf-hdfs.properties -Dflume.root.logger=INFO,console
6. Create a file to generate data for collection
In the directory /home/flume/test, run the following command:
echo 'this is lys flume test'>flume.txt
The directory now contains flume.txt.ok (the source renames consumed files with the configured suffix),
and Flume logs the following:
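The Spooling Directory Source requires that a file is complete and never modified once it appears in the spool directory. A safer pattern than echoing directly into it is to write to a staging name and then mv the file in (atomic on the same filesystem). A minimal sketch, with the spool path made configurable (the article uses /home/flume/test):

```shell
# Stage the file outside the spool dir, then move it in atomically so the
# source never sees a half-written file.
SPOOL="${SPOOL:-./spool-test}"                  # article path: /home/flume/test
mkdir -p "$SPOOL"
echo 'this is lys flume test' > "$SPOOL.part"   # staged next to the spool dir
mv "$SPOOL.part" "$SPOOL/flume.txt"             # appears complete, atomically
```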
18/09/28 22:33:23 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
18/09/28 22:33:23 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /home/flume/test
18/09/28 22:33:23 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: r1: Successfully registered new MBean.
18/09/28 22:33:23 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: r1 started
18/09/28 22:34:22 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
18/09/28 22:34:22 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/flume/test/flume.txt to /home/flume/test/flume.txt.ok
18/09/28 22:34:25 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/09/28 22:34:25 INFO hdfs.BucketWriter: Creating hdfs://localhost:9000/flume-dir/20180928223425/log.1538188465593.tmp
18/09/28 22:34:37 INFO hdfs.BucketWriter: Closing hdfs://localhost:9000/flume-dir/20180928223425/log.1538188465593.tmp
18/09/28 22:34:37 INFO hdfs.BucketWriter: Renaming hdfs://localhost:9000/flume-dir/20180928223425/log.1538188465593.tmp to hdfs://localhost:9000/flume-dir/20180928223425/log.1538188465593
18/09/28 22:34:37 INFO hdfs.HDFSEventSink: Writer callback called
This shows the data was imported into HDFS successfully, and the file is visible under HDFS.
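The result can also be checked from the command line with the HDFS shell (a sketch; it assumes the Hadoop binaries are on the PATH and the NameNode from the sink path is running):

```shell
hdfs dfs -ls /flume-dir              # one directory per rolled timestamp
hdfs dfs -cat /flume-dir/*/log.*     # should print the test line written above
```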