【发布时间】:2016-04-01 03:59:43
【问题描述】:
我有一个在 Ubuntu 工作站上运行的 Flume 1.5 代理,它从各种设备收集日志并将日志重新格式化为一个逗号分隔的文件,其中的行很长。在收集和重新格式化日志之后,它们被放置到一个假脱机目录中,Flume 代理将日志文件发送到运行 Flume 代理的 Hadoop 服务器以接受日志文件并将它们放置在 HDFS 目录中。
一切正常,除了 Flume 将文件发送到 HDFS 目录时,每行每 2048 个字符后都有换行符。
下面是我的水槽配置文件。 是否有设置告诉水槽不插入换行符?
#On Ubuntu Workstation
#list sources, sinks and channels in the agent
agent.sources = axon_source
agent.channels = memorychannel
agent.sinks = AvroOut
#define flow
agent.sources.axon_source.channels = memorychannel
agent.sinks.AvroOut.channel = memorychannel
agent.channels.memorychannel.type = memory
agent.channels.memorychannel.capacity = 100000
#source
agent.sources.axon_source.type = spooldir
agent.sources.axon_source.spoolDir = /home/ubuntu/workspace/logdump
agent.sources.axon_source.decodeErrorPolicy = ignore
#avro out
agent.sinks.AvroOut.type = avro
agent.sinks.AvroOut.hostname = 172.31.12.221
agent.sinks.AvroOut.port = 41415
agent.sinks.AvroOut.maxIoWorkers = 2
------------------------------------------------------------
#On Hadoop Server
agent.sources = AvroIn
agent.sources.AvroIn.type = avro
agent.sources.AvroIn.bind = 172.31.131.1
agent.sources.AvroIn.port = 41415
agent.sources.AvroIn.channels = MemChan1
agent.channels = MemChan1
agent.channels.MemChan1.type = memory
agent.channels.MemChan1.capacity = 100000
agent.sinks = HDFSSink
agent.sinks.HDFSSink.type = hdfs
agent.sinks.HDFSSink.channel = MemChan1
agent.sinks.HDFSSink.hdfs.path = /Logs/%Y%m/
agent.sinks.HDFSSink.hdfs.filePrefix = axoncapture
agent.sinks.HDFSSink.hdfs.fileSuffix = .log
agent.sinks.HDFSSink.hdfs.minBlockReplicas = 1
agent.sinks.HDFSSink.hdfs.rollCount = 0
agent.sinks.HDFSSink.hdfs.rollSize = 314572800
agent.sinks.HDFSSink.hdfs.writeFormat = Text
agent.sinks.HDFSSink.hdfs.fileType = DataStream
agent.sinks.HDFSSink.hdfs.useLocalTimeStamp = True
【问题讨论】: