【Question title】: Multiplex the flow in Flume into several channels
【Posted】: 2018-08-24 18:27:10
【Question】:

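I want to multiplex the flow in Flume into several channels based on the file name. How can this be done? I am using a spooling directory source with a channel selector, which should multiplex the flow on the file name stored in the event header.

I have many files named CA, AZ, CA2, AZ2, and so on. CA files should be written to the /flume_sink/CA directory, AZ files should be written to /flume_sink/AZ, and KT is the default directory. I am using the configuration below, but it does not multiplex. It replicates instead.

What is wrong with the configuration?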

agent1.sinks=hdfs-sink1_1 hdfs-sink1_2 hdfs-sink1_3
agent1.sources=source1_1
agent1.channels=fileChannel1_1 fileChannel1_2 fileChannel1_3

agent1.channels.fileChannel1_1.type=file
agent1.channels.fileChannel1_1.capacity=200000
agent1.channels.fileChannel1_1.transactionCapacity=1000
agent1.channels.fileChannel1_1.checkpointDir=/home/Flume/alpha/001
agent1.channels.fileChannel1_1.dataDirs=/home/Flume/alpha_data
agent1.channels.fileChannel1_1.checkpointOnClose=true
agent1.channels.fileChannel1_1.dataOnClose=true


agent1.sources.source1_1.type=spooldir
agent1.sources.source1_1.spoolDir=/home/ABC/
agent1.sources.source1_1.recursiveDirectorySearch=true
#agent1.sources.source1_1.fileHeader=true
#agent1.sources.source1_1.fileHeaderKey=file
agent1.sources.source1_1.fileSuffix=.COMPLETED
agent1.sources.source1_1.basenameHeader = true
agent1.sources.source1_1.basenameHeaderKey = basename

agent1.sinks.hdfs-sink1_1.type=hdfs
agent1.sinks.hdfs-sink1_1.hdfs.filePrefix = %{basename}
agent1.sinks.hdfs-sink1_1.hdfs.path=hdfs://10.44.209.44:9000/flume_sink/CA
agent1.sinks.hdfs-sink1_1.hdfs.batchSize=1000
agent1.sinks.hdfs-sink1_1.hdfs.rollSize=268435456
agent1.sinks.hdfs-sink1_1.hdfs.rollInterval=0
agent1.sinks.hdfs-sink1_1.hdfs.rollCount=50000000
agent1.sinks.hdfs-sink1_1.hdfs.fileType=DataStream
agent1.sinks.hdfs-sink1_1.hdfs.writeFormat=Text
agent1.sinks.hdfs-sink1_1.hdfs.useLocalTimeStamp=false


agent1.channels.fileChannel1_2.type=file
agent1.channels.fileChannel1_2.capacity=200000
agent1.channels.fileChannel1_2.transactionCapacity=1000
agent1.channels.fileChannel1_2.checkpointDir=/home/Flume/beta/001
agent1.channels.fileChannel1_2.dataDirs=/home/Flume/beta_data
agent1.channels.fileChannel1_2.checkpointOnClose=true
agent1.channels.fileChannel1_2.dataOnClose=true



agent1.sinks.hdfs-sink1_2.type=hdfs
agent1.sinks.hdfs-sink1_2.hdfs.filePrefix = %{basename}
agent1.sinks.hdfs-sink1_2.hdfs.path=hdfs://10.44.209.44:9000/flume_sink/AZ
agent1.sinks.hdfs-sink1_2.hdfs.batchSize=1000
agent1.sinks.hdfs-sink1_2.hdfs.rollSize=268435456
agent1.sinks.hdfs-sink1_2.hdfs.rollInterval=0
agent1.sinks.hdfs-sink1_2.hdfs.rollCount=50000000
agent1.sinks.hdfs-sink1_2.hdfs.fileType=DataStream
agent1.sinks.hdfs-sink1_2.hdfs.writeFormat=Text
agent1.sinks.hdfs-sink1_2.hdfs.useLocalTimeStamp=false

agent1.channels.fileChannel1_3.type=file
agent1.channels.fileChannel1_3.capacity=200000
agent1.channels.fileChannel1_3.transactionCapacity=10
agent1.channels.fileChannel1_3.checkpointDir=/home/Flume/gamma/001
agent1.channels.fileChannel1_3.dataDirs=/home/Flume/gamma_data
agent1.channels.fileChannel1_3.checkpointOnClose=true
agent1.channels.fileChannel1_3.dataOnClose=true


agent1.sinks.hdfs-sink1_3.type=hdfs
agent1.sinks.hdfs-sink1_3.hdfs.filePrefix = %{basename}
agent1.sinks.hdfs-sink1_3.hdfs.path=hdfs://10.44.209.44:9000/flume_sink/KT
agent1.sinks.hdfs-sink1_3.hdfs.batchSize=1000
agent1.sinks.hdfs-sink1_3.hdfs.rollSize=268435456
agent1.sinks.hdfs-sink1_3.hdfs.rollInterval=0
agent1.sinks.hdfs-sink1_3.hdfs.rollCount=50000000
agent1.sinks.hdfs-sink1_3.hdfs.fileType=DataStream
agent1.sinks.hdfs-sink1_3.hdfs.writeFormat=Text
agent1.sinks.hdfs-sink1_3.hdfs.useLocalTimeStamp=false


agent1.sources.source1_1.channels=fileChannel1_1 fileChannel1_2 fileChannel1_3

agent1.sinks.hdfs-sink1_1.channel=fileChannel1_1
agent1.sinks.hdfs-sink1_2.channel=fileChannel1_2
agent1.sinks.hdfs-sink1_3.channel=fileChannel1_3


agent1.sources.source1_1.selector.type=replicating
agent1.sources.source1_1.selector.header=basename
agent1.sources.source1_1.selector.mapping.CA=fileChannel1_1
agent1.sources.source1_1.selector.mapping.AZ=fileChannel1_2
agent1.sources.source1_1.selector.default=fileChannel1_3

【Comments】:

    Tags: flume-ng


    【Solution 1】:

    Try the fileHeader and fileHeaderKey properties of the spooling directory source: https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
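
The symptom described in the question (every file reaching all three sinks) is what the replicating selector does, and the posted configuration sets selector.type=replicating. A minimal sketch of the selector section follows, assuming the basename header from the question is kept and the file names carry no extension. The CA2 and AZ2 mappings are illustrative additions, on the assumption that the multiplexing selector matches header values exactly:

```properties
# Have the spooldir source store each file's basename in the "basename" header.
agent1.sources.source1_1.basenameHeader=true
agent1.sources.source1_1.basenameHeaderKey=basename

# multiplexing (not replicating) sends each event only to the mapped channel.
agent1.sources.source1_1.selector.type=multiplexing
agent1.sources.source1_1.selector.header=basename
agent1.sources.source1_1.selector.mapping.CA=fileChannel1_1
agent1.sources.source1_1.selector.mapping.AZ=fileChannel1_2
# Assumption: values are matched exactly, so CA2 and AZ2 need their own entries.
agent1.sources.source1_1.selector.mapping.CA2=fileChannel1_1
agent1.sources.source1_1.selector.mapping.AZ2=fileChannel1_2
# Events whose header value has no mapping go to the KT channel.
agent1.sources.source1_1.selector.default=fileChannel1_3
```

The channel and sink definitions from the question can stay as they are, since each sink already reads from its own channel.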

    【Discussion】:

    • I used the fileHeaderKey and fileHeader properties as shown below. agent1.sources.source1_1.fileHeader=true agent1.sources.source1_1.fileHeaderKey=file In that case, what should agent1.sources.source1_1.selector.header=??? be?
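
On the question in the comment above: selector.header names the event header that the selector reads, so it would presumably be set to the same key as fileHeaderKey. A sketch under that assumption follows. Note that, according to the Flume user guide, fileHeader stores the absolute path of the file, so the mapping keys would have to match full paths, which is why the basename header from the question may be easier to map on:

```properties
agent1.sources.source1_1.fileHeader=true
agent1.sources.source1_1.fileHeaderKey=file

agent1.sources.source1_1.selector.type=multiplexing
# Must equal the fileHeaderKey value set above.
agent1.sources.source1_1.selector.header=file
```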