【Question title】: sink.hdfs writer adds garbage in my text file
【Posted】: 2014-10-05 05:42:55
【Question】:

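I have successfully configured Flume to transfer text files from a local folder to HDFS. My problem is that when a file is transferred to HDFS, some unwanted text ("hdfs.write.Longwriter" plus binary characters) is prepended to my text file. Here is my flume.conf: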

agent.sources = flumedump
agent.channels = memoryChannel
agent.sinks = flumeHDFS

agent.sources.flumedump.type = spooldir
agent.sources.flumedump.spoolDir = /opt/test/flume/flumedump/
agent.sources.flumedump.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin
agent.sinks.flumeHDFS.fileType = DataStream

#Format to be written
agent.sinks.flumeHDFS.hdfs.writeFormat = Text

agent.sinks.flumeHDFS.hdfs.maxOpenFiles = 10
# rollover file based on maximum size of 10 MB
agent.sinks.flumeHDFS.hdfs.rollSize = 10485760

# never rollover based on the number of events
agent.sinks.flumeHDFS.hdfs.rollCount = 0

# rollover file based on max time of 1 min
agent.sinks.flumeHDFS.hdfs.rollInterval = 60


#Specify the channel the sink should use
agent.sinks.flumeHDFS.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 100

My source text file is very simple and contains the text: Hi, my name is Hadoop and this is file one.

The sink file I get in HDFS looks like this: SEQ !org.apache.hadoop.io.LongWritable org.apache.hadoop.io.Text������5����>I

Please let me know what I am doing wrong.

【Comments】:

    Tags: hadoop flume flume-ng


    【Solution 1】:

    Figured it out. I had to fix this line

    agent.sinks.flumeHDFS.fileType = DataStream

    and change it to

    agent.sinks.flumeHDFS.hdfs.fileType = DataStream

    This fixed the problem.
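
For context: the HDFS sink reads its options under the `hdfs.` namespace, and `hdfs.fileType` defaults to `SequenceFile`. A key written without that prefix is most likely ignored, which would explain the `SEQ` header followed by the `LongWritable` and `Text` class names in the output. The Python sketch below is one rough way to catch this kind of mistake. It is not part of Flume: the function names and the `HDFS_OPTIONS` list are assumptions made for illustration, and the list covers only a subset of the sink's options.

```python
import re

# Illustrative (not exhaustive) subset of option names that the HDFS sink
# expects under the "hdfs." namespace. This list is an assumption of the sketch.
HDFS_OPTIONS = {
    "path", "fileType", "writeFormat", "rollSize", "rollCount",
    "rollInterval", "maxOpenFiles", "filePrefix", "fileSuffix",
    "batchSize", "codeC",
}


def find_unprefixed_hdfs_options(conf_text):
    """Return sorted (line_number, key) pairs for HDFS-sink options missing 'hdfs.'."""
    props = {}
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not key = value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        props[key] = (lineno, value)

    problems = []
    for key, (lineno, _value) in props.items():
        # Only keys shaped exactly like <agent>.sinks.<sink>.<option> are checked;
        # correctly prefixed keys have an extra "hdfs." segment and do not match.
        match = re.fullmatch(r"([^.]+)\.sinks\.([^.]+)\.([^.]+)", key)
        if not match:
            continue
        agent, sink, option = match.groups()
        sink_type = props.get(f"{agent}.sinks.{sink}.type", (0, ""))[1]
        if sink_type == "hdfs" and option in HDFS_OPTIONS:
            problems.append((lineno, key))
    return sorted(problems)


def looks_like_sequence_file(first_bytes):
    """True if the data starts with the 'SEQ' magic of a Hadoop SequenceFile."""
    return first_bytes[:3] == b"SEQ"


if __name__ == "__main__":
    sample = """
agent.sinks = flumeHDFS
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.hdfs.path = hdfs://bigdata.ibm.com:9000/user/vin
agent.sinks.flumeHDFS.fileType = DataStream
"""
    for lineno, key in find_unprefixed_hdfs_options(sample):
        print(f"line {lineno}: {key} should probably be under 'hdfs.'")
```

Run against the configuration from the question, a check like this should flag `agent.sinks.flumeHDFS.fileType`. `looks_like_sequence_file` can be applied to the first few bytes of an output file to confirm that the sink wrote a SequenceFile rather than plain text.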

    【Comments】:
