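【Posted】: 2016-03-30 08:17:19
【Question】:
Hi everyone, and thanks in advance for taking the time to read this :) I am trying to send a JSON object into my Hadoop cluster so that I can process it with Spark. The JSON is about 15 KB. I configured my Flume agent like this: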
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 41400
a1.sources.r1.max-line-length = 512000
a1.sources.r1.eventSize = 512000
#a1.sources.deserializer.maxLineLength = 512000
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /hadoop/hdfs/data
a1.sinks.k1.hdfs.filePrefix = CDR
a1.sinks.k1.hdfs.callTimeout = 15000
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 226
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.batchSize = 226
# Use a file channel to buffer events (type = file, not memory)
a1.channels.c1.type = file
a1.channels.c1.capacity = 512000
a1.channels.c1.transactionCapacity = 512000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
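In addition, I have a Perl script that sends the JSON object through a socket to the specified port, but when I start the Flume agent I get the following message: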
WARN source.NetcatSource: Client sent event exceeding the maximum length
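What I don't understand is that I set the maximum line length for events to 512000 bytes, which is far larger than 15 KB. Can anyone help me? Thanks, and sorry for my poor English.
【Comments】:
Tags: json hadoop apache-spark netcat flume-ng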
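For reference, here is a minimal sketch of the sending side, written in Python rather than Perl (it is not the script from the question). It assumes the NetCat source treats every newline-terminated line as one event, so it serializes the object as a single compact line, appends exactly one newline, and checks the size against the configured `max-line-length` before sending. The names `encode_event` and `send_event` are hypothetical, and the length check is only an approximation of whatever the source enforces internally.

```python
import json
import socket

# Mirrors a1.sources.r1.max-line-length from the agent config above.
MAX_LINE_LENGTH = 512000


def encode_event(obj, max_len=MAX_LINE_LENGTH):
    """Serialize obj as one line of compact JSON ending in a single newline.

    json.dumps without an indent produces no raw line breaks, and newlines
    inside string values are escaped, so the payload is exactly one line.
    """
    line = json.dumps(obj, separators=(",", ":"))
    payload = line.encode("utf-8") + b"\n"
    if len(payload) > max_len:
        raise ValueError(
            "event is %d bytes, limit is %d" % (len(payload), max_len)
        )
    return payload


def send_event(obj, host="localhost", port=41400):
    """Send one JSON event to the netcat source; returns the bytes sent.

    host and port default to the bind/port values in the agent config.
    """
    payload = encode_event(obj)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
    return len(payload)
```

With the agent running, calling `send_event({"id": 1, "status": "ok"})` would send one line to `localhost:41400`. If the sender never emits a newline, or emits pretty-printed JSON spread over many lines, the source sees something other than one 15 KB line, which may be worth ruling out.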