【发布时间】:2020-09-19 21:56:32
【问题描述】:
抱歉,如果这里之前已经介绍过,我找不到任何密切相关的内容。我有这个 Kafka Streams 应用程序,它从多个主题中读取数据,将记录保存在数据库中,然后将事件发布到输出主题。非常简单,就 kafka 本地商店而言,它是无国籍的。 (如下拓扑)
Topic1(T1) 有 5 个分区,Topic2(T2) 有一个分区。这里的问题是,在消费两个主题时,如果我想用 T1(5 个消费者)“全速”运行,它不能保证我将为 T1 上的每个分区都有专门的消费者。它将分布在两个主题分区中,我最终可能会遇到不平衡的消费者(和空闲消费者),如下所示:
- [c1: t1p1, t1p3], [c2: t1p2, t1p5], [c3: t1p4, t2p1], [c4: (idle consumer)], [c5: (idle consumer)]
- [c1: t1p1, t1p2], [c2: t1p5], [c3: t1p4, t2p1], [c4: (空闲消费者)], [c5: t1p3]
话虽如此:
从同一个 KafkaStreams 实例中的多个主题读取的拓扑是否是一种好习惯?
如果我想在 T1 上“全速”运行,有什么方法可以实现如下分区分配? [c1: t1p1, t2p1], [c2: t1p2], [c3: t1p3], [c4: t1p4], [c5: t1p5]
以下哪种拓扑最适合我想要实现的目标?还是完全不相关?
选项 A(当前拓扑)
Topologies:
Sub-topology: 0
Source: topic1-source (topics: [TOPIC1])
--> topic1-processor
Processor: topic1-processor (stores: [])
--> topic1-sink
<-- topic1-source
Sink: topic1-sink (topic: OUTPUT-TOPIC)
<-- topic1-processor
Sub-topology: 1
Source: topic2-source (topics: [TOPIC2])
--> topic2-processor
Processor: topic2-processor (stores: [])
--> topic2-sink
<-- topic2-source
Sink: topic2-sink (topic: OUTPUT-TOPIC)
<-- topic2-processor
选项 B:
Topologies:
Sub-topology: 0
Source: topic1-source (topics: [TOPIC1])
--> topic1-processor
Source: topic2-source (topics: [TOPIC2])
--> topic2-processor
Processor: topic1-processor (stores: [])
--> response-sink
<-- topic1-source
Processor: topic2-processor (stores: [])
--> response-sink
<-- topic2-source
Sink: response-sink (topic: OUTPUT-TOPIC)
<-- topic2-processor, topic1-processor
- 如果我为每个主题使用两个流,而不是一个具有多个主题的流,这对我想要实现的目标有用吗?
config1.put("application.id", "app1");
KakfaStreams stream1 = new KafkaStreams(config1, topologyTopic1);
stream1.start();
config2.put("application.id", "app2");
KakfaStreams stream2 = new KafkaStreams(config2, topologyTopic2);
stream2.start();
【问题讨论】: