【Posted】:2021-06-22 15:08:33
【Question】:
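I am trying to design a Kafka consumer, but I have hit a roadblock on how to design the process. I am considering two options:
1. Process records directly from Kafka.
2. Write records from Kafka to a staging table, then process them from there.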
Approach 1: Process messages from Kafka on the fly:
• Read messages one at a time from Kafka; if there are no records left to process, break out of the loop (the number of messages to process is configurable).
• Execute business rules.
• Apply changes to the consumer database.
• Update the Kafka offset after processing the message, so the next read starts after it.
• Insert into a staging table (used for PD guide later on).
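The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeConsumer` is a hypothetical in-memory stand-in for a single-partition Kafka consumer (not a real client API), `apply_business_rules` is a placeholder rule, and SQLite stands in for the consumer database. The point it shows is the ordering: the offset is committed only after the database changes succeed, which gives at-least-once processing.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    value: str


class FakeConsumer:
    """Hypothetical in-memory stand-in for a Kafka consumer on one partition."""

    def __init__(self, values):
        self._log = [Record(i, v) for i, v in enumerate(values)]
        self._position = 0   # next record to hand out
        self.committed = 0   # offset to resume from after a restart

    def poll(self):
        # Return the next record, or None when nothing is left to read.
        if self._position >= len(self._log):
            return None
        record = self._log[self._position]
        self._position += 1
        return record

    def commit(self, next_offset):
        self.committed = next_offset


def apply_business_rules(value):
    # Placeholder rule (assumption): upper-case the payload.
    return value.upper()


def process_batch(consumer, db, max_messages):
    """Process up to max_messages records, committing after each one."""
    processed = 0
    while processed < max_messages:
        record = consumer.poll()
        if record is None:
            break  # no records to process
        result = apply_business_rules(record.value)
        with db:  # one database transaction per message
            db.execute(
                "INSERT INTO target (kafka_offset, payload) VALUES (?, ?)",
                (record.offset, result),
            )
            db.execute(
                "INSERT INTO staging (kafka_offset, raw) VALUES (?, ?)",
                (record.offset, record.value),
            )
        # Commit the offset only after the database work has succeeded.
        consumer.commit(record.offset + 1)
        processed += 1
    return processed


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE target (kafka_offset INTEGER PRIMARY KEY, payload TEXT)")
db.execute("CREATE TABLE staging (kafka_offset INTEGER PRIMARY KEY, raw TEXT)")
consumer = FakeConsumer(["a", "b", "c"])
handled = process_batch(consumer, db, max_messages=2)
```

If the process crashes between the database transaction and the commit, the same message is delivered again on restart, so the database writes would need to tolerate duplicates.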
Questions about the above approach:
• Is it OK to subscribe to a partition and keep the lock on that Kafka partition open until the configured number of messages has been processed, and then apply the business rules and apply the changes to the database? Everything happens in the same process; are there any performance issues with doing it this way?
• Is it OK to manually commit the offset to Kafka? (Are there performance issues with manual offset commits?)
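For context on the manual-commit question, one commonly described alternative is to keep the offset in the consumer database itself and update it in the same transaction as the business change, so a redelivered message can be detected and skipped. The sketch below is illustrative only: the table names, the `handle` function, and the use of SQLite are assumptions, and no real Kafka client is involved.

```python
import sqlite3


def handle(db, partition, offset, value):
    """Apply one message and record its offset in the same transaction.

    Returns False if the message was already applied (a redelivery).
    """
    with db:  # business change and offset update commit or roll back together
        row = db.execute(
            "SELECT next_offset FROM consumer_offsets WHERE partition_id = ?",
            (partition,),
        ).fetchone()
        next_offset = row[0] if row else 0
        if offset < next_offset:
            return False  # already processed before a crash or rebalance
        db.execute("INSERT INTO target (payload) VALUES (?)", (value,))
        db.execute(
            "INSERT OR REPLACE INTO consumer_offsets (partition_id, next_offset) "
            "VALUES (?, ?)",
            (partition, offset + 1),
        )
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE target (payload TEXT)")
db.execute(
    "CREATE TABLE consumer_offsets (partition_id INTEGER PRIMARY KEY, next_offset INTEGER)"
)
# The third call simulates the same message being delivered twice.
results = [handle(db, 0, 0, "a"), handle(db, 0, 1, "b"), handle(db, 0, 1, "b")]
```

With this layout a real consumer would, on startup, resume from the offset stored in the table rather than from the offset committed to Kafka.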
Approach 2: Write from Kafka to a staging table, then process the records
Process 1: Consume events from Kafka and put them in a staging table.
Process 2: Read the staging table (a configurable number of rows), execute business rules, apply the changes to the consumer database, and update the status of the processed records in the staging table. (We may have multiple processes doing this step.)
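To make Process 2 concrete, here is a minimal sketch of how several workers might claim rows from a shared staging table without picking up the same row twice. The schema, the `status` values, and the function names are assumptions for illustration, and SQLite is used only so the example is self-contained. On a server database the same idea is often expressed with row-level locking, for example `SELECT ... FOR UPDATE SKIP LOCKED` in PostgreSQL.

```python
import sqlite3


def claim_batch(db, worker_id, batch_size):
    """Mark up to batch_size pending rows as owned by worker_id and return them."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        rows = db.execute(
            "SELECT id, payload FROM staging "
            "WHERE status = 'pending' ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        db.executemany(
            "UPDATE staging SET status = 'processing', worker = ? WHERE id = ?",
            [(worker_id, row_id) for row_id, _ in rows],
        )
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return rows


def mark_done(db, row_ids):
    """Record that the given staging rows were processed successfully."""
    db.executemany(
        "UPDATE staging SET status = 'done' WHERE id = ?",
        [(row_id,) for row_id in row_ids],
    )


# isolation_level=None lets the code manage transactions explicitly.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute(
    "CREATE TABLE staging ("
    "id INTEGER PRIMARY KEY, payload TEXT, "
    "status TEXT DEFAULT 'pending', worker TEXT)"
)
db.executemany("INSERT INTO staging (payload) VALUES (?)", [("a",), ("b",), ("c",)])

first = claim_batch(db, "worker-1", 2)   # claims rows 1 and 2
second = claim_batch(db, "worker-2", 2)  # only row 3 is still pending
mark_done(db, [row_id for row_id, _ in first])
```

Claiming rows in a short transaction and doing the business work outside it keeps locks brief, at the cost of needing a way to recover rows left in `processing` by a crashed worker.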
I see a lot of drawbacks in this approach:
• We lose the advantage of the offset handling provided by Kafka, and instead manually update the status of processed records in the staging table.
• Locking and blocking on the staging table with multiple instances, since we insert into and update the same staging table after processing. (Note: I could design separate tables, move the data there and process it, but that would introduce multiple processes again.)
How should I design a Kafka consumer with multiple instances and a large volume of data to process? Which design is appropriate: reading from Kafka and processing the messages directly, or staging them in a table and writing another job to process them?
【Discussion】:
Tags: apache-kafka kafka-consumer-api