【Question title】: Lagged offsets skipped after new event is published before max poll interval in Kafka
【Posted】: 2021-07-08 17:19:18
【Question】:

Kafka v2.4 consumer configuration:

kafka.consumer.auto.offset.reset=earliest
kafka.consumer.auto.commit=false

Kafka consumer container configuration:

@Bean
public ConcurrentKafkaListenerContainerFactory<String, PayoutDto> kafkaPayoutStatusPoolListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, PayoutDto> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(kafkaConsumerFactoryForPayoutEvent());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.setMissingTopicsFatal(false);
    return factory;
}

Kafka consumer:

@KafkaListener(id = "regularPayoutEventConsumer", topics = "${kafka.regular.payout.consumer.queuename}", containerFactory = "kafkaPayoutStatusPoolListenerContainerFactory", groupId = "${kafka.regular.payout.consumer.groupId}")
public void listen(ConsumerRecord<String, PayoutDto> consumerRecord, Acknowledgment ack) {
    StopWatch watch = new StopWatch();
    watch.start();
    String key = null;
    Long offset = null;
    try {
        PayoutDto payoutDto = consumerRecord.value();
        key = consumerRecord.key();
        offset = consumerRecord.offset();
        cpAccountsService.processPayoutEvent(payoutDto);
        ack.acknowledge();
    } catch (Exception e) {
        log.error("Exception occurred in RegularPayoutEventConsumer", e);
    } finally {
        watch.stop();
        log.debug("total time taken by consumer for requestID:" + key + " on offset:" + offset + " is:"
                + watch.getTotalTimeMillis());
    }

}

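Success scenario:

  1. The consumer fails to acknowledge because of an exception, which creates lag; assume the last committed offset is 30 and the lag is now 4.
  2. In the next automatic poll cycle after the poll interval, the consumer resumes from offset 30, where the lag began, and typically finishes at 33; the lag is now 0.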

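Failure scenario:

  1. Same as step 1 of the success scenario.
  2. Before the consumer's poll interval elapses, the producer publishes a new message.
  3. On the new producer event, the consumer fetches data and jumps straight to offset 33, skipping 30, 31, and 32, and the lag is cleared to 0.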

Kafka application startup log:

        2021-04-14 10:38:06.132  INFO 10286 --- [  restartedMain] o.a.k.clients.consumer.KafkaConsumer     : [Consumer clientId=consumer-RegularPayoutEventGroupId-3, groupId=RegularPayoutEventGroupId] Subscribed to topic(s): InstantPayoutTransactionsEv
    2021-04-14 10:38:06.132  INFO 10286 --- [  restartedMain] o.s.s.c.ThreadPoolTaskScheduler          : Initializing ExecutorService
    2021-04-14 10:38:06.133  INFO 10286 --- [  restartedMain] o.a.k.clients.consumer.ConsumerConfig    : ConsumerConfig values: 
        allow.auto.create.topics = true
        auto.commit.interval.ms = 5000
        auto.offset.reset = earliest
        bootstrap.servers = [localhost:9092]
        check.crcs = true
        client.dns.lookup = use_all_dns_ips
        client.id = consumer-PayoutEventGroupId-4
        client.rack = 
        connections.max.idle.ms = 540000
        default.api.timeout.ms = 60000
        enable.auto.commit = false
        exclude.internal.topics = true
        fetch.max.bytes = 52428800
        fetch.max.wait.ms = 500
        fetch.min.bytes = 1
        group.id = PayoutEventGroupId
        group.instance.id = null
        heartbeat.interval.ms = 3000
        interceptor.classes = []
        internal.leave.group.on.close = true
        internal.throw.on.fetch.stable.offset.unsupported = false
        isolation.level = read_uncommitted
        key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
        max.partition.fetch.bytes = 1048576
        max.poll.interval.ms = 30000
        max.poll.records = 500
        metadata.max.age.ms = 300000
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
        receive.buffer.bytes = 65536
        reconnect.backoff.max.ms = 1000
        reconnect.backoff.ms = 50
        request.timeout.ms = 30000
        retry.backoff.ms = 100
        sasl.client.callback.handler.class = null
        sasl.jaas.config = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.login.callback.handler.class = null
        sasl.login.class = null
        sasl.login.refresh.buffer.seconds = 300
        sasl.login.refresh.min.period.seconds = 60
        sasl.login.refresh.window.factor = 0.8
        sasl.login.refresh.window.jitter = 0.05
        sasl.mechanism = GSSAPI
        security.protocol = PLAINTEXT
        security.providers = null
        send.buffer.bytes = 131072
        session.timeout.ms = 10000
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
        ssl.endpoint.identification.algorithm = https
        ssl.engine.factory.class = null
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLSv1.3
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
        value.deserializer = class com.cms.cpa.config.KafkaPayoutDeserializer

    2021-04-14 10:38:06.137  INFO 10286 --- [  restartedMain] o.a.kafka.common.utils.AppInfoParser     : Kafka version: 2.6.0
    2021-04-14 10:38:06.137  INFO 10286 --- [  restartedMain] o.a.kafka.common.utils.AppInfoParser     : Kafka commitId: 62abe01bee039651

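【Comments】:

  • Sounds implausible; you need to show much more code and configuration, logs, etc. You can use the commitLogLevel container property to log the committed offsets at a higher level.
  • @GaryRussell, I have updated the question with the requested details.
  • @GaryRussell, to explain the problem better I have created a git repo; please refer to github.com/rohandodeja/kafka-test-app
  • I have tried your application and it works fine for me. The only way I can get a lag of 4 is by sending 4 bad records with data=1. Then, when I send a new good record, it is received immediately, as expected. The bad records won't be redelivered with your configuration. Perhaps I am misunderstanding your expectation.
  • @GaryRussell, throwing the exception is just an example of a failure that produces lag, which is how my real application behaves. The main problem is that when you produce a good record, the consumer directly consumes that last good record. The behavior I expect is that it processes the old records first and then the new one. Instead, as you can see, you only get the latest record and the lag is cleared immediately.

Tags: apache-kafka kafka-consumer-api spring-kafka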


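【Solution 1】:

Kafka maintains two values for each consumer/partition: the committed offset (where the consumer will start from if it is restarted) and the position (the offset of the record that the next poll will return).

Failing to acknowledge a record does not move the position back.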
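The difference between the two values can be illustrated with a toy model. This is a minimal sketch in plain Java, not the real `KafkaConsumer`: the class `PositionVsCommitted` and all of its methods are hypothetical names made up for illustration, and lag is assumed to be the log end offset minus the committed offset. It replays the numbers from the question (committed offset 30, lag 4).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition as seen by one consumer (hypothetical, not the Kafka API). */
class PositionVsCommitted {
    private final List<String> log = new ArrayList<>(); // records in the partition
    private long committed = 0; // where a restarted consumer would resume
    private long position = 0;  // offset of the first record the next poll() returns

    public void produce(String value) { log.add(value); }

    /** Returns everything from the current position and advances the position past it. */
    public List<String> poll() {
        List<String> batch = new ArrayList<>(log.subList((int) position, log.size()));
        position = log.size();
        return batch;
    }

    public void commit(long offset) { committed = offset; }
    public void seek(long offset) { position = offset; }
    public long committed() { return committed; }
    public long position() { return position; }
    public long lag() { return log.size() - committed; }

    public static void main(String[] args) {
        PositionVsCommitted p = new PositionVsCommitted();
        for (int i = 0; i < 34; i++) {
            p.produce("r" + i);          // offsets 0..33
        }
        p.poll();                        // all 34 records are delivered; position moves to 34
        p.commit(30);                    // only 0..29 were acknowledged; 30..33 failed
        System.out.println("lag=" + p.lag() + " position=" + p.position());

        p.produce("r34");                // a new record arrives before any restart
        System.out.println("next poll: " + p.poll());  // only the new record

        p.seek(p.committed());           // an explicit seek is what moves the position back
        System.out.println("after seek: " + p.poll()); // r30..r34 are delivered again
    }
}
```

In this model, acknowledging the new record would commit offset 35 and bring the lag to 0 even though offsets 30 to 33 were never reprocessed, which matches the behavior described in the question.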

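It is working as designed; if you want to reprocess a failed record, you need to use acknowledgment.nack() with an optional sleep time, or throw an exception and configure a SeekToCurrentErrorHandler.

In those cases, the container will re-position the partitions so that the failed record is redelivered. With the error handler, you can "recover" the failed record after retries are exhausted. When using nack(), the listener has to keep track of the attempts.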
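As a rough sketch of the error-handler option, assuming a Spring for Apache Kafka 2.x release in which `SeekToCurrentErrorHandler` and `setErrorHandler` are available (they were superseded in later versions), the configuration might look like the following. The class name `PayoutRetryConfig`, the listener id, the topic name `example-topic`, the back-off values, and the `String` value type (the question uses `PayoutDto`) are illustrative assumptions, not taken from the asker's project.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties.AckMode;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class PayoutRetryConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> retryingContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
        // Retry a failed record twice, one second apart (three deliveries in total),
        // then hand it to the recoverer, which here only reports it (illustrative).
        factory.setErrorHandler(new SeekToCurrentErrorHandler(
                (record, ex) -> System.err.println(
                        "Giving up on offset " + record.offset() + ": " + ex.getMessage()),
                new FixedBackOff(1000L, 2L)));
        return factory;
    }

    @KafkaListener(id = "retryingListener", topics = "example-topic",
            containerFactory = "retryingContainerFactory")
    public void listen(ConsumerRecord<String, String> record, Acknowledgment ack) {
        // Do not catch and swallow the exception: it has to propagate to the container
        // so that the error handler can seek back and redeliver the record.
        process(record.value());
        ack.acknowledge();
    }

    // Placeholder for the real business logic (hypothetical).
    private void process(String value) {
        if (value == null) {
            throw new IllegalArgumentException("empty payload");
        }
    }
}
```

The key difference from the listener in the question is that the exception is no longer caught inside `listen`. The alternative from the answer is to keep the catch block and call `ack.nack(...)` with a sleep time there, in which case the listener itself has to count the attempts.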

https://docs.spring.io/spring-kafka/docs/current/reference/html/#committing-offsets

https://docs.spring.io/spring-kafka/docs/current/reference/html/#annotation-error-handling

【Comments】:
