Kafka producer batch expired TimeoutException: KAFKA-5621, KIP-91 (Provide Intuitive User Timeouts in The Producer), KAFKA-5886

Today, when a batch is expired in the accumulator, a TimeoutException is raised to the user.

It might be better for the producer to retry the expired batch, up to the configured number of retries, rather than raise the exception right away. This is more intuitive from the user's point of view.

Further, the proposed behavior makes it easier for applications like MirrorMaker to provide ordering guarantees even when batches expire. Today, such an application has to resend the expired batch itself, and the batch is added to the back of the queue, so the output ordering differs from the input ordering.

We propose adding a new timeout, delivery.timeout.ms. The window of enforcement includes batching in the accumulator, retries, and the inflight segments of the batch. With this config, the user has a guaranteed upper bound, measured from the point when send returns, on when a record will either get sent, fail or expire. In other words, we no longer overload request.timeout.ms to act as a weak proxy for an accumulator timeout; instead we introduce an explicit timeout that users can rely on without exposing any internals of the producer, such as the accumulator.
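As a rough illustration of how the new setting sits next to the existing ones, the sketch below builds a set of producer properties that includes delivery.timeout.ms. The class and method names (DeliveryTimeoutSketch, producerProps, isConsistent), the broker address and the numeric values are assumptions made for this example; only the configuration keys are real. The check reflects the constraint that the released producer enforces at construction time, namely that delivery.timeout.ms must be at least linger.ms + request.timeout.ms.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: builds producer settings as plain
// java.util.Properties so the example runs without the kafka-clients jar.
class DeliveryTimeoutSketch {

    // All three arguments are in milliseconds.
    static Properties producerProps(long lingerMs, long requestTimeoutMs, long deliveryTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("linger.ms", Long.toString(lingerMs));
        props.setProperty("request.timeout.ms", Long.toString(requestTimeoutMs));
        // Upper bound on the time between send() returning and the record
        // being sent, failed or expired.
        props.setProperty("delivery.timeout.ms", Long.toString(deliveryTimeoutMs));
        return props;
    }

    // True when delivery.timeout.ms >= linger.ms + request.timeout.ms.
    static boolean isConsistent(Properties props) {
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties props = producerProps(5, 30_000, 120_000);
        System.out.println("consistent: " + isConsistent(props)); // prints "consistent: true"
    }
}
```

In a real application, properties like these (plus serializer settings) would be passed to the KafkaProducer constructor, which rejects an inconsistent combination with a configuration error.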

See KIP-91 for more details.

Current state: Adopted

Discussion thread: [DISCUSS] KIP-91

Vote thread: [VOTE] KIP-91

JIRA: KAFKA-5886

Release: 2.1.0

Please keep the discussion on the mailing list rather than commenting on the wiki (wiki discussions get unwieldy fast).

Motivation

In KIP-19, we added a request timeout to the network client. This change was necessary primarily to bound the time to detection of broker failures. In the absence of such a timeout, the producer would learn of the failure only much later (typically several minutes, depending on the TCP timeout), during which the accumulator could fill up and cause requests to either block or get dropped, depending on the block.on.buffer.full configuration. One additional goal of KIP-19 was to make timeouts intuitive. It is important for users to be provided with a guarantee on the maximum duration between when the call to send returns and when the callback fires (or the future is ready). Notwithstanding the fact that intuition is a subjective thing, we will see shortly that this goal has not been met.

In order to clarify the motivation, it will be helpful to review the lifecycle of records and record-batches in the producer, where the timeouts apply, and changes that have been made since KIP-19.

KIP-19

The initial call to send can block up to max.block.ms either waiting on metadata or for available space in the producer's accumulator. After this the record is placed in a (possibly new) batch of records.
The batch is eligible to be considered for sending when either linger.ms has elapsed or batch.size bytes have accumulated, whichever comes first. Although the batch is ready, it does not necessarily mean it can be sent out to the broker.
The batch has to wait for a transmission opportunity to the broker. A ready batch can only be sent out if the leader broker is in a sendable state (i.e., if a connection exists, current inflight requests are fewer than max.in.flight.requests.per.connection, etc.). In KIP-19, we use the request.timeout.ms configuration to expire batches in the accumulator as well. This was done in order to avoid an additional timeout, especially one that exposes the producer's internals to the user. The clock starts ticking when the batch is ready. However, we added a condition that if the metadata for a partition is known (i.e., it is possible to make progress on the partition) then we do not expire its batches even if they are ready. In other words, it is difficult to precisely determine the duration spent in the accumulator. Note that KIP-19 claims that "The per message timeout is easy to compute - linger.ms + (retries + 1) * request.timeout.ms". This is false, because the formula does not account for the time a batch can spend waiting in the accumulator.
When the batch gets sent out on the wire, we reset the clock for the actual wire timeout request.timeout.ms.
If the request fails for some reason before the timeout and we have retries remaining, we reset the clock again. (i.e., each retry gets a full request.timeout.ms.)
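To make the formula quoted from KIP-19 concrete, the following sketch evaluates it for sample values. The class name, method name and inputs are hypothetical and chosen only for illustration. As the list above notes, the result is not a dependable upper bound, since the time a batch spends in the accumulator is hard to pin down and is not captured by the formula.

```java
// Hypothetical calculator for the per-message timeout that KIP-19 claimed:
// linger.ms + (retries + 1) * request.timeout.ms
class Kip19BoundSketch {

    // Returns the claimed bound in milliseconds. The 1L forces long arithmetic
    // so that a very large retries value does not overflow int.
    static long claimedBoundMs(long lingerMs, int retries, long requestTimeoutMs) {
        return lingerMs + (retries + 1L) * requestTimeoutMs;
    }

    public static void main(String[] args) {
        // Sample values: linger.ms=5, retries=3, request.timeout.ms=30000.
        long bound = claimedBoundMs(5, 3, 30_000);
        System.out.println("claimed bound (ms): " + bound); // prints 120005
    }
}
```

With these inputs the formula gives 5 + 4 * 30000 = 120005 ms, which covers only the linger period plus one full request timeout per attempt.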

The following figure illustrates the above phases. The red circles are the potential points of timeout.