TCP_NODELAY Explained

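In the field of network congestion control there is a well-known algorithm called the Nagle algorithm, named after its inventor, John Nagle. In 1984 Nagle first used it to tackle congestion on the network of Ford Aerospace, then a subsidiary of Ford Motor Company (RFC 896). The problem is this: if an application produces data one byte at a time, and each byte is sent to the remote server as its own network packet, the network can easily be overloaded by the sheer number of packets. For example, when a user is connected to a remote server over Telnet, each keystroke produces one byte of data and therefore one packet. In the typical case, a packet carrying a single byte of useful data costs 40 bytes of header overhead (a 20-byte IP header plus a 20-byte TCP header). RFC 896 calls this the small-packet problem, and this kind of extremely poor payload utilization is commonly grouped under the name silly window syndrome. On a lightly loaded network this may be tolerable, but a heavily loaded network may well be unable to carry the traffic and can collapse under congestion.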
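The overhead arithmetic above can be sketched in a few lines of C. This is only an illustration under the header sizes stated in the text (20-byte IP header and 20-byte TCP header, no options); the function names `payload_efficiency` and `wire_bytes` are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes without options, as stated in the text. */
enum { IPV4_HEADER_BYTES = 20, TCP_HEADER_BYTES = 20 };

/* Fraction of one IPv4/TCP packet that is application payload.
 * (Hypothetical helper, for illustration only.) */
double payload_efficiency(size_t payload_bytes)
{
    size_t total = IPV4_HEADER_BYTES + TCP_HEADER_BYTES + payload_bytes;
    return (double)payload_bytes / (double)total;
}

/* Total bytes on the wire when data_bytes of application data are sent
 * at most chunk bytes per packet. (Hypothetical helper.) */
size_t wire_bytes(size_t data_bytes, size_t chunk)
{
    assert(chunk > 0);
    size_t packets = (data_bytes + chunk - 1) / chunk; /* round up */
    return data_bytes + packets * (IPV4_HEADER_BYTES + TCP_HEADER_BYTES);
}
```

With these definitions, `payload_efficiency(1)` is 1/41, or roughly 2.4%. Taking 100 keystrokes as an arbitrary example, `wire_bytes(100, 1)` is 4100 bytes when every keystroke travels in its own packet, against `wire_bytes(100, 100)`, which is 140 bytes when the same data is coalesced into one packet.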
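To address this, the Nagle algorithm works as follows. Suppose the sender wants to send several packets that each contain only a few characters. (From here on, a packet shorter than the MSS is called a small packet and a packet exactly one MSS long is called a large packet; for some comparisons we also talk about medium packets, which are longer than a typical small packet but still shorter than one MSS.) The sender transmits the first small packet and buffers the small amounts of data that arrive afterwards instead of sending them immediately. The buffered data is assembled into one larger packet and sent only when the receiver's ACK for the previous segment arrives, when the data is urgent, or when enough data has accumulated (for example, the buffered data has reached the maximum segment size), among other cases. To see exactly which cases apply, let's look at the kernel implementation: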
1383: Filename : \linux-3.4.4\net\ipv4\tcp_output.c
1384: /* Return 0, if packet can be sent now without violation Nagle's rules:
1385:  * 1. It is full sized.
1386:  * 2. Or it contains FIN. (already checked by caller)
1387:  * 3. Or TCP_CORK is not set, and TCP_NODELAY is set.
1388:  * 4. Or TCP_CORK is not set, and all sent packets are ACKed.
1389:  *    With Minshall's modification: all sent small packets are ACKed.
1390:  */
1391: static inline int tcp_nagle_check(const struct tcp_sock *tp,
1392:                                   const struct sk_buff *skb,
1393:                                   unsigned mss_now, int nonagle)
1394: {
1395:     return skb->len < mss_now &&
1396:            ((nonagle & TCP_NAGLE_CORK) ||
1397:             (!nonagle && tp->packets_out && tcp_minshall_check(tp)));
1398: }
1399:
1400: /* Return non-zero if the Nagle test allows this packet to be
1401:  * sent now.
1402:  */
1403: static inline int tcp_nagle_test(const struct tcp_sock *tp, const struct sk_buff *skb,
1404:                                  unsigned int cur_mss, int nonagle)
1405: {
1406:     /* Nagle rule does not apply to frames, which sit in the middle of the
1407:      * write_queue (they have no chances to get new data).
1408:      *
1409:      * This is implemented in the callers, where they modify the 'nonagle'
1410:      * argument based upon the location of SKB in the send queue.
1411:      */
1412:     if (nonagle & TCP_NAGLE_PUSH)
1413:         return 1;
1414:
1415:     /* Don't use the nagle rule for urgent data (or for the final FIN).
1416:      * Nagle can be ignored during F-RTO too (see RFC4138).
1417:      */
1418:     if (tcp_urg_mode(tp) || (tp->frto_counter == 2) ||
1419:         (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
1420:         return 1;
1421:
1422:     if (!tcp_nagle_check(tp, skb, cur_mss, nonagle))
1423:         return 1;
1424:
1425:     return 0;
1426: }
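This piece of Linux kernel code is easy to read because it is well commented. Start with tcp_nagle_test(). Line 1412 checks the argument directly: if the caller has set the TCP_NAGLE_PUSH flag, for example because the Nagle algorithm has been explicitly disabled, the cork has been explicitly pulled (see TCP_CORK in the next section), or this is known to be the last packet of the connection (such as data sent just before close()), the function returns 1 so that the packet is sent immediately. Lines 1418-1420 handle the special cases: urgent data, the closing packet that carries the FIN flag, and packets sent while F-RTO recovery is in progress (tp->frto_counter == 2). Line 1422 hands the decision to tcp_nagle_check().
The header comment of tcp_nagle_check() is somewhat confusing, so I will go through the code one expression at a time. First, note that a return value of 1 from this function means the packet is NOT sent immediately. As for the expressions themselves: skb->len < mss_now is true when the packet's data length is smaller than the current MSS; nonagle & TCP_NAGLE_CORK is true when the socket has been explicitly corked or the caller has indicated that more data will follow right away (expressed in the kernel as MSG_MORE); !nonagle is true when the Nagle algorithm is enabled; tp->packets_out is true when some packets have been sent but not yet ACKed; tcp_minshall_check(tp) is a refinement of the Nagle algorithm, and for now we simply treat it as equivalent to the previous test and come back to the details later. Combining these conditions with AND and OR gives: if the packet's data length is smaller than the current MSS && ((corked, or more data is coming) || (Nagle enabled && some sent packets are not yet ACKed)), then the data is buffered rather than sent immediately.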
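To make the combined condition easier to experiment with, here is a minimal user-space model of the two functions. It is a sketch, not kernel code: the `model_` names and the `MODEL_NAGLE_*` constants are hypothetical stand-ins (their values are assumed to mirror the kernel's TCP_NAGLE_OFF, TCP_NAGLE_CORK and TCP_NAGLE_PUSH bits), the kernel structures are flattened into plain parameters, and tcp_minshall_check() is treated as always true, exactly as the explanation above does for now.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's nonagle bits (values assumed). */
enum {
    MODEL_NAGLE_OFF  = 1, /* Nagle disabled (TCP_NODELAY)        */
    MODEL_NAGLE_CORK = 2, /* socket corked, or more data coming   */
    MODEL_NAGLE_PUSH = 4  /* caller forces the packet out now     */
};

/* Models tcp_nagle_check(): true means "hold the packet back".
 * len         - data length of the candidate packet (skb->len)
 * mss         - current MSS
 * nonagle     - bitmask of MODEL_NAGLE_* flags, 0 means Nagle is enabled
 * packets_out - number of packets sent but not yet ACKed
 * The Minshall refinement is deliberately left out (assumed true). */
bool model_nagle_check(unsigned len, unsigned mss, int nonagle,
                       unsigned packets_out)
{
    return len < mss &&
           ((nonagle & MODEL_NAGLE_CORK) != 0 ||
            (nonagle == 0 && packets_out > 0));
}

/* Models tcp_nagle_test(): true means "send the packet now".
 * special stands for the urgent-data / F-RTO / FIN cases. */
bool model_nagle_test(unsigned len, unsigned mss, int nonagle,
                      unsigned packets_out, bool special)
{
    assert(mss > 0);
    if (nonagle & MODEL_NAGLE_PUSH)
        return true;
    if (special)
        return true;
    return !model_nagle_check(len, mss, nonagle, packets_out);
}
```

For instance, with an illustrative MSS of 1460, `model_nagle_test(1, 1460, 0, 1, false)` is false: a 1-byte packet is held back while an earlier packet is still unACKed. The same call with `packets_out` set to 0, or with `nonagle` set to `MODEL_NAGLE_OFF`, returns true, which matches rules 4 and 3 of the header comment.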
http://lenky.info/ebook/