Why does read block on a pipe until the write end is closed?

[Question title]: Why does read block on a pipe until the write end is closed?
[Posted]: 2020-10-01 23:17:32
[Question]:

I am trying to deepen my understanding of fork, exec, dup, and redirecting stdin/stdout/stderr by writing the following popen-type function:

// main.c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define INVALID_FD (-1)

typedef enum PipeEnd {
  READ_END  = 0,
  WRITE_END = 1
} PipeEnd;

typedef int Pipe[2];

/** Encapsulates information about a created child process. */
typedef struct popen2_t {
  bool  success;  ///< true if the child process was spawned.
  Pipe  stdin;    ///< parent -> stdin[WRITE_END] -> child's stdin
  Pipe  stdout;   ///< child -> stdout[WRITE_END] -> parent reads stdout[READ_END]
  Pipe  stderr;   ///< child -> stderr[WRITE_END] -> parent reads stderr[READ_END]
  pid_t pid;      ///< child process' pid
} popen2_t;

/** dup2( p[pe] ) then close and invalidate both ends of p */
static void dupFd( Pipe p, const PipeEnd pe, const int fd ) {
  dup2( p[pe], fd);
  close( p[READ_END] );
  close( p[WRITE_END] );
  p[READ_END] = INVALID_FD;
  p[WRITE_END] = INVALID_FD;
}

popen2_t popen2( const char* cmd ) {
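  // Initialize every pipe end to INVALID_FD so the cleanup path never closes fd 0 by accident.
  popen2_t r = { false, { INVALID_FD, INVALID_FD }, { INVALID_FD, INVALID_FD }, { INVALID_FD, INVALID_FD }, -1 };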

  if ( -1 == pipe( r.stdin ) ) { goto end; }
  if ( -1 == pipe( r.stdout ) ) { goto end; }
  if ( -1 == pipe( r.stderr ) ) { goto end; }

  switch ( (r.pid = fork()) ) {
    case -1: // Error
      goto end;

    case 0: // Child process
      dupFd( r.stdin, READ_END, STDIN_FILENO );
      dupFd( r.stdout, WRITE_END, STDOUT_FILENO );
      dupFd( r.stderr, WRITE_END, STDERR_FILENO );

      {
        char* argv[] = { "sh", "-c", (char*)cmd, NULL };

        execvp( argv[0], argv );
        _exit( 127 ); // only reached if execvp() failed; do not report success
      }
  }

  // Parent process
  close( r.stdin[READ_END] );
  r.stdin[READ_END] = INVALID_FD;
  close( r.stdout[WRITE_END] );
  r.stdout[WRITE_END] = INVALID_FD;
  close( r.stderr[WRITE_END] );
  r.stderr[WRITE_END] = INVALID_FD;
  r.success = true;

end:
  if ( ! r.success ) {
    if ( INVALID_FD != r.stdin[READ_END] ) { close( r.stdin[READ_END] ); }
    if ( INVALID_FD != r.stdin[WRITE_END] ) { close( r.stdin[WRITE_END] ); }
    if ( INVALID_FD != r.stdout[READ_END] ) { close( r.stdout[READ_END] ); }
    if ( INVALID_FD != r.stdout[WRITE_END] ) { close( r.stdout[WRITE_END] ); }
    if ( INVALID_FD != r.stderr[READ_END] ) { close( r.stderr[READ_END] ); }
    if ( INVALID_FD != r.stderr[WRITE_END] ) { close( r.stderr[WRITE_END] ); }

    r.stdin[READ_END] = r.stdin[WRITE_END] =
      r.stdout[READ_END] = r.stdout[WRITE_END] =
      r.stderr[READ_END] = r.stderr[WRITE_END] = INVALID_FD;
  }

  return r;
}

int main( int argc, char* argv[] ) {
  popen2_t p = popen2( "./child.out" );

  {
    int status = 0;


    sleep( 2 );

    {
      char buf[1024] = { '\0' };

      read( p.stdout[READ_END], buf, sizeof buf - 1 ); // leave room for the terminating '\0'
      printf( "%s", buf );
    }

    //pid_t wpid = waitpid( p.pid, &status, 0 );
    //return wpid == p.pid && WIFEXITED( status ) ? WEXITSTATUS( status ) : -1;
  }
}

// child.c
#include <stdio.h>
#include <unistd.h>

int main( int argc, char* argv[] ) {
  printf( "%s:%d\n", __FILE__, __LINE__ );
  sleep( 1 );
  printf( "%s:%d\n", __FILE__, __LINE__ );
  sleep( 1 );
  printf( "%s:%d\n", __FILE__, __LINE__ );
  sleep( 1 );
  printf( "%s:%d\n", __FILE__, __LINE__ );
  sleep( 1 );
  return 0;
}

Compile and run:

$ gcc --version && gcc -g ./child.c -o ./child.out && gcc -g ./main.c && ./a.out
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

./child.c:6
./child.c:8
./child.c:10
./child.c:12
$

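My question is about read() - I don't quite understand why read() appears to block until the child process has finished (and thereby closed its end of the pipe)?

Is it a coincidence? You can see that I tried to "make" the main process perform its read in the middle of the child's execution by using the sleep( 2 ) statement.

The child dumps a total of 50 characters to its (redirected) stdout. Couldn't the main process execute its read() partway through the child's execution and read only N of those 50 characters, so that the main process's printf() would not print all four of the child's lines?

(Functionally, everything works fine - my question is aimed at better understanding read().)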

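[Comments]:

What else would you expect it to do?
@NateEldredge - I can't say this is reasonable, but I imagined that a pipe might be like a TCP socket: if there is anything in the pipe, you can read some subset of it, regardless of whether the other end has finished.
@StoneThrow: That is exactly what happens. But the child prints with printf, and stdout is fully buffered when it is not a terminal. So even though printf has returned, write() is not actually called until the buffer is flushed, which happens when the child exits.
@StoneThrow: Indeed. In that case you will see read return immediately with however many bytes are available. (You have to check the return value to see how many bytes that is.) If you want more bytes, call read() again, usually in a loop; it will block until some bytes are available or the pipe is closed.
In all normal cases you would want to keep reading this way. As your program stands, the parent would close the pipe before the child has finished writing, which would cause the child to be killed by SIGPIPE when it writes more.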

Tags: c linux pipe fork stdout

[Answer 1]:

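By default, stdout is fully buffered when it is not writing to a terminal. So your printf() calls do not write anything to the pipe until the buffer is flushed. That happens when the buffer fills up (probably 1K or 4K bytes) or when the process exits.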

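You can use fflush(stdout); to flush the buffer immediately. Add it after each of your printf() calls and you will be able to read the output in the parent without waiting for the process to exit.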

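[Comments]:

Thank you as well - same as with NateEldredge's reply - I will experiment based on your explanation.
I was able to confirm this by experiment - thank you for the explanation!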