Pthreads Overview

What Are Pthreads?

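Historically, hardware vendors implemented their own proprietary versions of threads. These implementations differed substantially from one another, which made it very difficult for programmers to develop portable threaded applications.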

To take full advantage of the capabilities provided by threads, a standardized thread programming interface was required:

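1. For UNIX systems, this interface has been specified by the IEEE POSIX 1003.1c standard (1995);
2. Implementations that adhere to this standard are referred to as POSIX threads, or Pthreads;
3. Most hardware vendors now offer a Pthreads library in addition to their proprietary APIs.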

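The POSIX standard, including the Pthreads specification, has continued to evolve and undergo revisions. At the time of writing, the latest version was IEEE Std 1003.1, 2004 Edition.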

Some useful links:

1. POSIX FAQs: www.opengroup.org/austin/papers/posix_faq.html
2. Download the Standard: www.unix.org/version3/ieee_std.html

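The Pthreads API is defined as a set of C language programming types and procedure calls. It is implemented with a pthread.h header (include) file and a thread library. In some implementations that library is part of another library, such as libc.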

Why Pthreads?

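The primary motivation for using Pthreads is to realize potential gains in program performance.
Compared with the cost of creating and managing a process, a thread can be created with much less operating system overhead, and managing threads requires fewer system resources than managing processes.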

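For example, the following table compares timing results for fork() and pthread_create(). Each test creates 50,000 processes or threads (see the test code below). The real, user, and sys times are in seconds.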

Platform	`fork()` real	`fork()` user	`fork()` sys	`pthread_create()` real	`pthread_create()` user	`pthread_create()` sys
AMD 2.3 GHz Opteron (16cpus/node)	12.5	1.0	12.5	1.2	0.2	1.3
AMD 2.4 GHz Opteron (8cpus/node)	17.6	2.2	15.7	1.4	0.3	1.3
IBM 4.0 GHz POWER6 (8cpus/node)	9.5	0.6	8.8	1.6	0.1	0.4
IBM 1.9 GHz POWER5 p5-575 (8cpus/node)	64.2	30.7	27.6	1.7	0.6	1.1
IBM 1.5 GHz POWER4 (8cpus/node)	104.5	48.6	47.2	2.1	1.0	1.5
INTEL 2.4 GHz Xeon (2 cpus/node)	54.9	1.5	20.8	1.6	0.7	0.9
INTEL 1.4 GHz Itanium2 (4 cpus/node)	54.5	1.1	22.2	2.0	1.2	0.6

Test code:


==============================================================================
C Code for fork() creation test
==============================================================================
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* pid_t */
#include <sys/wait.h>    /* waitpid() */
#include <unistd.h>      /* fork() */

#define NFORKS 50000

/* The child does no real work: only process creation/teardown is measured. */
void do_nothing(void) {
}

int main(void) {
    pid_t pid;
    int j, status;

    for (j = 0; j < NFORKS; j++) {

        /*** error handling ***/
        if ((pid = fork()) < 0) {
            perror("fork failed");
            exit(EXIT_FAILURE);
        }

        /*** this is the child of the fork ***/
        else if (pid == 0) {
            do_nothing();
            exit(0);
        }

        /*** this is the parent of the fork: reap the child ***/
        else {
            waitpid(pid, &status, 0);
        }
    }
    return 0;
}

==============================================================================
C Code for pthread_create() test
==============================================================================
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 50000

/* The thread does no real work: only thread creation/join cost is measured. */
void *do_nothing(void *null) {
    (void)null;
    pthread_exit(NULL);
}

int main(void) {
    int rc, j;
    pthread_t tid;
    pthread_attr_t attr;

    /* Explicitly create threads as joinable so pthread_join() can be used. */
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    for (j = 0; j < NTHREADS; j++) {
        rc = pthread_create(&tid, &attr, do_nothing, NULL);
        if (rc) {
            printf("ERROR; return code from pthread_create() is %d\n", rc);
            exit(EXIT_FAILURE);
        }

        /* Wait for the thread */
        rc = pthread_join(tid, NULL);
        if (rc) {
            printf("ERROR; return code from pthread_join() is %d\n", rc);
            exit(EXIT_FAILURE);
        }
    }

    pthread_attr_destroy(&attr);
    pthread_exit(NULL);
}

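All threads within a process share the same address space. As a result, inter-thread communication is more efficient than inter-process communication and, in many cases, easier to use.
Threaded applications also offer potential performance gains and practical advantages over non-threaded applications in several other ways: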

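Overlapping CPU work with I/O: for example, a program may have sections where it performs a long I/O operation. While one thread is waiting for an I/O system call to complete, other threads can perform CPU-intensive work.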

Priority/real-time scheduling: more important tasks can be scheduled to supersede or interrupt lower-priority tasks.

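Asynchronous event handling: tasks that service events of indeterminate frequency and duration can be interleaved. For example, a web server can keep transferring data for earlier requests while it accepts and handles new ones.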

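On SMP architectures, the primary motivation for using Pthreads is to achieve optimum performance. In particular, if an application uses MPI for on-node communication, its performance could potentially be improved significantly by using Pthreads for on-node data transfer instead.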

MPI libraries usually implement on-node task communication via shared memory, which involves at least one memory copy operation (process to process).

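With Pthreads, no intermediate memory copy is required, because threads share the same address space within a single process. There is no data transfer per se. Communication becomes a matter of cache-to-CPU or, in the worst case, memory-to-CPU bandwidth, and those speeds are much higher.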

The following table compares the two:

Platform	MPI Shared Memory Bandwidth (GB/sec)	Pthreads Worst Case Memory-to-CPU Bandwidth (GB/sec)
AMD 2.3 GHz Opteron	1.8	5.3
AMD 2.4 GHz Opteron	1.2	5.3
IBM 1.9 GHz POWER5 p5-575	4.1	16
IBM 1.5 GHz POWER4	2.1	4
Intel 2.4 GHz Xeon	0.3	4.3
Intel 1.4 GHz Itanium 2	1.8	6.4

Notes:

SMP: Symmetric Multi-Processing.

MPI: Message Passing Interface; see http://www-unix.mcs.anl.gov/mpi/mpich/