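The Essence of an HTTP Server: tinyhttpd Source Analysis and Extensions

I. HTTP Requests

An HTTP request consists of three parts: the start line, the message headers, and the request body.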

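Request Line<CRLF>
Header-Name: header-value<CRLF>
Header-Name: header-value<CRLF>
// one or more headers, each ending with <CRLF>
<CRLF>
body // the request body

1. The start line begins with a method token, followed by the Request-URI and the protocol version, separated by spaces. Its format is: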

Method Request-URI HTTP-Version CRLF

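Here Method is the request method, Request-URI is a Uniform Resource Identifier, HTTP-Version is the HTTP protocol version of the request, and CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, no bare CR or LF character is allowed).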
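
As a quick illustration of the format above, here is a minimal Python sketch that splits a request line into its three fields. The function name parse_request_line is made up for this example, and the checks are deliberately simple; a real server would validate more.

```python
def parse_request_line(line):
    """Split 'Method Request-URI HTTP-Version CRLF' into its three fields."""
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    # Drop the trailing CRLF, then split on the single spaces between fields.
    parts = line[:-2].split(" ")
    if len(parts) != 3:
        raise ValueError("expected exactly three space-separated fields")
    method, uri, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError("bad HTTP-Version: %r" % version)
    return method, uri, version


print(parse_request_line("GET /form.html HTTP/1.1\r\n"))
```

A line with a missing field, such as "GET /form.html\r\n", raises ValueError instead of returning a partial result.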

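2. There are several request methods (method names are all uppercase). Each one is explained below:

GET Requests the resource identified by Request-URI
POST Asks the server to accept the enclosed data as a new subordinate of, or input to, the resource identified by Request-URI
HEAD Requests only the response headers for the resource identified by Request-URI
PUT Asks the server to store a resource and use Request-URI as its identifier
DELETE Asks the server to delete the resource identified by Request-URI
TRACE Asks the server to echo back the request it received; mainly used for testing or diagnostics
CONNECT Reserved for proxies that can switch to acting as a tunnel (for example, HTTPS through a proxy)
OPTIONS Asks about the server's capabilities, or about the options and requirements associated with a resource

Examples:
GET: when you open a page by typing its URL into the browser's address bar, the browser uses the GET method to fetch the resource from the server, e.g.: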

GET /form.html HTTP/1.1 (CRLF)

The POST method asks the target server to accept the data attached to the request. It is commonly used to submit forms, e.g.:

POST /reg.jsp HTTP/ (CRLF)
Accept:image/gif,image/x-xbit,... (CRLF)
...
HOST:www.guet.edu.cn (CRLF)
Content-Length:21 (CRLF)
Connection:Keep-Alive (CRLF)
Cache-Control:no-cache (CRLF)
(CRLF)         // this CRLF marks the end of the message headers; everything before it is headers
user=jeffrey&pwd=1234  // the submitted data starts here (21 bytes, matching Content-Length)
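
To make the role of the blank line and Content-Length concrete, here is a small Python sketch that splits a raw request into request line, headers, and body. parse_request is a hypothetical helper, and the request below assumes HTTP/1.1 as the version and keeps only two headers. Note that the body user=jeffrey&pwd=1234 is 21 bytes long, which is the value Content-Length has to carry.

```python
def parse_request(raw):
    """Parse raw request bytes into (request_line, headers, body)."""
    # The first empty line (CRLF CRLF) separates the headers from the body.
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line terminating the headers")
    lines = head.decode("latin-1").split("\r\n")
    request_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many bytes of body belong to this request.
    length = int(headers.get("content-length", 0))
    return request_line, headers, body[:length]


raw = (b"POST /reg.jsp HTTP/1.1\r\n"      # version assumed for this sketch
       b"Host: www.guet.edu.cn\r\n"
       b"Content-Length: 21\r\n"
       b"\r\n"
       b"user=jeffrey&pwd=1234")
request_line, headers, body = parse_request(raw)
print(request_line, headers, body)
```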

II. tinyhttpd Source Analysis

tinyhttpd contains the following functions:

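void accept_request(int); // handle one HTTP request received on the listening socket
void bad_request(int); // tell the client the request is malformed (400 response)
void cat(int, FILE *); // read a file on the server and write it to the socket
void cannot_execute(int); // report an error that occurred while running a CGI program
void error_die(const char *); // print the error message with perror and exit
void execute_cgi(int, const char *, const char *, const char *); // run a CGI script; this one is very important, since it handles dynamic content
int get_line(int, char *, int); // read one line of the HTTP message
void headers(int, const char *); // send the HTTP response headers
void not_found(int); // reply that the requested file was not found (404)
void serve_file(int, const char *); // call cat to return a server file's contents to the browser
int startup(u_short *); // set up the HTTP service: create the socket, bind the port, and listen (main then accepts connections and starts a thread for each one)
void unimplemented(int); // tell the browser that the request's HTTP method is not supported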
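
Of these, get_line is worth a closer look, because HTTP lines end in CRLF while the rest of the code wants a plain newline. The sketch below is a rough Python rendering of that idea, not the C code itself. It assumes the line is read one byte at a time and that a CR triggers a peek at the next byte; the 1024-byte default size is an assumption for the example.

```python
import socket


def get_line(sock, size=1024):
    """Read one line from sock, turning a CR, CRLF, or LF ending into a single LF."""
    buf = bytearray()
    while len(buf) < size - 1:
        ch = sock.recv(1)
        if not ch:
            break  # the peer closed the connection
        if ch == b"\r":
            # Peek at the next byte and consume it only if it is the LF of a CRLF.
            nxt = sock.recv(1, socket.MSG_PEEK)
            if nxt == b"\n":
                sock.recv(1)
            ch = b"\n"
        buf += ch
        if ch == b"\n":
            break
    return buf.decode("latin-1")


# Try it on a connected socket pair instead of a real browser connection.
client, server = socket.socketpair()
client.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
print(repr(get_line(server)))
client.close()
server.close()
```

Calling get_line repeatedly until it returns a bare newline is how a caller can tell that the headers have ended.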

Suggested reading order for the source: main -> startup -> accept_request -> execute_cgi

Following that order, let's look at the whole flow of interaction between the browser and tinyhttpd:

[Figure: flowchart of the interaction between the browser and tinyhttpd]
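
The decision that accept_request makes between static files and CGI can be sketched in a few lines of Python. This is a simplified reading of the logic in tinyhttpd 0.1.0, not a port: route and its return values are made-up names, and the real function also reads and discards the headers and sends the responses itself.

```python
import os
import stat


def route(method, url, docroot="htdocs"):
    """Return (action, path, query_string) for one request."""
    method = method.upper()
    if method not in ("GET", "POST"):
        return "unimplemented", None, ""      # answered with 501
    cgi = method == "POST"                    # POST always goes to a CGI program
    query = ""
    if method == "GET" and "?" in url:
        url, query = url.split("?", 1)        # a query string also means CGI
        cgi = True
    path = docroot + url
    if path.endswith("/") or os.path.isdir(path):
        path = os.path.join(path, "index.html")
    if not os.path.isfile(path):
        return "not_found", path, query       # answered with 404
    mode = os.stat(path).st_mode
    if mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        cgi = True                            # executable files are run, not sent
    return ("execute_cgi" if cgi else "serve_file"), path, query


print(route("DELETE", "/index.html"))
```

Under these rules a plain GET for a non-executable file ends in serve_file, while a POST, a GET with a query string, or a request for an executable file ends in execute_cgi.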

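III. Annotated Source

The annotated source is on GitHub, and all my future source analyses will be uploaded there too. Since tinyhttpd's source is short, the full code is posted below.

However, this project does not compile and run on Linux as-is. It was originally written on Solaris, whose socket and pthread implementations appear to differ from those on a typical Linux system, so parts of the code need changes. For the changes themselves, see the article on compiling tinyhttpd on Linux listed in the references. I have also uploaded the modified version to GitHub as tinyhttpd-0.1.0_for_linux; clone it and build it with make. Here is how to run tinyhttpd. The build output looks like this:

[Figure: output of building tinyhttpd with make]

Next, run ./httpd and open it in a browser.

[Figure: the tinyhttpd page in a browser]

tinyhttpd's default CGI scripts are Perl scripts, for example color.cgi in the htdocs directory.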

#!/usr/bin/perl -Tw

use strict;
use CGI;

my($cgi) = CGI->new;

print $cgi->header;
my($color) = "blue";
$color = $cgi->param('color') if defined $cgi->param('color');

print $cgi->start_html(-title => uc($color),
                       -BGCOLOR => $color);
print $cgi->h1("This is $color");
print $cgi->end_html;

Next I want to write CGI scripts in Python and add a few pages. To get a better feel for what a CGI program really does at run time, I won't use Python's ready-made cgi module; everything is built by hand. First, add a register.html page to the htdocs directory with the following content:

<html>
    <head>
        <title>Registration</title>
        <meta charset="utf-8">
    </head>
    <body>
        <form action="register.cgi" method="POST">
            Account: <input type="text" name="zhanghao" value="" size="10" maxlength="5">
            <br>
            <br>
            Password: <input type="password" value="" name="mima" size="10">
            <br>
            <br>
            <input type="hidden" value="hidden content" name="mihiddenma" size="10">
            Hobbies: <input type="checkbox" name="tiyu" checked="checked">Sports<input type="checkbox" name="changge">Singing
            <br>
            <br>
            Gender: <input type="radio" name="sex" value="male" checked="checked">Male<input type="radio" name="sex" value="female">Female
            <br>
            <br>
            About me:<br>
            <textarea cols="35" rows="10" name="ziwojieshao">Introduce yourself here</textarea>
            <br>
            <br>
            Address:
            <select name="dizhi">
                <option value="sichuan">Sichuan</option>
                <option value="beijing">Beijing</option>
                <option value="shanghai">Shanghai</option>
            </select>
            <br>
            <br>
            <input type="submit" value="Submit">
            <input type="reset" value="Reset">
        </form>
    </body>
</html>

This is a form whose action points to register.cgi and whose method is POST. Now let's look at register.cgi, which is actually a Python script.

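#!/usr/bin/python
#coding:utf-8
# print(...) with a single argument runs under both Python 2 and Python 3.
import sys, os

# The server passes the size of the POST body in CONTENT_LENGTH.
length = os.getenv('CONTENT_LENGTH')

if length:
    # Read exactly that much form data from standard input.
    postdata = sys.stdin.read(int(length))
    # The trailing "\n" plus print's own newline gives the blank line that ends the headers.
    print("Content-type:text/html\n")
    print('<html>')
    print('<head>')
    print('<title>POST</title>')
    print('</head>')
    print('<body>')
    print('<h2> POST data </h2>')
    print('<ul>')
    # Form fields arrive as name=value pairs joined by '&'.
    for data in postdata.split('&'):
        print('<li>' + data + '</li>')
    print('</ul>')
    print('</body>')
    print('</html>')
else:
    print("Content-type:text/html\n")
    print('not found')

The script reads the POST data from standard input and writes the page that displays it to standard output. Comparing this with the flowchart makes it easier to follow. Here is the result:

[Figure: the page returned after submitting the form]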
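
On the server side, execute_cgi is what makes this work: it starts the script as a child process, passes request metadata through environment variables, and connects the script's standard input and output to pipes. tinyhttpd does this in C with pipe, fork, and exec; the sketch below shows the same contract using Python 3's subprocess module instead. run_cgi is a hypothetical helper, and the child here is a tiny stand-in that echoes the body, not register.cgi itself.

```python
import os
import subprocess
import sys


def run_cgi(argv, method, body=b"", query_string=""):
    """Run a CGI program the way a server would and return its raw stdout."""
    env = dict(os.environ)
    env["REQUEST_METHOD"] = method
    if method == "POST":
        # The script learns how much to read from stdin via CONTENT_LENGTH.
        env["CONTENT_LENGTH"] = str(len(body))
    else:
        env["QUERY_STRING"] = query_string
    # The request body goes to the child's stdin; its stdout is captured.
    proc = subprocess.run(argv, input=body, env=env,
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout


# Stand-in CGI program (an assumption for this example): echo the POST body.
child = [sys.executable, "-c",
         "import os, sys\n"
         "n = int(os.environ['CONTENT_LENGTH'])\n"
         "sys.stdout.write('Content-type:text/html\\n\\n' + sys.stdin.read(n))"]
out = run_cgi(child, "POST", b"zhanghao=abc&mima=123")
print(out.decode())
```

Whatever the child prints, headers included, is what the server relays to the browser, which is why the script has to emit the Content-type line itself.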


That's all for today. The analysis continues in the next post. If you found this useful, please remember to recommend it.

References: HTTP协议全览 (an overview of the HTTP protocol), tinyhttpd在Linux编译 (compiling tinyhttpd on Linux)

From: 七夜的故事 http://www.cnblogs.com/qiyeboy/