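Using Net::HTTP::Head lets you ask the server for information about a page without making it return the page itself, which would waste their bandwidth and CPU time as well as yours. One of the returned headers should be Content-Length: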
require 'net/http'
request = Net::HTTP.new('google.com', 80)
head = request.request_head('/')
Returns:
#<Net::HTTPMovedPermanently:0x102157ae0
@body_exist = false,
@read = true,
@socket = nil,
attr_accessor :body = nil,
attr_reader :code = "301",
attr_reader :header = {
"location" => [
[0] "http://www.google.com/"
],
"content-type" => [
[0] "text/html; charset=UTF-8"
],
"date" => [
[0] "Thu, 26 Jul 2012 17:46:30 GMT"
],
"expires" => [
[0] "Sat, 25 Aug 2012 17:46:30 GMT"
],
"cache-control" => [
[0] "public, max-age=2592000"
],
"server" => [
[0] "gws"
],
"content-length" => [
[0] "219"
],
"x-xss-protection" => [
[0] "1; mode=block"
],
"x-frame-options" => [
[0] "SAMEORIGIN"
],
"connection" => [
[0] "close"
]
},
attr_reader :http_version = "1.1",
attr_reader :message = "Moved Permanently"
>
This is a redirect, telling the browser it needs to look somewhere else.
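A minimal sketch of how that redirect could be detected in code. `redirect_target` is a hypothetical helper name, the response is built by hand so nothing touches the network, and the sketch assumes the Location header holds an absolute URL:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns the URI a redirect response points at,
# or nil when the response is not a redirect or has no Location header.
def redirect_target(response)
  return nil unless response.is_a?(Net::HTTPRedirection)

  location = response['location']
  location && URI(location)
end

# Hand-built stand-in for the 301 response shown above.
moved = Net::HTTPMovedPermanently.new('1.1', '301', 'Moved Permanently')
moved['location'] = 'http://www.google.com/'

target = redirect_target(moved)
puts target.host # www.google.com
puts target.path # /
```

The host and path from `target` can then be used for the next HEAD request.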
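Unfortunately, not all HTTPds return a Content-Length header, because the page might be generated dynamically, so the server can't give an accurate size until the content has actually been rendered and sent.
Following the redirect above with another HEAD request results in: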
#<Net::HTTPOK:0x10217e8c0
@body_exist = false,
@read = true,
@socket = nil,
attr_accessor :body = nil,
attr_reader :code = "200",
attr_reader :header = {
"set-cookie" => [
[ 0] "NID=62=c2jRl25ItoF5YkVgNv3g2woB2A3iIqkY__EYX5BGst--KYmjNbfCeVL0FIUcq6jm6PqH_-YV6QFO_yNjy1BzMms-QJKPRsfcq0px030WVzKTMtMF9dJUJpS0XdV1NLOv; expires=Fri, 25-Jan-2013 17:50:22 GMT; path=/; domain=.google.com; HttpOnly",
[ 1] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com",
[ 2] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com",
[ 3] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com",
[ 4] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com",
[ 5] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com",
[ 6] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=www.google.com",
[ 7] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com",
[ 8] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com",
[ 9] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com",
[10] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com",
[11] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com",
[12] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.www.google.com",
[13] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com",
[14] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com",
[15] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com",
[16] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com",
[17] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com",
[18] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=google.com",
[19] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com",
[20] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com",
[21] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com",
[22] "expires=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com",
[23] "path=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com",
[24] "domain=; expires=Mon, 01-Jan-1990 00:00:00 GMT; path=/; domain=.google.com",
[25] "PREF=ID=51ce2f15ffbc5de1:FF=0:TM=1343325022:LM=1343325022:S=H8-1NoxuEbX7fepF; expires=Sat, 26-Jul-2014 17:50:22 GMT; path=/; domain=.google.com",
[26] "NID=62=aO6oBKx_v48l5SqQrRDUiNxfOixEE0QnkQIBSZK4u0xS8cHGc7uXTUt6yJhIZTyCe_XWGn6t3-Ov4EvxPE8hAO7I89ao9RR9dLUyYPBB784fR12bJsqbkTaCVaZI7ihT; expires=Fri, 25-Jan-2013 17:50:22 GMT; path=/; domain=.google.com; HttpOnly"
],
"date" => [
[0] "Thu, 26 Jul 2012 17:50:22 GMT"
],
"expires" => [
[0] "-1"
],
"cache-control" => [
[0] "private, max-age=0"
],
"content-type" => [
[0] "text/html; charset=ISO-8859-1"
],
"p3p" => [
[0] "CP=\"This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info.\""
],
"server" => [
[0] "gws"
],
"x-xss-protection" => [
[0] "1; mode=block"
],
"x-frame-options" => [
[0] "SAMEORIGIN"
],
"connection" => [
[0] "close"
]
},
attr_reader :http_version = "1.1",
attr_reader :message = "OK"
>
Notice that no Content-Length header was returned.
Hitting a site that returns a static page gives me a different response:
request = Net::HTTP.new('tools.ietf.org', 80)
head = request.request_head('/html/rfc2606')
Returns:
#<Net::HTTPOK:0x100914370
@body_exist = false,
@read = true,
@socket = nil,
attr_accessor :body = nil,
attr_reader :code = "200",
attr_reader :header = {
"date" => [
[0] "Thu, 26 Jul 2012 17:55:23 GMT"
],
"server" => [
[0] "Apache/2.2.21 (Debian)"
],
"content-location" => [
[0] "rfc2606.html"
],
"vary" => [
[0] "negotiate"
],
"tcn" => [
[0] "choice"
],
"last-modified" => [
[0] "Sat, 26 May 2012 22:18:00 GMT"
],
"etag" => [
[0] "\"d44ff-43da-4c0f7db90d600;4c5bf43471540\""
],
"accept-ranges" => [
[0] "bytes"
],
"content-length" => [
[0] "17370"
],
"connection" => [
[0] "close"
],
"content-type" => [
[0] "text/html; charset=UTF-8"
]
},
attr_reader :http_version = "1.1",
attr_reader :message = "OK"
>
So, yes, it's possible to tell, but sometimes a HEAD request won't give you the information you need.
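As a rough illustration, the two cases can be told apart with `content_length`, which Net::HTTP responses provide and which returns nil when the header is missing. `advertised_size` is a hypothetical helper name, and the responses are built by hand to mimic the ones above, so no network access is needed:

```ruby
require 'net/http'

# Hypothetical helper: the size in bytes the server advertised,
# or nil when it sent no Content-Length header.
def advertised_size(response)
  response.content_length
end

# Stand-in for the static page: Content-Length is present.
static_page = Net::HTTPOK.new('1.1', '200', 'OK')
static_page['content-length'] = '17370'

# Stand-in for the dynamic page: no Content-Length at all.
dynamic_page = Net::HTTPOK.new('1.1', '200', 'OK')

puts advertised_size(static_page).inspect  # 17370
puts advertised_size(dynamic_page).inspect # nil
```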
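The way I've handled this in the past is to try a HEAD first, and if that doesn't give me what I need, retrieve the page with a normal GET and get the size from that. Taking this approach helps cut down on wasted bandwidth.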
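A sketch of that HEAD-first approach, under a few assumptions: `size_with_fallback` and `page_size` are hypothetical names, the size taken from the GET is the byte length of the body, and redirects are not followed, so a 301 like the one above would have to be resolved first.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: prefer the size advertised in the HEAD response.
# The block, which should perform the GET and return the body, is only
# called when Content-Length is missing.
def size_with_fallback(head_response)
  head_response.content_length || yield.bytesize
end

# Hypothetical helper that wires the logic above to real requests.
# It does not follow redirects.
def page_size(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    path = uri.request_uri
    size_with_fallback(http.request_head(path)) { http.request_get(path).body }
  end
end

# Example call (needs network access):
#   page_size('http://tools.ietf.org/html/rfc2606')
```

Keeping the decision in `size_with_fallback` means the GET is issued only when the HEAD response lacks a Content-Length, which is where the bandwidth saving comes from.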