Posted: 2015-05-29 18:30:44
Question:
I have the following curl command:
curl -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -H "Connection: keep-alive" -X GET http://example.com/en/number/111555000
Unfortunately, I can't replicate it in Python. I tried:
import urllib2

url = 'http://example.com/en/number/111555000'
# Same headers as in the curl command
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
           'Connection': 'keep-alive'}
req = urllib2.Request(url, None, headers)  # data=None -> GET request
resp = urllib2.urlopen(req)
print resp.read()
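To see how this request differs from the curl one, here is a minimal sketch in Python 3, where urllib2 became `urllib.request` (that port is an assumption; the question itself uses Python 2). Instead of contacting example.com, it sends the same headers to a throwaway local server and returns the headers that actually arrived. In CPython's `urllib.request`, the HTTP handler overwrites any `Connection` header with `close`, and `http.client` adds `Accept-Encoding: identity`, so the bytes on the wire are not the same as curl's. Whether the real server reacts to either difference is not established by anything in the question; `headers_sent_by_urllib` and `EchoHandler` are hypothetical names.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Header set from the curl command in the question.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "keep-alive",
}


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in server that records the request headers it receives."""

    received = None

    def do_GET(self):
        EchoHandler.received = dict(self.headers.items())
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging


def headers_sent_by_urllib(headers):
    """Send one GET through urllib.request to a local server; return the headers it got."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    server.timeout = 5  # don't block forever if the client never connects
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/en/number/111555000" % server.server_port
        req = urllib.request.Request(url, headers=headers)
        # Empty ProxyHandler: talk to the local server directly, ignoring *_proxy variables
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(req, timeout=5) as resp:
            resp.read()
    finally:
        thread.join()
        server.server_close()
    return EchoHandler.received


if __name__ == "__main__":
    for name, value in headers_sent_by_urllib(HEADERS).items():
        print("%s: %s" % (name, value))
```

Running it lists the received headers; `Connection` arrives as `close` even though the dictionary asked for `keep-alive`, and an `Accept-Encoding` header appears that curl does not send.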
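However, the server recognizes some of these requests as "fake" and redirects me to Google (the server replies with HTTP/1.1 301 Moved Permanently). With curl, on the other hand, I receive the original page.
Any ideas or suggestions? Thanks, dk
Edit: some additional information: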
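urllib follows a 301 on its own, so `resp.read()` returns whatever the `Location` header points to rather than reporting the redirect. A minimal sketch (Python 3, hypothetical names) that switches redirect-following off so the status code and `Location` become visible: returning `None` from `redirect_request` makes urllib raise `HTTPError` for the 3xx reply. `FakeRedirectHandler` is a local stand-in that answers the way the first nc session below shows the real server answering; it is not the real server.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib fall through to HTTPError for the 3xx reply.
        return None


def fetch_without_redirects(url, headers):
    """Return (status code, Location header) for a single GET."""
    # Empty ProxyHandler: connect directly, ignoring *_proxy environment variables
    opener = urllib.request.build_opener(NoRedirect, urllib.request.ProxyHandler({}))
    req = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(req, timeout=5) as resp:
            return resp.status, resp.headers.get("Location")
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


class FakeRedirectHandler(BaseHTTPRequestHandler):
    """Local stand-in replying like the server in the first nc test."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "http://www.google.de")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging


def demo():
    """Query the local stand-in once and return what the client saw."""
    server = HTTPServer(("127.0.0.1", 0), FakeRedirectHandler)
    server.timeout = 5  # don't block forever if the client never connects
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/en/number/111555000" % server.server_port
        return fetch_without_redirects(url, {"User-Agent": "test"})
    finally:
        thread.join()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Against the stand-in, `demo()` returns `(301, 'http://www.google.de')`; pointing `fetch_without_redirects` at the real URL with different header sets shows which ones trigger the redirect.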
$ nc example.com 80
GET /en/number/111555000 HTTP/1.1
Host: example.com

HTTP/1.1 301 Moved Permanently
Date: Fri, 29 May 2015 18:51:05 GMT
Server: Apache
X-Powered-By: PHP/5.5.24
Location: http://www.google.de
Content-Length: 0
Content-Type: text/html

$ nc example.com 80
GET /en/number/111555000 HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Connection: keep-alive

HTTP/1.1 200 OK
Date: Fri, 29 May 2015 18:57:56 GMT
Server: Apache
X-Powered-By: PHP/5.5.24
Set-Cookie: session=a%3A4%3A%7Bs...
Set-Cookie: session=a%3A4%3A%7Bs...
Keep-Alive: timeout=2, max=200
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

1c6f8
<!DOCTYPE html>
[...]
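The nc sessions give full control over every header byte. The same probe can be scripted with a plain socket; this is a minimal sketch with hypothetical function names that builds the request exactly as typed into nc (request line, `Host`, the given headers, then the blank line that ends the header block, with CRLF line endings) and returns only the status line. Nothing in it adds or rewrites headers, so one header at a time can be added or dropped to see which one flips the reply from 301 to 200.

```python
import socket


def build_raw_request(path, host, headers):
    """Return the bytes of a GET request exactly as typed into nc.

    Request line, Host, the given headers in order, then the empty line
    that terminates the header block. HTTP uses CRLF line endings.
    """
    lines = ["GET %s HTTP/1.1" % path, "Host: %s" % host]
    lines += ["%s: %s" % (name, value) for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_raw(host, path, headers, port=80, timeout=5):
    """Send the raw request and return the response's status line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_raw_request(path, host, headers))
        reply = sock.recv(4096)  # in practice the status line arrives in the first read
    return reply.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    # Equivalent of the first nc test: only Host, no other headers.
    print(send_raw("example.com", "/en/number/111555000", {}))
```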
Curl:
$ curl -X GET http://example.com/en/number/111555000
$
$ curl -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" -H "Connection: keep-alive" -X GET http://example.com/en/number/111555000
<!DOCTYPE html>
[...]
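With these -H options curl sends `Host`, `User-Agent`, `Accept` and `Connection: keep-alive`, and nothing else. A sketch of the same request one layer below urllib, using `http.client` directly (Python 3; the Python 2 module was `httplib`): `putrequest(..., skip_accept_encoding=True)` suppresses the automatic `Accept-Encoding: identity`, and each header is written exactly as given, so `Connection` is not rewritten. The name `fetch_like_curl` is hypothetical, and whether this header set satisfies the real server is an assumption to test, not a confirmed result.

```python
import http.client

# Header set from the working curl command.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "keep-alive",
}


def fetch_like_curl(host, path, headers, port=80, timeout=10):
    """GET path from host, sending only Host plus the given headers.

    Returns (status, Location header or None, body bytes). Redirects are not followed.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # putrequest adds the Host header; skip the automatic Accept-Encoding
        conn.putrequest("GET", path, skip_accept_encoding=True)
        for name, value in headers.items():
            conn.putheader(name, value)  # written verbatim, Connection included
        conn.endheaders()
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    status, location, body = fetch_like_curl("example.com", "/en/number/111555000", HEADERS)
    print(status, location, len(body))
```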
Comments:
-
What happens if you use curl without the headers? Are you sure the server accepts those headers?
-
Nothing happens; there is no output:
$ curl -X GET http://example.com/en/number/111555000
$
Tags: python curl http-headers urllib2 http-get