Stripping headers response - Python

[Question title]: Stripping headers response - Python
[Posted]: 2016-03-03 07:07:22
[Question]:

A typical HTTP 1.0 header looks like this:

Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.

What is the easiest way to separate the start of the page (marked by <!doctype html> or <!DOCTYPE html>) from the headers of the HTTP response? For example:

response = get_response() # get response is a string containing the page.
tokens = response.split("<!doctype html>") # won't work well.
return ''.join(tokens)

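This does not work well. I am looking for a way to split the response into its first half (the response headers) and its second half (the body).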

[Comments]:

Tags: python html http parsing

[Solution 1]:

You can use find() on a lowercased copy of the response, like this:

response = """
Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.
"""

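# Search the lowercased copy so both <!doctype html> and <!DOCTYPE html> match,
# then slice the original string from that index onward.
print(response[response.lower().find('<!doctype html>'):])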

This prints:

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.

Or perhaps just search for <!doctype
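
One caveat with find(): it returns -1 when the marker is missing, and slicing from -1 silently yields only the last character of the response. A minimal sketch that guards against this, and that also shows a different technique (splitting at the first blank line, which is what ends the header block in an HTTP message), might look like the following. The helper names body_from_doctype and split_headers_body and the sample string are hypothetical and are not part of the original answer.

```python
import re

# Hypothetical helpers for illustration; not from the original answer.
DOCTYPE_RE = re.compile(r"<!doctype", re.IGNORECASE)


def body_from_doctype(response):
    """Return everything from the first <!doctype (any case), or None if absent."""
    match = DOCTYPE_RE.search(response)
    if match is None:
        return None
    return response[match.start():]


def split_headers_body(response):
    """Split at the first blank line, which terminates the HTTP header block."""
    # Accept CRLF line endings (what HTTP specifies) as well as bare LF.
    parts = re.split(r"\r?\n\r?\n", response, maxsplit=1)
    headers = parts[0]
    body = parts[1] if len(parts) > 1 else ""
    return headers, body


# Made-up sample response, shortened from the one in the question.
sample = (
    "Content-Type: text/html\r\n"
    "Connection: close\r\n"
    "\r\n"
    "<!DOCTYPE html>\n<html></html>"
)

print(body_from_doctype(sample))
headers, body = split_headers_body(sample)
print(headers.splitlines())
```

The blank-line split does not depend on the body starting with a doctype at all, so it should also cope with pages that omit it, assuming the string still contains the raw header block.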

[Comments]: