【Question title】: Stripping headers response - Python
【Posted】: 2016-03-03 07:07:22
【Question】:

A typical HTTP 1.0 response header looks like this:

Server: nginx/1.6.2 (Ubuntu)
Date: Thu, 03 Mar 2016 07:00:00 GMT
Content-Type: text/html
Content-Length: 13471
Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
Connection: close
ETag: "5674c418-349f"
Cache-Control: no-store
Accept-Ranges: bytes

<!doctype html> // or <!DOCTYPE html>
# remaining of the page content here.

What is the easiest way to separate the start of the page (marked by <!doctype html> or <!DOCTYPE html>) from the HTTP headers that precede it? For example:

response = get_response() # get response is a string containing the page.
tokens = response.split("<!doctype html>") # won't work well.
return ''.join(tokens)

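This does not work well. I am looking for a way to split the response into its first half (the response headers) and its second half (the body).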

【Comments】:

    Tags: python html http parsing


    【Solution 1】:

    You can use find() on a lowercased copy of the response, like this:

    response = """
    Server: nginx/1.6.2 (Ubuntu)
    Date: Thu, 03 Mar 2016 07:00:00 GMT
    Content-Type: text/html
    Content-Length: 13471
    Last-Modified: Sat, 19 Dec 2015 02:42:32 GMT
    Connection: close
    ETag: "5674c418-349f"
    Cache-Control: no-store
    Accept-Ranges: bytes
    
    <!doctype html> // or <!DOCTYPE html>
    # remaining of the page content here.
    """
    
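    # Search a lowercased copy so both <!doctype html> and <!DOCTYPE html> match,
    # then slice the original string from that position.
    print(response[response.lower().find('<!doctype html>'):])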
    

    This will print:

    <!doctype html> // or <!DOCTYPE html>
    # remaining of the page content here.
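
One caveat with the slice above: find() returns -1 when the marker is missing, so the expression would quietly return only the last character of the response. Here is a minimal sketch that guards against that case. The function name `body_from_doctype` and its `marker` parameter are made up for illustration and are not part of the original answer.

```python
def body_from_doctype(response, marker="<!doctype html>"):
    """Return the part of `response` that starts at `marker`.

    The search ignores case. Returns None when the marker is absent
    (hypothetical helper, assumes `response` is a str).
    """
    start = response.lower().find(marker.lower())
    if start == -1:
        # No doctype found: report it instead of slicing from -1.
        return None
    return response[start:]


if __name__ == "__main__":
    sample = "Server: nginx\nContent-Type: text/html\n\n<!DOCTYPE html>\n<p>hi</p>\n"
    print(body_from_doctype(sample))
```

Returning None leaves it to the caller to decide what a response without a doctype should mean.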
    

    Or perhaps just search for <!doctype
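
Searching for a doctype only works when the body actually starts with one. An alternative that is not part of this answer: HTTP ends the header section with the first blank line, so the response can be split there whatever the body contains. The sketch below assumes the response is a str holding the complete raw response. The helper name `split_headers_body` is hypothetical.

```python
import re


def split_headers_body(response):
    """Split a raw HTTP response string at the first blank line.

    Returns a (headers, body) tuple; body is '' when no blank line exists.
    Hypothetical helper, assumes `response` is a str, not bytes.
    """
    # HTTP specifies CRLF line endings; also accept bare LF to be lenient.
    parts = re.split(r"\r?\n\r?\n", response, maxsplit=1)
    headers = parts[0]
    body = parts[1] if len(parts) > 1 else ""
    return headers, body


if __name__ == "__main__":
    raw = "Server: nginx/1.6.2 (Ubuntu)\r\nContent-Type: text/html\r\n\r\n<!doctype html>\r\n"
    headers, body = split_headers_body(raw)
    print(body)
```

Because maxsplit=1 stops after the first separator, blank lines inside the page content stay in the body.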

    【Comments】:
