Python: catching a timeout during a stream request (answers)

【Question title】: Python, Catch timeout during stream request
【Posted】: 2014-01-20 13:31:31
【Question】:

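I am using the requests library to read XML events, as shown in the code below. How can I raise a connection-lost error once the request has started? The server emulates HTTP push / long polling -> http://en.wikipedia.org/wiki/Push_technology#Long_polling and by default never ends the response. If no new message arrives within 10 minutes, the while loop should exit.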

import requests
from time import time


if __name__ == '__main__':
    #: Set a default content-length
    content_length = 512
    try:
        requests_stream = requests.get('http://agent.mtconnect.org:80/sample?interval=0', stream=True, timeout=2)
        while True:
            start_time = time()
            #: Read three lines to determine the content-length         
            for line in requests_stream.iter_lines(3, decode_unicode=None):
                if line.startswith('Content-length'):
                    content_length = int(''.join(x for x in line if x.isdigit()))
                    #: pause the generator
                    break

            #: Continue the generator and read the exact amount of the body.        
            for xml in requests_stream.iter_content(content_length):
                print "Received XML document with content length of %s in %s seconds" % (len(xml), time() - start_time)
                break

    except requests.exceptions.RequestException as e:
        print('error: ', e)

The server push can be tested from the command line with curl:

curl http://agent.mtconnect.org:80/sample\?interval\=0

【Question comments】:

Tags: python python-requests urllib3

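【Answer 1】:

This may not be the best approach, but you can use multiprocessing to run the request in a separate process. Something like this should work: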

import multiprocessing
import requests
import time

class RequestClient(multiprocessing.Process):
    def run(self):
        # Write all your code to process the requests here
        content_length = 512
        try:
            requests_stream = requests.get('http://agent.mtconnect.org:80/sample?interval=0', stream=True, timeout=2)

            start_time = time.time()
            for line in requests_stream.iter_lines(3, decode_unicode=None):
                if line.startswith('Content-length'):
                    content_length = int(''.join(x for x in line if x.isdigit()))
                    break

            for xml in requests_stream.iter_content(content_length):
                print "Received XML document with content length of %s in %s seconds" % (len(xml), time.time() - start_time) 
                break
        except requests.exceptions.RequestException as e:
            print('error: ', e)


while True:
    childProcess = RequestClient()
    childProcess.start()

    # Wait for 10mins
    start_time = time.time()
    while time.time() - start_time <= 600:
        # Check if the process is still active
        if not childProcess.is_alive():
            # Request completed
            break
        time.sleep(5)    # Give the system some breathing time

    # Check if the process is still active after 10mins.
    if childProcess.is_alive():
        # Shutdown the process
        childProcess.terminate()
        raise RuntimeError("Connection Timed-out")

It is not perfect code for your problem, but you get the idea.
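
A variation on the watchdog idea above, sketched for Python 3 with a reader thread and a queue.Queue instead of a child process. This is a swapped-in technique, not the code from this answer. The main loop blocks on Queue.get(timeout=...), so each message is handed over as soon as it arrives, and the loop still wakes up when nothing has come in for the idle period. The names pump, consume_until_idle and slow_source are hypothetical; slow_source only stands in for requests_stream.iter_lines(). Unlike a process, the daemon thread cannot be terminated and stays blocked until the program exits.

```python
import queue
import threading
import time

_DONE = object()  # sentinel: the source finished on its own


def pump(source, out):
    """Forward every item from a blocking iterable into a queue."""
    try:
        for item in source:
            out.put(item)
    finally:
        out.put(_DONE)


def consume_until_idle(source, idle_timeout):
    """Collect items until the source ends or is silent for idle_timeout seconds.

    Returns (items, timed_out).
    """
    out = queue.Queue()
    # Daemon thread: it cannot be killed, but it will not keep the program alive.
    threading.Thread(target=pump, args=(source, out), daemon=True).start()
    received = []
    while True:
        try:
            # Returns as soon as an item is available, so there is no polling delay.
            item = out.get(timeout=idle_timeout)
        except queue.Empty:
            return received, True
        if item is _DONE:
            return received, False
        received.append(item)


def slow_source():
    """Hypothetical stand-in for requests_stream.iter_lines()."""
    yield "first"
    yield "second"
    time.sleep(5)  # simulate the server going quiet
    yield "never seen"


if __name__ == "__main__":
    print(consume_until_idle(slow_source(), idle_timeout=0.5))
```

For the original problem, idle_timeout would be 600 and the caller could raise an error when timed_out is true.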

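【Comments】:

Hmm, that seems to work. However, I only receive one XML message every 5 seconds. I need them as fast as possible ;)
The 5-second sleep does not actually suspend the child process; it only puts the main thread to sleep. The XML messages should be processed in the child process as soon as they are returned. Most likely the server or the requests module is adding the 5-second delay.
If it worked, you might go ahead and accept the answer :)
It worked, yes. But having more threads and processes is not an ideal solution. I thought there would be a method/function that could be used inside the loop. :)
If requests_stream.iter_lines is a blocking call (which it probably is), then there is no other way to do this, because a timeout function inside the loop will not be called while waiting for data.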
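
The point about blocking calls can be illustrated with a plain socket: a blocking read cannot run a check of its own, but a timeout set on the socket makes the read itself raise once no bytes arrive in time. The sketch below (Python 3, standard library only) shows that mechanism. The name read_until_idle is hypothetical. The requests documentation describes its timeout argument in similar terms, as a limit on how long to wait for bytes on the socket rather than on the whole download, so passing timeout=600 to the original requests.get(..., stream=True) call may be enough. Whether the installed requests version applies it to streamed reads is an assumption worth verifying.

```python
import socket


def read_until_idle(sock, idle_timeout, chunk_size=4096):
    """Read chunks until the peer closes or is silent for idle_timeout seconds.

    Returns (chunks, timed_out).
    """
    sock.settimeout(idle_timeout)  # applies to each blocking recv() call
    chunks = []
    while True:
        try:
            data = sock.recv(chunk_size)
        except socket.timeout:
            # No bytes for idle_timeout seconds: treat the connection as lost.
            return chunks, True
        if not data:
            # Empty read: the peer closed the connection normally.
            return chunks, False
        chunks.append(data)


if __name__ == "__main__":
    # A connected socket pair stands in for the client/server connection.
    server, client = socket.socketpair()
    server.sendall(b"<xml>event</xml>")
    print(read_until_idle(client, idle_timeout=0.5))
    server.close()
    client.close()
```

The timeout here restarts on every read, so a server that keeps sending data never triggers it; only a silent connection does.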