【Question title】: Flask app keeps loading at the time of prediction (TensorRT)
【Posted】: 2020-07-28 02:04:11
【Question description】:

This is a continuation of the question

Facing issue while running Flask app with TensorRt model on jetson nano

The issue above is resolved, but when I run the Flask app, it keeps loading and does not show the video.

Code:

def callback(): 
 cuda.init() 
 device = cuda.Device(0) 
 ctx = device.make_context() 
 onnx_model_path = './some.onnx' 
 fp16_mode = False
 int8_mode = False 
 trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
 max_batch_size = 1 
 engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode) 
 context = engine.create_execution_context() 
 inputs, outputs, bindings, stream = allocate_buffers(engine) 
 ctx.pop()

##callback function ends


worker_thread = threading.Thread(target=callback())
worker_thread.start()

trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

def do_inference(context, bindings, inputs, outputs, stream, batch_size=1):
 print("start in do_inferece")
 # Transfer data from CPU to the GPU.
 [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
 # Run inference.
 print("before run infernce in do_inferece")
 context.execute_async(batch_size=batch_size, bindings=bindings, stream_handle=stream.handle)
 # Transfer predictions back from the GPU.
 print("before output in do_inferece")
 [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
 print("before stream synchronize in do_inferece")
 # Synchronize the stream
 stream.synchronize()
 # Return only the host outputs.
 print("before return in do_inferece")
 return [out.host for out in outputs]

【Comments】:

  • Why not run the inference inside the callback?

Tags: flask tensorrt nvidia-jetson nvidia-jetson-nano


【Solution 1】:

Your worker_thread creates the context that do_inference needs. You should call do_inference inside callback():

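import pycuda.driver as cuda  # assumption: `cuda` in the question is pycuda.driver

def callback():
    # Create the CUDA context in the same thread that will run inference.
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    try:
        onnx_model_path = './some.onnx'
        fp16_mode = False
        int8_mode = False
        trt_engine_path = './model_fp16_{}_int8_{}.trt'.format(fp16_mode, int8_mode)
        max_batch_size = 1
        # get_engine, allocate_buffers and do_inference are the asker's own helpers.
        engine = get_engine(max_batch_size, onnx_model_path, trt_engine_path, fp16_mode, int8_mode)
        context = engine.create_execution_context()
        inputs, outputs, bindings, stream = allocate_buffers(engine)
        trt_outputs = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
        # post-process the trt_outputs
    finally:
        # Pop the context even if building the engine or inference raises.
        ctx.pop()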
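
One more detail in the question's code is worth checking: `threading.Thread(target=callback())` calls `callback` immediately in the calling thread and hands its return value (`None`) to the thread, so the worker thread itself does nothing. The small sketch below shows the difference with a stand-in `callback` that only records which thread ran it; the thread names are illustrative and no CUDA or TensorRT code is involved.

```python
import threading

ran_in = []  # names of the threads that actually executed callback()

def callback():
    # Stand-in for the real callback: just record the executing thread.
    ran_in.append(threading.current_thread().name)

# As written in the question: callback() runs right here, in the calling
# thread, and the Thread receives target=None.
t1 = threading.Thread(target=callback(), name="worker-1")
t1.start()
t1.join()

# Passing the function object instead runs it inside the worker thread.
t2 = threading.Thread(target=callback, name="worker-2")
t2.start()
t2.join()

print(ran_in)  # the first entry is the calling thread, not "worker-1"
```

With `target=callback`, the context setup and the inference end up in the same thread, which is what the answer above relies on.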

【Comments】:

  • Doesn't that mean I would create a context for every request?
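
One possible way around a per-request context is a single long-lived worker thread that performs the setup once and then serves requests from a queue. The sketch below is only an illustration of that pattern: `InferenceWorker`, `fake_setup`, `fake_infer` and `fake_teardown` are hypothetical names, and the stand-ins replace the real CUDA/TensorRT calls. In the real app, the setup function would presumably hold the `cuda.init()` / `make_context()` / `get_engine(...)` / `allocate_buffers(...)` steps, the infer function would call `do_inference(...)`, and the teardown function would call `ctx.pop()`.

```python
import queue
import threading

class InferenceWorker:
    """Hypothetical helper: one thread owns the setup state and serves all requests."""

    def __init__(self, setup, infer, teardown):
        self._setup, self._infer, self._teardown = setup, infer, teardown
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        state = self._setup()  # runs once, inside the worker thread
        try:
            while True:
                job = self._jobs.get()
                if job is None:  # sentinel: shut down
                    break
                frame, reply = job
                try:
                    reply.put((True, self._infer(state, frame)))
                except Exception as exc:  # hand errors back to the caller
                    reply.put((False, exc))
        finally:
            self._teardown(state)

    def predict(self, frame):
        # Called from any thread (e.g. a Flask request handler); blocks for the result.
        reply = queue.Queue(maxsize=1)
        self._jobs.put((frame, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._jobs.put(None)
        self._thread.join()

# Stand-ins for the real setup / inference / teardown steps.
setup_calls = []

def fake_setup():
    setup_calls.append(threading.current_thread().name)
    return {"scale": 2}

def fake_infer(state, frame):
    return [x * state["scale"] for x in frame]

def fake_teardown(state):
    state.clear()

worker = InferenceWorker(fake_setup, fake_infer, fake_teardown)
results = [worker.predict([1, 2, 3]), worker.predict([4])]
worker.close()
```

Here the setup runs exactly once no matter how many times `predict` is called. The sketch does not handle a failing setup step (a caller of `predict` would block), so a real version would need to report that case as well.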