Video Architecture Search

2019-10-20 06:48:26

This post originally appeared at: https://ai.googleblog.com/2019/10/video-architecture-search.html

Examples of various EvaNet architectures. Each colored box (large or small) represents a layer with the color of the box indicating its type: 3D conv. (blue), (2+1)D conv. (orange), iTGM (green), max pooling (grey), averaging (purple), and 1x1 conv. (pink). Layers are often grouped to form modules (large boxes). Digits within each box indicate the filter size.
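
The (2+1)D layer type factorizes a full 3D convolution into a 2D spatial convolution followed by a 1D temporal convolution. The Python sketch below compares the weight counts of the two, assuming no bias terms and choosing the intermediate width with the parameter-matching rule from the R(2+1)D paper (Tran et al., 2018). The function names are hypothetical, and this is a generic illustration of the layer type rather than EvaNet's own code.

```python
def conv3d_params(c_in, c_out, t, k):
    """Weights of a full 3D conv with a t x k x k kernel (bias ignored)."""
    return t * k * k * c_in * c_out


def conv2plus1d_params(c_in, c_out, t, k, mid):
    """Weights of a (2+1)D conv: a 1 x k x k spatial conv to `mid` channels,
    then a t x 1 x 1 temporal conv to `c_out` channels (bias ignored)."""
    return k * k * c_in * mid + t * mid * c_out


def matched_mid_channels(c_in, c_out, t, k):
    """Intermediate width that keeps the (2+1)D layer at or just below the
    3D layer's parameter count (floor of the R(2+1)D matching formula)."""
    return (t * k * k * c_in * c_out) // (k * k * c_in + t * c_out)


if __name__ == "__main__":
    c_in, c_out, t, k = 64, 64, 3, 3
    mid = matched_mid_channels(c_in, c_out, t, k)
    print(conv3d_params(c_in, c_out, t, k),
          conv2plus1d_params(c_in, c_out, t, k, mid), mid)
```

With matched widths the two layers cost about the same number of weights, but the factorized version inserts an extra nonlinearity between the spatial and temporal steps when one is placed there.
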

A representative AssembleNet model evolved on the Moments-in-Time dataset. Each node corresponds to a block of spatio-temporal convolutional layers, and each edge specifies a connection between blocks; darker edges indicate stronger connections. AssembleNet is a family of learnable multi-stream architectures optimized for the target task.
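
One way to give edges learnable strengths, as the darker and lighter edges suggest, is to feed each node a weighted sum of its incoming features, with every edge weight squashed into (0, 1) by a sigmoid. The NumPy sketch below assumes that parameterization and assumes the incoming feature maps have already been resized to a common shape. The function names and shapes are hypothetical; see the AssembleNet paper for the exact formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def combine_inputs(incoming, logits):
    """Fuse the outputs of a node's incoming edges into one input tensor.

    incoming: list of feature maps, assumed already resized to a common shape.
    logits:   one learnable scalar per incoming edge (hypothetical parameterization).
    Returns the fused tensor and the per-edge connection strengths in (0, 1).
    """
    strengths = sigmoid(np.asarray(logits, dtype=np.float64))
    stacked = np.stack(incoming)  # shape: (num_edges, *feature_shape)
    # Reshape to (num_edges, 1, 1, ...) so each edge's weight broadcasts over its map.
    w = strengths.reshape((-1,) + (1,) * (stacked.ndim - 1))
    return (w * stacked).sum(axis=0), strengths


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((4, 7, 7, 8))   # (time, height, width, channels)
    flow = rng.standard_normal((4, 7, 7, 8))
    fused, strengths = combine_inputs([rgb, flow], logits=[2.0, -1.0])
    print(fused.shape, strengths.round(3))
```

In a visualization like the one above, an edge with a larger strength would be drawn darker. An edge whose strength drifts toward zero contributes little and is a natural candidate for pruning during evolution.
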
A comparison of AssembleNet with state-of-the-art hand-designed models on the Charades (left) and Moments-in-Time (right) datasets. AssembleNet-50 and AssembleNet-101 have the same number of parameters as a two-stream ResNet-50 and ResNet-101, respectively.

TinyVideoNet (TVN) architectures, evolved to maximize recognition performance while keeping their computation time within the desired limit. For instance, TVN-1 (top) runs in 37 ms on a CPU and 10 ms on a GPU; TVN-2 (bottom) runs in 65 ms on a CPU and 13 ms on a GPU.
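
The search objective described here is constrained: maximize accuracy while staying within a runtime limit. A simple way to turn that into a fitness function for evolution is a hard cutoff, in which any candidate over the time budget scores zero. The Python sketch below assumes that cutoff. The `Candidate` record, the function names, and all numbers are illustrative assumptions, not the TinyVideoNet search code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float    # validation accuracy in [0, 1]
    runtime_ms: float  # measured runtime per clip on the target device


def fitness(c: Candidate, runtime_limit_ms: float) -> float:
    """Hard-constraint fitness: accuracy if within the time budget, else 0."""
    return c.accuracy if c.runtime_ms <= runtime_limit_ms else 0.0


def select_best(population, runtime_limit_ms):
    """Return the highest-fitness candidate under the given runtime limit."""
    return max(population, key=lambda c: fitness(c, runtime_limit_ms))


if __name__ == "__main__":
    # Hypothetical candidates; the numbers are made up for illustration.
    population = [
        Candidate("cand_a", accuracy=0.30, runtime_ms=40.0),
        Candidate("cand_b", accuracy=0.35, runtime_ms=90.0),
        Candidate("cand_c", accuracy=0.28, runtime_ms=25.0),
    ]
    print(select_best(population, runtime_limit_ms=50.0).name)  # cand_a
```

A hard cutoff keeps the objective easy to reason about. A soft penalty that grows with the overrun is a common alternative when the search should still be able to learn from slightly-too-slow candidates.
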
CPU runtime of TinyVideoNet models compared to prior models (left), and runtime vs. accuracy of TinyVideoNets compared to (2+1)D ResNet models (right). TinyVideoNets occupy a region of the time-accuracy space where no other models exist: they are extremely fast yet still accurate.
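
The claim that TinyVideoNets fill an otherwise empty region of the time-accuracy plot can be phrased in terms of Pareto optimality. A model lies on the front if no other model is both at least as fast and at least as accurate while being strictly better in one of the two. A minimal Python check follows, using made-up (runtime, accuracy) points rather than values from the figure; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (runtime_ms, accuracy)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no slower and no less accurate than b, and strictly better in one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical models, not data from the figure.
    models = [(10.0, 0.50), (20.0, 0.60), (30.0, 0.55), (5.0, 0.30)]
    print(pareto_front(models))  # [(10.0, 0.5), (20.0, 0.6), (5.0, 0.3)]
```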
