1 Slowfast模型,一共有两个通道,一个Slow pathway,一个Fast pathway,其中Slow pathway采用低采样,高通道数主要提取空时特征;Fast pathway用高时间采样,低通道数量(主要为了降低计算量),来提取时域特征,两个通道都是以3Dresnet作为backbone,提取特征的,基础网络图如下:
2 对于两个通道都没有采用temporal downsampling,假设Slow pathway里面feature shape是,Fast pathway对应的feature shape。其中对于Slow pathway,只在res4和res5采用non-degenerate temporal convolutions (temporal kernel size > 1),即311的kernel size,因为发现在前面2个res采用会造成准确率下降,可能原因是We argue that this is because when objects move fast and the temporal stride is large, there is little correlation within a temporal receptive field unless the spatial receptive field is large enough(i.e., in later layers)。对于Fast pathway,全部采用non-degenerate temporal convolutions,因为pathway holds fine temporal resolution for the temporal convolutions to capture detailed motion。
3 横向连接,就是将Fast pathway的特征连接到Slow pathway特征上,论文一共提供了三种方法:
(i) Time-to-channel: 直接将reshape成
(ii) Time-strided sampling: 相当于降采样,对没帧,提取一帧,变成
(iii) Time-strided convolution: 通过3D卷积,一个的kernel和的channel,以及对应的stride=,padding=2
最终两个特征通过求和或者连接形式保留,通过最终实验得出Time-strided convolution以及连接形式效果最好,作为默认设置
4 对于其中超参数,论文设置=8,对于,设置从1/32到1/4,最终取1/8
5 对于Fast pathway,采用Weaker spatial inputs,一共设置了half
spatial resolution,gray-scale,“time difference" frames,optical flow作为输入,发现灰度图和RGB图效果接近
相关文章: