Training TSM for Action Recognition on the UCF101 Dataset

Preface

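I recently had an action-detection requirement and planned to tackle it with action recognition. As a complete beginner in this area, I spent two weeks chewing through the theory and source code of TSM, and after training on my own dataset I found that it apparently cannot be applied to my use case. Damn! Never mind, it is still worth writing down. There is no need to cover the theory here; plenty has been written about it online, so search for it if you are interested.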

1. Data Preparation

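First, git clone the code, temporal-shift-module. The author provides an online gesture-recognition demo built on MobileNetV2 that uses TVM for inference and runs in real time on a Jetson Nano. It looks pretty good, so let's try it right away. But what if I don't have a Nano? No problem, just modify it.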
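The demo lives in main.py under the online_demo directory. But what if you have no Nano and don't want to install TVM either? Not a big deal: modify it to run inference with PyTorch! Download the model you need; README.md provides the download link, and Xunlei (Thunder) is recommended for the download. Create a new file demo.py under online_demo and replace the TVM inference part with PyTorch inference. The code below is adapted from main.py.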
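To make the shift_buffer in the listing easier to follow, here is a minimal sketch of how an online (frame-by-frame) temporal shift can be written in plain PyTorch. It is only an illustration, not the code from mobilenet_v2_tsm.py: the function name online_shift, the toy 24-channel feature map, and the assumption that 1/8 of the channels are shifted are all my own choices, picked so that the cached slice has the same shape as the first shift_buffer entry.

```python
import torch


def online_shift(x, cached):
    """Uni-directional temporal shift for a single frame (illustrative only).

    x:      features of the current frame, shape [1, C, H, W]
    cached: the first C // 8 channels saved from the previous frame
    Returns the shifted features and the slice to cache for the next frame.
    """
    fold = x.size(1) // 8                       # assumed: 1/8 of the channels are shifted
    to_cache, rest = x[:, :fold], x[:, fold:]
    # Channels from the previous frame replace the first `fold` channels.
    shifted = torch.cat((cached, rest), dim=1)
    return shifted, to_cache


# Toy stream of 3 frames with 24 channels at 56x56, so the buffer holds
# 24 // 8 = 3 channels. The buffer starts as zeros because the first frame
# has no predecessor.
buffer = torch.zeros([1, 3, 56, 56])
with torch.no_grad():
    for t in range(3):
        frame_feat = torch.full([1, 24, 56, 56], float(t + 1))
        shifted, buffer = online_shift(frame_feat, buffer)
```

Each call consumes the buffer produced by the previous frame and hands back a new one, which is why the demo has to keep a list of tensors alive between frames instead of feeding a whole clip at once.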

import torch
from online_demo.mobilenet_v2_tsm import MobileNetV2
import cv2
import numpy as np
import torchvision
from PIL import Image
import time

SOFTMAX_THRES = 1
HISTORY_LOGIT = True
REFINE_OUTPUT = True

# One cached feature slice per temporal-shift block, zero-initialized at the start of a stream.
shift_buffer = [torch.zeros([1, 3, 56, 56]),
                torch.zeros([1, 4, 28, 28]),
                torch.zeros([1, 4, 28, 28]),
                torch.zeros([1, 8, 14, 14]),
                torch.zeros([1, 8, 14, 14]),
                torch.zeros([1, 8, 14, 14]),
                torch.zeros([1, 12, 14, 14]),
                torch.zeros([1, 12, 14, 14]),
                torch.zeros([1, 20, 7, 7]),
                torch.zeros([1, 20, 7, 7])]


class GroupScale(object):
    """ Rescales the input PIL.Image to the given 'size'.
    'size' will be the size of the smaller edge.
    For example, if height > width, then image will be
    rescaled to (size * height / width, size)
    size: size of the smaller edge
    interpolation: Default: PIL.Image.BILINEAR
    """

    def __init__(self, size, interpolation=Image.BILINEAR):
        # transforms.Scale has been removed from recent torchvision releases; Resize replaces it.
        self.worker = torchvision.transforms.Resize(size, interpolation)

    def __call__(self, img_group):
        return [self.worker(img) for img in img_group]


class GroupCenterCrop(object):
    def __init__(self, size):
        self.worker = torchvision.transforms.CenterCrop(size)

    def __call__(self, img_group):
        return [self.worker(img) for img in img_group]


class Stack(object):

    def __init__(self, roll=False):
        self.roll = roll

    def __call__(self, img_group):
        if img_group[0]