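Video Highlight Extraction: Research Notes

Approach 1: find the dialogue-heavy parts from the subtitles or the audio track

- Extract the audio track: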

ffmpeg -i a.mp4 -map 0:a:0 a.mp3

- Extract the RMS level frame by frame:

ffmpeg -i in.mp3 -af astats=metadata=1:reset=1,ametadata=print:key=lavfi.astats.Overall.RMS_level:file=log.txt -f null -

Determining audio level peaks with ffmpeg

https://superuser.com/questions/1183663/determining-audio-level-peaks-with-ffmpeg

- Analyze the volume of the whole track:

ffmpeg -i input.wav -filter:a volumedetect -f null /dev/null

https://trac.ffmpeg.org/wiki/AudioVolume

https://ffmpeg.org/ffmpeg-filters.html#volumedetect
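
volumedetect writes its summary to ffmpeg's log on stderr, so the numbers have to be pulled out of text. The following is a minimal sketch of that step. The function name parse_volumedetect and the sample log are illustrative and not part of ffmpeg, and the line layout is assumed to match what volumedetect usually prints.

```python
import re

# Matches lines such as "[Parsed_volumedetect_0 @ 0x55d] mean_volume: -27.3 dB".
_VOLUME_RE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?) dB")

def parse_volumedetect(log_text):
    """Return the mean_volume and max_volume values (in dB) found in log_text."""
    return {key: float(value) for key, value in _VOLUME_RE.findall(log_text)}

# Hypothetical log excerpt shaped like volumedetect's summary lines.
sample_log = """\
[Parsed_volumedetect_0 @ 0x55d] n_samples: 2646000
[Parsed_volumedetect_0 @ 0x55d] mean_volume: -27.3 dB
[Parsed_volumedetect_0 @ 0x55d] max_volume: -4.0 dB
[Parsed_volumedetect_0 @ 0x55d] histogram_4db: 12
"""

levels = parse_volumedetect(sample_log)
print(levels)
```

In practice the log text would come from running the command above with subprocess.run(..., stderr=subprocess.PIPE, text=True) and passing the captured stderr to the function.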

- Cut out a segment:

ffmpeg -ss $ss -t 00:05:00 -i $vfile.mp4 -vcodec copy -acodec copy -y $vfile.${ss//:/_}.mp4

https://stackoverflow.com/questions/21420296/how-to-extract-time-accurate-video-segments-with-ffmpeg
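
When there are many start times, the cut command above can be generated rather than typed. This is a minimal sketch that builds the same argument list for each start time. The helper name cut_command, the file name movie, and the start times are hypothetical.

```python
import shlex

def cut_command(vfile, ss, duration="00:05:00"):
    """Build the ffmpeg argument list for one clip.

    vfile is the video name without the ".mp4" suffix; ss is an HH:MM:SS start time.
    """
    # Same naming as ${ss//:/_} in the shell command: colons become underscores.
    out = "%s.%s.mp4" % (vfile, ss.replace(":", "_"))
    return ["ffmpeg", "-ss", ss, "-t", duration, "-i", vfile + ".mp4",
            "-vcodec", "copy", "-acodec", "copy", "-y", out]

# Hypothetical start times; print each command as a shell-safe string.
for ss in ["00:12:00", "01:03:00"]:
    print(" ".join(shlex.quote(arg) for arg in cut_command("movie", ss)))
```

Each list can be passed directly to subprocess.run(cmd, check=True). Because the streams are copied rather than re-encoded, the cut can only start on a keyframe, so the real start of a clip may differ slightly from the requested time.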

Extract the time ranges of the highlights:

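import sys

def getv(rms):
    # Map an RMS level in dB (0 is full scale, quieter is more negative)
    # to a loudness score between 0 and 100.
    return max(0, 100 - abs(rms))

def extract(diff):
    # diff holds the last five minute-over-minute loudness increases, oldest first.
    # Return 1 when at least three of them are positive and at least two of the
    # first three are 3 or more, else 0.
    pos = 0
    pos3 = 0
    for n, v in enumerate(diff):
        if v > 0:
            pos += 1
        if n < 3 and v >= 3:
            pos3 += 1
    if pos >= 3 and pos3 >= 2:
        return 1
    return 0

timebin = 0      # start of the current one-minute bin, in seconds
s = []           # RMS levels seen in the current bin
v = []           # loudness score of every finished bin
avgrms = -100    # fallback in case the first bin has no usable samples
diff = (0,) * 5
for line in sys.stdin:   # log.txt as written by ametadata=print
    if 'pts_time' in line:
        ts = float(line.split('pts_time:')[1])
        if ts > timebin + 60:
            if s:
                avgrms = int(sum(s) / len(s))
            if v:
                d = max(0, getv(avgrms) - v[-1])
                diff = diff[1:] + (d,)
                ext = extract(diff)
                # Debug trace on stderr: minute, second, average RMS, loudness change.
                print('%3d %2d %s %3d' % (timebin // 60, timebin % 60, avgrms, getv(avgrms) - v[-1]),
                      '-' * d, '*' * ext, file=sys.stderr)
                if ext:
                    # Highlight start times go to stdout as HH:MM:00.
                    h = timebin // 3600
                    print('%.2d:%.2d:00' % (h, (timebin - 3600 * h) // 60))
                    diff = (0,) * 5
            v.append(getv(avgrms))
            timebin += 60
            s = []
    if 'RMS' in line:
        rms = float(line.split('lavfi.astats.Overall.RMS_level=')[1])
        if rms > -1000:   # skip non-finite or extreme values such as -inf on silent frames
            s.append(rms)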
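
The script above prints one HH:MM:00 start time per detected highlight, and each start is later cut as a fixed five-minute clip, so starts that lie less than five minutes apart yield overlapping clips. One possible follow-up step, not part of the original script, is to merge such clips into longer intervals. The sketch below assumes a clip length of 300 seconds; the names to_seconds and merge_clips and the start times are hypothetical.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s

def merge_clips(starts, length=300):
    """Merge fixed-length clips that overlap or touch into (start, end) pairs in seconds."""
    merged = []
    for start in sorted(to_seconds(t) for t in starts):
        end = start + length
        if merged and start <= merged[-1][1]:
            # This clip begins inside the previous interval: extend that interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical start times as printed by the detection script.
starts = ["00:10:00", "00:13:00", "00:40:00"]
print(merge_clips(starts))  # [(600, 1080), (2400, 2700)]
```

Each merged pair gives a start time and a duration (end minus start) that can replace the fixed -t 00:05:00 in the cut command.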

Debugging:

ffmpeg volumedetect returns unstable result

https://stackoverflow.com/questions/48673923/ffmpeg-volumedetect-returns-unstable-result

Approach 2: Approach 1 plus shot boundary detection

Installing OpenCV: https://www.cnblogs.com/yaoyaohust/p/10228888.html

Shot boundary detection: https://www.cnblogs.com/lynsyklate/p/7840881.html

Yahoo's open-source tool Hecate: https://github.com/yahoo/hecate
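
To make the idea behind Approach 2 concrete, here is a minimal sketch of one common shot boundary technique: thresholding the gray-level histogram difference between consecutive frames, then moving an audio-based highlight start back to the nearest earlier boundary. This is an illustration under stated assumptions, not the method of the linked article or of Hecate. Frames are plain lists of 8-bit gray values, all of the same size, and the function names, the threshold of 0.5, and the sample data are hypothetical.

```python
def histogram(frame, bins=8):
    """Count the 8-bit gray values of a frame (flat list of ints 0..255) into equal-width bins."""
    hist = [0] * bins
    for value in frame:
        hist[value * bins // 256] += 1
    return hist

def shot_boundaries(frames, threshold=0.5):
    """Return the indices i at which frame i appears to start a new shot.

    The distance is the sum of absolute bin differences divided by twice the
    pixel count, so it runs from 0 (identical histograms) to 1 (disjoint ones).
    """
    boundaries = []
    prev = None
    for i, frame in enumerate(frames):
        hist = histogram(frame)
        if prev is not None:
            dist = sum(abs(a - b) for a, b in zip(hist, prev)) / (2.0 * len(frame))
            if dist > threshold:
                boundaries.append(i)
        prev = hist
    return boundaries

def snap_to_boundary(start, boundary_times):
    """Move a highlight start back to the latest shot boundary at or before it."""
    earlier = [t for t in boundary_times if t <= start]
    return max(earlier) if earlier else start

# Synthetic frames: three dark frames followed by three bright ones.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, dark, bright, bright, bright]
print(shot_boundaries(frames))               # [3]
print(snap_to_boundary(600, [0, 572, 655]))  # 572
```

With OpenCV installed, the frames would instead be read with cv2.VideoCapture and the histograms computed with cv2.calcHist; the thresholding logic stays the same.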

Approach 3: methods that take longer and are technically harder

A brief introduction to and analysis of Baidu's BROAD-Video Highlights dataset

https://zhuanlan.zhihu.com/p/31770408

A roundup of 2017 conference papers on Temporal Action Detection

https://zhuanlan.zhihu.com/p/31501316