OpenCV: reading frames from VideoCapture advances the video to bizarrely wrong location (answers)

[Question title]: OpenCV: reading frames from VideoCapture advances the video to bizarrely wrong location
[Posted]: 2017-11-13 07:00:51
[Question description]:

(I will put a 500 reputation bounty on this question as soon as it is eligible, unless it has been closed.)

The problem in one sentence

Reading frames from a VideoCapture advances the video much further than intended.

Explanation

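I need to read and analyze the frames of a 100 fps video (according to cv2 and VLC media player) between certain time intervals. In the minimal example below, I am trying to read all the frames of the first ten seconds of a three-minute video.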

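I am creating a cv2.VideoCapture object from which I read frames until the desired position in milliseconds is reached. In my actual code each frame is analyzed, but that fact is irrelevant for demonstrating the error.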

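Checking the current frame and millisecond position of the VideoCapture after reading the frames yields the correct values, so the VideoCapture thinks it is at the right position, but it is not. Saving an image of the last frame read reveals that my iteration overshoots the target time by more than two minutes.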

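What is even more bizarre: if I manually set the capture's millisecond position to 10 seconds with VideoCapture.set (the same value VideoCapture.get returns after reading the frames) and save an image, the video is at the (almost) right position!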

Demo video file

If you want to run the MCVE, you need the demo.avi video file. You can download it HERE.

MCVE

This MCVE is carefully crafted and commented. Please leave a comment under the question if anything remains unclear.

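If you are using OpenCV 3, you have to replace every cv2.cv.CV_ prefix with cv2. (for example, cv2.cv.CV_CAP_PROP_FPS becomes cv2.CAP_PROP_FPS). For me, the problem occurs with both versions.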

import cv2

# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
      .format(fps, pos_msec, pos_frames))

# get first frame and save as picture
_, frame = cap.read()
cv2.imwrite('first_frame.png', frame)

# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read()
    # in the actual code, the frame is now analyzed

# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)

# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after iteration: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1 # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10

# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)

# print properties  again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)

MCVE output

The print statements produce the following output.

cv2 version = 2.4.9.1
initial attributes: fps = 100.0, pos_msec = 0.0, pos_frames = 0.0
attributes after iteration: pos_msec = 10010.0, pos_frames = 1001.0
attributes after setting msec pos manually: pos_msec = 10010.0, pos_frames = 1001.0

As you can see, all properties have the expected values.

imwrite saves the following images.

first_frame.png

after_iteration.png

after_setting.png

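You can see the problem in the second image: the target of 9:26:15 (the real-time clock in the picture) is missed by more than two minutes. Setting the target time manually (third image) puts the video at the (almost) correct position.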

What am I doing wrong, and how do I fix it?

Tried so far

cv2 2.4.9.1 @ Ubuntu 16.04
cv2 2.4.13 @ Scientific Linux 7.3 (three computers)
cv2 3.1.0 @ Scientific Linux 7.3 (three computers)

Creating the capture with

cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_FFMPEG)

and

cap = cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_GSTREAMER)

in OpenCV 3 (version 2 does not seem to have an apiPreference argument). Using cv2.CAP_GSTREAMER takes very long (about 2-3 minutes to run the MCVE), but both API preferences produce the same wrong images.

When reading frames directly with ffmpeg (credit to this tutorial), the correct output images are produced.

import numpy as np
import subprocess as sp
import pylab

# video properties
path = './demo.avi'
resolution = (593, 792)
framesize = resolution[0]*resolution[1]*3

# set up pipe
FFMPEG_BIN = "ffmpeg"
command = [FFMPEG_BIN,
           '-i', path,
           '-f', 'image2pipe',
           '-pix_fmt', 'rgb24',
           '-vcodec', 'rawvideo', '-']
pipe = sp.Popen(command, stdout = sp.PIPE, bufsize=10**8)

# read first frame and save as image
raw_image = pipe.stdout.read(framesize)
image = np.frombuffer(raw_image, dtype='uint8')  # fromstring is deprecated for binary data
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('first_frame_ffmpeg_only.png')
pipe.stdout.flush()

# forward 1000 frames
for _ in range(1000):
    raw_image = pipe.stdout.read(framesize)
    pipe.stdout.flush()

# save frame 1001
image = np.frombuffer(raw_image, dtype='uint8')
image = image.reshape(resolution[0], resolution[1], 3)
pylab.imshow(image)
pylab.savefig('frame_1001_ffmpeg_only.png')

pipe.terminate()

This produces the correct result! (correct timestamp 9:26:15)

frame_1001_ffmpeg_only.png:

Additional information

In the comments I was asked for my cvconfig.h file. I only seem to have this file for cv2 version 3.1.0, under /opt/opencv/3.1.0/include/opencv2/cvconfig.h.

HERE is a paste of this file.

In case it helps, I was able to extract the following video information with VideoCapture.get.

brightness 0.0
contrast 0.0
convert_rgb 0.0
exposure 0.0
format 0.0
fourcc 1684633187.0
fps 100.0
frame_count 18000.0
frame_height 593.0
frame_width 792.0
gain 0.0
hue 0.0
mode 0.0
openni_baseline 0.0
openni_focal_length 0.0
openni_frame_max_depth 0.0
openni_output_mode 0.0
openni_registration 0.0
pos_avi_ratio 0.01
pos_frames 0.0
pos_msec 0.0
rectification 0.0
saturation 0.0
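The fourcc entry above is a four-character code packed into one number, first character in the lowest byte. A small sketch to unpack it (decode_fourcc is a hypothetical helper written for illustration, not an OpenCV function):

```python
def decode_fourcc(value):
    """Unpack a FourCC number (as returned by VideoCapture.get) into its 4 characters."""
    code = int(value)  # VideoCapture.get returns a float
    # The first character is stored in the lowest byte, so read bytes low to high.
    return "".join(chr((code >> (8 * shift)) & 0xFF) for shift in range(4))


print(decode_fourcc(1684633187.0))  # cvid
```

For the value listed above this gives "cvid", the FourCC of the Cinepak codec.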

[Question discussion]:

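What platform? Why are you using such an old version of OpenCV? This is really strange; I would not expect read() on a file to skip frames at all...
BTW, I tried this locally on Win10 with Python 2.7.5 and OpenCV 2.4.11, and both after_ images show the 9:26:15 timestamp. Interestingly, even your after_setting.png is 2 seconds past the desired time. This could be a bug in OpenCV or, more likely, in whatever library is used to decode the AVI file...
@timgeb The following links might help: 1, 2, 3. Specifically, 1 says: "check in opencv2/cvconfig.h to find out which APIs are currently available (e.g. HAVE_MSMF, HAVE_VFW, HAVE_LIBV4L, etc...)".
@timgeb Did you compile OpenCV with the ffmpeg library? You could try reading the frames directly with ffmpeg (tutorial here) to see whether there is a similar skip, and you can tell VideoCapture to use OpenCV's ffmpeg backend with cv2.VideoCapture('demo.avi', apiPreference=cv2.CAP_FFMPEG). Give it a try and report back!
@timgeb I would probably try recompiling OpenCV and making sure it is built with ffmpeg. Beyond that, this is outside my expertise. I hope you get a working solution; this is definitely a weird problem.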

Tags: python opencv ubuntu video video-processing

[Solution 1]:

Your video file contains data for only 1313 non-duplicate frames (i.e. between 7 and 8 frames per second of duration):

$ ffprobe -i demo.avi -loglevel fatal -show_streams -count_frames|grep frame
has_b_frames=0
r_frame_rate=100/1
avg_frame_rate=100/1
nb_frames=18000
nb_read_frames=1313        # !!!
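The numbers in this output are enough to check the "between 7 and 8 frames per second" remark, and to estimate how far 1001 reads carry the MCVE if every read returns one real frame. A rough sketch, assuming the real frames are spread evenly over the video (they are not exactly), with parse_ffprobe_fields as a hypothetical helper:

```python
from fractions import Fraction

# Output of the ffprobe command above (without the "# !!!" marker).
FFPROBE_OUTPUT = """\
has_b_frames=0
r_frame_rate=100/1
avg_frame_rate=100/1
nb_frames=18000
nb_read_frames=1313
"""


def parse_ffprobe_fields(text):
    """Collect key=value lines (as printed by ffprobe -show_streams) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines such as [STREAM] that carry no value
            fields[key.strip()] = value.strip()
    return fields


fields = parse_ffprobe_fields(FFPROBE_OUTPUT)
rate = Fraction(fields["r_frame_rate"])                   # 100 nominal frames per second
duration = int(fields["nb_frames"]) / rate                # 18000 / 100 = 180 s
effective_fps = int(fields["nb_read_frames"]) / duration  # 1313 real frames / 180 s
seconds_after_1001_reads = 1001 / effective_fps

print(float(duration), round(float(effective_fps), 2))  # 180.0 7.29
print(round(float(seconds_after_1001_reads), 1))        # 137.2
```

Roughly 137 s instead of the intended 10 s, which is in line with the overshoot of more than two minutes reported in the question.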

Converting the avi file with ffmpeg reports 16697 duplicate frames (for some reason 10 extra frames are added, and 16697 = 18010 - 1313).

$ ffmpeg -i demo.avi demo.mp4
...
frame=18010 fps=417 Lsize=3705kB time=03:00.08 bitrate=168.6kbits/s dup=16697
#                                                                   ^^^^^^^^^
...
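The arithmetic in that remark can be reproduced from the progress line itself. A minimal sketch (parse_progress is a hypothetical helper; it only assumes the space-separated key=value layout shown above):

```python
import re


def parse_progress(line):
    """Turn an ffmpeg progress line ("key=value key=value ...") into a dict of strings."""
    # ffmpeg may pad values after "=", e.g. "frame=  100", so allow optional spaces.
    return dict(re.findall(r"(\w+)=\s*(\S+)", line))


line = "frame=18010 fps=417 Lsize=3705kB time=03:00.08 bitrate=168.6kbits/s dup=16697"
stats = parse_progress(line)
written = int(stats["frame"])
duplicated = int(stats["dup"])
print(written - duplicated)  # 1313, the nb_read_frames value reported by ffprobe
```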

Incidentally, the converted video (demo.mp4) does not have the problem discussed, i.e. OpenCV handles it correctly.

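In this case the duplicate frames are not physically present in the avi file; instead, each duplicate frame is represented by an instruction to repeat the previous frame. This can be checked as follows: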

$ ffplay -loglevel trace demo.avi
...
[ffplay_crop @ 0x7f4308003380] n:16 t:2.180000 pos:1311818.000000 x:0 y:0 x+w:792 y+h:592
[avi @ 0x7f4310009280] dts:574 offset:574 1/100 smpl_siz:0 base:1000000 st:0 size:81266
video: delay=0.130 A-V=0.000094
    Last message repeated 9 times
video: delay=0.130 A-V=0.000095
video: delay=0.130 A-V=0.000094
video: delay=0.130 A-V=0.000095
[avi @ 0x7f4310009280] dts:587 offset:587 1/100 smpl_siz:0 base:1000000 st:0 size:81646
[ffplay_crop @ 0x7f4308003380] n:17 t:2.320000 pos:1393538.000000 x:0 y:0 x+w:792 y+h:592
video: delay=0.140 A-V=0.000091
    Last message repeated 4 times
video: delay=0.140 A-V=0.000092
    Last message repeated 1 times
video: delay=0.140 A-V=0.000091
    Last message repeated 6 times
...

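In the log above, frames with actual data are represented by lines starting with "[avi @ 0xHHHHHHHHHHH]". The "video: delay=xxxxx A-V=yyyyy" messages indicate that the last frame must be displayed for xxxxx more seconds.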
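The two real frames in the excerpt have dts 574 and 587 in a 1/100 s time base (the "1/100" in the [avi @ ...] lines), so the first one stays on screen for 13 ticks, which matches delay=0.130. A small sketch of that calculation (display_durations is a hypothetical helper):

```python
from fractions import Fraction


def display_durations(dts_values, time_base=Fraction(1, 100)):
    """Seconds each real frame stays on screen, from consecutive DTS values."""
    return [(later - earlier) * time_base
            for earlier, later in zip(dts_values, dts_values[1:])]


durations = display_durations([574, 587])  # dts values from the two [avi @ ...] lines
print([float(d) for d in durations])        # [0.13]
repeats = [int(d / Fraction(1, 100)) - 1 for d in durations]
print(repeats)  # [12]: shown once, then repeated for 12 more ticks on a 100 fps clock
```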

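cv2.VideoCapture() skips such duplicate frames and reads only frames that carry real data. Here is the corresponding (albeit slightly edited) code from the 2.4 branch of opencv (note, BTW, that this ffmpeg-based code is indeed what runs here; I verified it by running python under gdb and setting a breakpoint on CvCapture_FFMPEG::grabFrame):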

bool CvCapture_FFMPEG::grabFrame()
{
    ...
    int count_errs = 0;
    const int max_number_of_attempts = 1 << 9; // !!!
    ...
    // get the next frame
    while (!valid)
    {
        ...
        int ret = av_read_frame(ic, &packet);
        ...        
        // Decode video frame
        avcodec_decode_video2(video_st->codec, picture, &got_picture, &packet);
        // Did we get a video frame?
        if(got_picture)
        {
            //picture_pts = picture->best_effort_timestamp;
            if( picture_pts == AV_NOPTS_VALUE_ )
                picture_pts = packet.pts != AV_NOPTS_VALUE_ && packet.pts != 0 ? packet.pts : packet.dts;
            frame_number++;
            valid = true;
        }
        else
        {
            // So, if the next frame doesn't have picture data but is
            // merely a tiny instruction telling to repeat the previous
            // frame, then we get here, treat that situation as an error
            // and proceed unless the count of errors exceeds
            // max_number_of_attempts!!!
            if (++count_errs > max_number_of_attempts)
                break;
        }
    }
    ...
}
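Since VideoCapture hands back only the frames that carry picture data, one hedged workaround for reaching a given time by counting frames is to pair each decoded frame with its presentation timestamp and repeat it until the next one is due. The sketch below is illustrative: expand_to_cfr is a hypothetical helper, and both the source of the timestamps (for example the per-frame timestamps printed by ffprobe -show_frames) and the premise that VideoCapture returns exactly one frame per such entry, in the same order, are assumptions.

```python
def expand_to_cfr(timed_frames, fps, total_ticks):
    """Repeat sparse frames so that there is exactly one frame per 1/fps tick.

    timed_frames: iterable of (timestamp_seconds, frame) pairs in presentation
    order, one pair per frame that actually carries picture data.
    Ticks before the first timestamp yield None.
    """
    pending = iter(timed_frames)
    upcoming = next(pending, None)
    current = None
    for tick in range(total_ticks):
        # Switch to every real frame whose timestamp has been reached by this tick.
        while upcoming is not None and round(upcoming[0] * fps) <= tick:
            current = upcoming[1]
            upcoming = next(pending, None)
        yield current


# Three real frames at 0.00 s, 0.13 s and 0.20 s, expanded onto a 100 fps clock.
expanded = list(expand_to_cfr([(0.0, "a"), (0.13, "b"), (0.2, "c")], fps=100, total_ticks=25))
print(expanded.count("a"), expanded.count("b"), expanded.count("c"))  # 13 7 5
```

For the demo video this would stretch the 1313 real frames back onto 18000 ticks (180 s at 100 fps), so that 1000 yielded frames would again correspond to 10 seconds.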

[Discussion]:

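Very interesting, apparently ffmpeg returns the duplicate frames but VideoCapture ignores them. Do you know how to force VideoCapture to duplicate the missing frames, or how to get correct values for fps and/or pos_msec, so that iterating to a specific pos_msec in my actual program does not advance the video too far? Outside the MCVE, I need to advance the video to specific positions, and I do that by checking pos_msec.
(I am hesitant to accept your answer prematurely: it identifies the root of the problem, which I appreciate, but it does not solve it yet.)
@timgeb That's fine, no rush. I plan to extend the answer with ways to handle this situation.
As a side note, it's odd that @DanMašek could not reproduce the problem under Windows 10; apparently VideoCapture has a different policy for duplicate frames there?!
@timgeb I found the code in OpenCV that is to blame for this problem, but unfortunately there seems to be no workaround other than somehow passing the input file through ffmpeg, so that it either materializes all the duplicate frames or drops them and puts correct timestamps on the remaining frames.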

[Solution 2]:

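In short: I reproduced your problem on an Ubuntu 12.04 machine with OpenCV 2.4.13, noticed that the codec used in your video (FourCC CVID) seems rather old (according to this post from 2011), and after converting the video to the MJPG codec (a.k.a. M-JPEG or Motion JPEG), your MCVE worked. Of course, Leon (or someone else) may post a fix for OpenCV, which may be the better solution in your case.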

I initially tried converting with

ffmpeg -i demo.avi -vcodec mjpeg -an demo_mjpg.avi

and

avconv -i demo.avi -vcodec mjpeg -an demo_mjpg.avi

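(both on a 16.04 box). Interestingly, both produced "broken" videos. For example, when jumping to frame 1000 with Avidemux, there is no real-time clock! Also, the converted videos were only about 1/6 of the original size, which is strange, since M-JPEG is a very simple compression scheme (each frame is JPEG-compressed independently).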

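Converting demo.avi to M-JPEG with Avidemux produced a video with which the MCVE works. (I used the Avidemux GUI for the conversion.) The converted video is about 3x the original size. Of course, you could also do the original recording with a codec that is better supported on Linux. If your application needs to jump to specific frames in a video, M-JPEG may be the best option; otherwise, H.264 compresses much better. In my experience both are well supported, and they are the only codecs I have seen implemented directly on webcams (H.264 only on high-end cameras).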

[Discussion]:

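Thanks. So if I understand correctly, your "physical" conversion with Avidemux adds the frames that are represented only as repeat instructions in the current video? Can you tell me how you did the conversion? Preferably via the command line (but the GUI is fine too).
I now have a good solution (converting the video) and another good answer explaining the root of the problem. Unfortunately I cannot split the bounty (see this meta question). I'll wait a while so that you have time to add the missing piece: how to convert my video (command-line solution preferred) or how to somehow feed VideoCapture with input from ffmpeg.
I just tried converting the video to mjpeg with avidemux3_qt4. The video got smaller (17 MB), and the problem in the MCVE persists. So I really need to know exactly what you did. Thanks again.
Here is what I did on the 12.04 machine: right-click demo.avi, open it with Avidemux (GTK+) (version 2.5.4), select "M-JPEG" in the "Copy" dropdown on the left, and click Save. I just repeated it and it still works. :) I'll see whether I can get the command line working; in my experience Avidemux works only so-so w.r.t. the command line.
Will look into this more tonight. One option is to write your own converter that extends your ffmpeg code. OpenCV 2.4 (but not 3.X) writes very good M-JPEG; I use it a lot at work.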
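A minimal sketch of the converter suggested in the last comment, under several assumptions: ffmpeg is on the PATH, the question's ffmpeg pipe delivers the repeated frames as well (which is what the question's frame_1001_ffmpeg_only.png result suggests), and the helper names are hypothetical. It uses bgr24 instead of the question's rgb24 so that frames arrive in OpenCV's channel order. It has not been tested against demo.avi, so whether its output behaves like the Avidemux conversion is itself an assumption.

```python
import subprocess


def ffmpeg_command(path):
    """ffmpeg arguments that decode path to raw BGR frames on stdout."""
    # bgr24 instead of the question's rgb24, so frames are in OpenCV's channel order.
    return ["ffmpeg", "-loglevel", "error", "-i", path,
            "-f", "image2pipe", "-pix_fmt", "bgr24", "-vcodec", "rawvideo", "-"]


def iter_ffmpeg_frames(path, width, height):
    """Yield height x width x 3 uint8 frames as decoded by ffmpeg."""
    import numpy as np  # imported here so the command helper works without numpy

    framesize = width * height * 3
    pipe = subprocess.Popen(ffmpeg_command(path), stdout=subprocess.PIPE)
    try:
        while True:
            raw = pipe.stdout.read(framesize)
            if len(raw) < framesize:  # end of stream (or a truncated last frame)
                break
            yield np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 3)
    finally:
        pipe.stdout.close()
        pipe.wait()


def convert_to_mjpg(src, dst, width, height, fps):
    """Re-encode every frame ffmpeg delivers into an M-JPEG AVI with OpenCV."""
    import cv2  # imported here so the helpers above work without OpenCV

    if hasattr(cv2, "VideoWriter_fourcc"):
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")  # OpenCV 3+
    else:
        fourcc = cv2.cv.CV_FOURCC(*"MJPG")  # OpenCV 2.4
    writer = cv2.VideoWriter(dst, fourcc, fps, (width, height))
    try:
        for frame in iter_ffmpeg_frames(src, width, height):
            writer.write(frame)
    finally:
        writer.release()
```

With the properties listed in the question, a call would look like convert_to_mjpg("demo.avi", "demo_mjpg.avi", 792, 593, 100.0). Expect a much larger file, since every repeated frame is stored as its own JPEG.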

[Solution 3]:

As you said:

When reading frames directly with ffmpeg (credit to this tutorial), the correct output images are produced.

Is this working because you define framesize = resolution[0]*resolution[1]*3

and then reuse it when reading: pipe.stdout.read(framesize)?

So I think you have to change every

_, frame = cap.read()

to

_, frame = cap.read(framesize)

Assuming the resolution is the same, the final version of the code is:

import cv2

# set up capture and print properties
print 'cv2 version = {}'.format(cv2.__version__)
cap = cv2.VideoCapture('demo.avi')
fps = cap.get(cv2.cv.CV_CAP_PROP_FPS)
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('initial attributes: fps = {}, pos_msec = {}, pos_frames = {}'
      .format(fps, pos_msec, pos_frames))

resolution = (593, 792) #here resolution 
framesize = resolution[0]*resolution[1]*3 #here framesize

# get first frame and save as picture
_, frame = cap.read( framesize ) #update to get one frame
cv2.imwrite('first_frame.png', frame)

# advance 10 seconds, that's 100*10 = 1000 frames at 100 fps
for _ in range(1000):
    _, frame = cap.read( framesize ) #update to get one frame
    # in the actual code, the frame is now analyzed

# save a picture of the current frame
cv2.imwrite('after_iteration.png', frame)

# print properties after iteration
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after iteration: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# assert that the capture (thinks it) is where it is supposed to be
# (assertions succeed)
assert pos_frames == 1000 + 1 # (+1: iteration started with second frame)
assert pos_msec == 10000 + 10

# manually set the capture to msec position 10010
# note that this should change absolutely nothing in theory
cap.set(cv2.cv.CV_CAP_PROP_POS_MSEC, 10010)

# print properties  again to be extra sure
pos_msec = cap.get(cv2.cv.CV_CAP_PROP_POS_MSEC)
pos_frames = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
print ('attributes after setting msec pos manually: pos_msec = {}, pos_frames = {}'
      .format(pos_msec, pos_frames))

# save a picture of the next frame, should show the same clock as
# previously taken image - but does not
_, frame = cap.read()
cv2.imwrite('after_setting.png', frame)

[Discussion]:

Your "solution" doesn't change anything.
I tried your solution, and the output images are the same as before.