Answers: Getting video properties with Python without calling external software

[Question title]: Getting video properties with Python without calling external software
[Posted]: 2018-05-07 08:23:45
[Question]:

[UPDATE:] Yes, it is possible, now, about 20 months later. See Update 3 below! [/UPDATE]

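Is this really impossible? All I can find are variants that call FFmpeg (or other software). My current solution is shown below, but for portability what I really want is a pure-Python solution that does not require users to install additional software.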

After all, I can easily play videos with PyQt's Phonon, so why can't I simply get information such as the video's dimensions or duration?

My solution uses ffmpy (http://ffmpy.readthedocs.io/en/latest/ffmpy.html), a wrapper for FFmpeg and FFprobe (http://trac.ffmpeg.org/wiki/FFprobeTips). It is smoother than the alternatives, but still requires FFmpeg to be installed separately.

    import ffmpy, subprocess, json
    ffprobe = ffmpy.FFprobe(global_options="-loglevel quiet -sexagesimal -of json -show_entries stream=width,height,duration -show_entries format=duration -select_streams v:0", inputs={"myvideo.mp4": None})
    print("ffprobe.cmd:", ffprobe.cmd)  # printout the resulting ffprobe shell command
    stdout, stderr = ffprobe.run(stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    # std* is byte sequence, but json in Python 3.5.2 requires str
    ff0string = str(stdout,'utf-8')

    ffinfo = json.loads(ff0string)
    print(json.dumps(ffinfo, indent=4)) # pretty print

    print("Video Dimensions: {}x{}".format(ffinfo["streams"][0]["width"], ffinfo["streams"][0]["height"]))
    print("Streams Duration:", ffinfo["streams"][0]["duration"])
    print("Format Duration: ", ffinfo["format"]["duration"])

Output:

    ffprobe.cmd: ffprobe -loglevel quiet -sexagesimal -of json -show_entries stream=width,height,duration -show_entries format=duration -select_streams v:0 -i myvideo.mp4
    {
        "streams": [
            {
                "duration": "0:00:32.033333",
                "width": 1920,
                "height": 1080
            }
        ],
        "programs": [],
        "format": {
            "duration": "0:00:32.064000"
        }
    }
    Video Dimensions: 1920x1080
    Streams Duration: 0:00:32.033333
    Format Duration:  0:00:32.064000
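With `-sexagesimal`, the durations come back as strings such as `0:00:32.033333`, while width and height are already integers. If you need the duration as a number, a small helper can parse the JSON that ffprobe printed. This is a minimal sketch that assumes the exact `-show_entries` options used above; the function names are hypothetical, and the fallback to the format duration is there because a per-stream duration may be missing for some files.

```python
import json


def sexagesimal_to_seconds(text):
    """Convert an ffprobe '-sexagesimal' value such as '0:00:32.033333' to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_ffprobe_output(stdout_bytes):
    """Extract (width, height, duration_in_seconds) from the ffprobe JSON shown above."""
    info = json.loads(stdout_bytes.decode("utf-8"))
    stream = info["streams"][0]
    # Prefer the stream duration; fall back to the container (format) duration.
    raw = stream.get("duration") or info.get("format", {}).get("duration")
    duration = sexagesimal_to_seconds(raw) if raw else None
    return stream["width"], stream["height"], duration
```

The `stdout` returned by `ffprobe.run(...)` can be passed in directly: `width, height, seconds = parse_ffprobe_output(stdout)`.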

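Update after a few days of experimenting: the hachoir solution proposed by Nick below does work, but it gives you a lot of trouble because hachoir's output is too unpredictable. Not my choice.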

Coding it with OpenCV couldn't be simpler:

    import cv2
    vid = cv2.VideoCapture(picfilename)
    height = vid.get(cv2.CAP_PROP_FRAME_HEIGHT)  # always 0 in Linux python3
    width  = vid.get(cv2.CAP_PROP_FRAME_WIDTH)   # always 0 in Linux python3
    print("opencv: height:{} width:{}".format(height, width))

The problem is that it works fine on Python 2, but not on Py3. Quote: "IMPORTANT NOTE: MacOS and Linux packages do not support video related functionality (not compiled with FFmpeg)" (https://pypi.python.org/pypi/opencv-python).

Besides that, OpenCV seems to require the FFmpeg binary package to be present at runtime (https://docs.opencv.org/3.3.1/d0/da7/videoio_overview.html).

Well, if I still have to install FFmpeg, I may as well stick with the original ffmpy example shown above :-/

Thanks for your help.

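UPDATE2: master_q (see below) suggested MediaInfo. Although that doesn't work on my Linux system (see my comments), the alternative of using pymediainfo (a Python wrapper for MediaInfo) does work. It is simple to use, but it takes 4 times longer than my initial ffprobe approach to get duration, width and height, and it still requires external software, namely MediaInfo: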

    from pymediainfo import MediaInfo
    media_info = MediaInfo.parse("myvideofile")
    for track in media_info.tracks:
        if track.track_type == 'Video':
            print("duration (millisec):", track.duration)
            print("width, height:", track.width, track.height)

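UPDATE3: OpenCV is finally available for Python 3 and claims to run on Linux, Windows and Mac! It's really easy, and I confirmed that no external software is needed, in particular no ffmpeg!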

First install OpenCV via pip:

    pip install opencv-python

Then, in Python:

    import cv2
    cv2video = cv2.VideoCapture(videofilename)
    height = cv2video.get(cv2.CAP_PROP_FRAME_HEIGHT)
    width  = cv2video.get(cv2.CAP_PROP_FRAME_WIDTH)
    print("Video Dimension: height:{} width:{}".format(height, width))

    framecount = cv2video.get(cv2.CAP_PROP_FRAME_COUNT)
    frames_per_sec = cv2video.get(cv2.CAP_PROP_FPS)
    print("Video duration (sec):", framecount / frames_per_sec)

    # equally easy to get this info from images
    cv2image = cv2.imread(imagefilename, flags=cv2.IMREAD_COLOR)
    height, width, channel = cv2image.shape
    print("Image Dimension: height:{} width:{}".format(height, width))
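The snippet above assumes the file opened and that OpenCV reported a frame rate; as the earlier update shows, a build without video support silently returns 0. A slightly more defensive sketch, assuming the `opencv-python` wheel, checks `isOpened()`, releases the capture, and avoids dividing by a zero FPS. `CAP_PROP_FRAME_COUNT` may be only an estimate for some containers, so treat the duration accordingly. The function names are illustrative, not part of OpenCV.

```python
def duration_from_frames(frame_count, fps):
    """Return the duration in seconds, or None if no usable FPS was reported."""
    if not fps or fps <= 0:
        return None
    return frame_count / fps


def probe_video(path):
    """Return (width, height, duration_in_seconds) using OpenCV only."""
    import cv2  # imported here so duration_from_frames() works without OpenCV

    cap = cv2.VideoCapture(path)
    try:
        if not cap.isOpened():
            raise IOError("OpenCV could not open {!r}".format(path))
        # get() returns floats, so convert the pixel sizes to int
        width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        duration = duration_from_frames(cap.get(cv2.CAP_PROP_FRAME_COUNT),
                                        cap.get(cv2.CAP_PROP_FPS))
        return width, height, duration
    finally:
        cap.release()  # free the underlying file/decoder handle
```

Usage: `width, height, seconds = probe_video("myvideo.mp4")`, where `seconds` is `None` if the FPS was unavailable.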

I also need the first frame of the video as an image, and I used ffmpeg to save it to the file system. This is easier with OpenCV too:

    hasFrames, cv2image = cv2video.read()   # reads 1st frame
    cv2.imwrite("myfilename.png", cv2image) # extension defines image type

Even better, since I only need the image in memory for use in the PyQt5 toolkit, I can read the cv2 image directly into a Qt image:

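    from PyQt5.QtGui import QImage

    height, width, channels = cv2image.shape  # ints; cv2video.get() returns floats, but QImage expects ints
    bytesPerLine = 3 * width
    # my_qt_image = QImage(cv2image, width, height, bytesPerLine, QImage.Format_RGB888) # may give false colors!
    # OpenCV stores pixels as BGR, so swap to RGB for Format_RGB888:
    my_qt_image = QImage(cv2image.data, width, height, bytesPerLine, QImage.Format_RGB888).rgbSwapped() # correct colors on my systems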
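The false colors come from OpenCV storing pixels in BGR order while `QImage.Format_RGB888` expects RGB; `rgbSwapped()` fixes that after the fact. An alternative, sketched here under the assumption that the frame is an 8-bit, 3-channel NumPy array as returned by `cv2video.read()`, is to swap the channels in NumPy first and take `bytesPerLine` from the array's row stride. The helper name `to_rgb888` is hypothetical.

```python
import numpy as np


def to_rgb888(frame):
    """Return (rgb, width, height, bytes_per_line) for QImage.Format_RGB888."""
    if frame.ndim != 3 or frame.shape[2] != 3 or frame.dtype != np.uint8:
        raise ValueError("expected an H x W x 3 uint8 BGR image")
    # Reverse the channel axis (BGR -> RGB); ascontiguousarray makes a packed copy,
    # because QImage reads the raw buffer row by row.
    rgb = np.ascontiguousarray(frame[:, :, ::-1])
    height, width = rgb.shape[:2]
    bytes_per_line = rgb.strides[0]  # 3 * width for a packed uint8 array
    return rgb, width, height, bytes_per_line
```

Then `rgb, w, h, bpl = to_rgb888(cv2image)` and `QImage(rgb.data, w, h, bpl, QImage.Format_RGB888)`. QImage does not copy the buffer, so keep `rgb` referenced while the image is in use (or call `.copy()` on the QImage).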

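Since OpenCV is a huge package, I was worried about timing. It turned out that OpenCV was never slower than the alternatives. I need about 100 ms to read a slide, and everything else together adds up to no more than 10 ms.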

I successfully tested this on Ubuntu Mate 16.04, 18.04 and 19.04, and on two different Windows 10 Pro installations. (No Mac available.) I'm really happy with OpenCV!

You can see it in action in my SlideSorter program, which lets you sort images and videos, preserve the sort order, and present them as a slideshow. Available here: https://sourceforge.net/projects/slidesorter/

[Comments on the question]:

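Do you mean you don't want to use PyQt4? You could use Phonon.MediaObject.metaData
If you're on Windows, see this question.
I'm on Linux, but would like to see something cross-platform.
I'm already using PyQt4, so naturally I'd make use of it. But as far as I can see, the metadata contains lots of, well, metadata, but nothing as mundane as width, height and duration (xiph.org/vorbis/doc/v-comment.html). 'Phonon.VideoPlayer()' is easy to use and works well, but I can't find any of the information I'm looking for.
Phonon delegates to GStreamer on Linux. See Getting started with GStreamer with Python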

Tags: python opencv video ffmpeg pyqt5

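[Answer 1]:

OK, after investigating this myself, since I needed it too, it looks like it can be done with hachoir. Here's a code snippet that gives you all the metadata hachoir can read: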

    import os
    import re
    from hachoir.parser import createParser
    from hachoir.metadata import extractMetadata

    def get_video_metadata(path):
        """
            Given a path, returns a dictionary of the video's metadata, as parsed by hachoir.
            Keys vary by exact filetype, but for an MP4 file on my machine,
            I get the following keys (inside of "Common" subdict):
                "Duration", "Image width", "Image height", "Creation date",
                "Last modification", "MIME type", "Endianness"

            Dict is nested - common keys are inside of a subdict "Common",
            which will always exist, but some keys *may* be inside of
            video/audio specific stream subdicts, named "Video Stream #1"
            or "Audio Stream #1", etc. Not all formats result in this
            separation.

            :param path: str path to video file
            :return: dict of video metadata
        """

        if not os.path.exists(path):
            raise ValueError("Provided path to video ({}) does not exist".format(path))

        parser = createParser(path)
        if not parser:
            raise RuntimeError("Unable to get metadata from video file")

        with parser:
            metadata = extractMetadata(parser)

            if not metadata:
                raise RuntimeError("Unable to get metadata from video file")

        metadata_dict = {}
        line_matcher = re.compile(r"-\s(?P<key>.+):\s(?P<value>.+)")  # raw string avoids invalid-escape warnings
        group_key = None  # stores which group we're currently in, for nesting subkeys
        for line in metadata.exportPlaintext():  # this is what hachoir offers for dumping readable information
            parts = line_matcher.match(line)
            if not parts:  # not all lines have metadata - at least one is a header
                if line == "Metadata:":  # generic header: call it "Common" to match files with multiple streams, so there's always a Common key
                    group_key = "Common"
                else:
                    group_key = line[:-1]  # strip the trailing colon of the group header; this is the group we add keys into
                metadata_dict[group_key] = {}  # initialize the group
                continue

            if group_key:  # inside a group: nest this key in it
                metadata_dict[group_key][parts.group("key")] = parts.group("value")
            else:  # otherwise, put it at the root of the dict
                metadata_dict[parts.group("key")] = parts.group("value")

        return metadata_dict

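This seems to return good results for me now and doesn't require any additional (non-Python) installation. The keys seem to vary from video to video and by video type, so you'll need to do some checking rather than just assuming that any particular key exists. This code is written for Python 3, uses hachoir3 and is adapted from the hachoir3 documentation; I haven't checked whether it works with the Python 2 version of hachoir.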
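Because the keys vary, a lookup that checks the "Common" group first and then any stream subgroup ("Video Stream #1", ...) avoids `KeyError`s. A minimal sketch, assuming the nested dict layout produced by `get_video_metadata()` above; `find_metadata_value` is a hypothetical helper name.

```python
def find_metadata_value(metadata, key, default=None):
    """Look up key in the 'Common' group, then in any other group, then at the root."""
    common = metadata.get("Common", {})
    if key in common:
        return common[key]
    for group, values in metadata.items():
        # stream groups such as "Video Stream #1" are nested dicts
        if group != "Common" and isinstance(values, dict) and key in values:
            return values[key]
    return metadata.get(key, default)
```

For example, `find_metadata_value(get_video_metadata("myvideo.mp4"), "Image width")` returns the width string wherever hachoir put it, or `None`.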

In case it's useful, I also have the following function for converting the text-based duration value into seconds:

    def length(duration_value):
        """Convert a hachoir duration string such as '32 sec 64 ms' to seconds."""
        # every part is optional; "hrs" as well as "hour"/"hours" is accepted for the hour part
        time_split = re.match(r"(?P<hours>\d+\s(?:hrs|hours?))?\s*(?P<minutes>\d+\smin)?\s*(?P<seconds>\d+\ssec)?\s*(?P<ms>\d+\sms)?", duration_value)

        fields_and_multipliers = {  # multipliers to convert each value to seconds
            "hours": 3600,
            "minutes": 60,
            "seconds": 1,
            "ms": 0.001,
        }

        total_time = 0
        for group, multiplier in fields_and_multipliers.items():  # convert each portion to seconds and add it to the total
            if time_split.group(group) is not None:  # not all groups are present for all videos (e.g. hours may be missing)
                total_time += float(time_split.group(group).split(" ")[0]) * multiplier

        return total_time

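[Comments]:

Well, it looks like we're almost there. But look at that regex jungle and all the caveats! A bit of JSON could work wonders.
I found that OpenCV could give the answer in 3 lines, if it weren't for the issue that "MacOS and Linux packages do not support video related functionality (not compiled with FFmpeg)".
Could you clarify your comment about using JSON? The text returned by hachoir lacks structure; the dict this function returns is actually a cleaner representation, similar to data loaded from JSON. I initially implemented this with string splits instead of regular expressions, but as I ran into more edge cases, regular expressions made more sense.
My JSON comments are aimed at the hachoir people, not at your regex. What use, in a program, is a value like "- Image height: 1080 pixels" or "- Comment: User volume: 100.0%"? Don't I rather need the numbers themselves, like 1080 or 100 (as numbers, not strings)? Extracting the numbers takes extra effort.
Your code can be made more "jsonic" for me by replacing return metadata_dict with return json.dumps(metadata_dict, indent=4), which works both for pretty printing and for processing. It still requires extracting the numbers, which I did with if ... elif ... else statements in a for loop. Not nice, but it works around the hachoir limitation.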
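The number extraction discussed in these comments can be done with one small function instead of an if/elif chain per key. This sketch assumes values shaped like the ones quoted above (`1080 pixels`, `100.0%`), i.e. a leading number followed by an optional unit; `split_number` is a hypothetical name, it returns `None` for anything else, and durations such as `32 sec 64 ms` still need the `length()` converter from the answer.

```python
import re

_NUMBER_WITH_UNIT = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*(.*?)\s*$")


def split_number(value):
    """Split '1080 pixels' -> (1080, 'pixels') or '100.0%' -> (100.0, '%')."""
    match = _NUMBER_WITH_UNIT.match(value)
    if not match:
        return None
    number, unit = match.groups()
    # keep whole numbers as int, decimals as float
    return (float(number) if "." in number else int(number)), unit
```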

[Answer 2]:

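Mediainfo is another option, and it is cross-platform, used together with MediaInfoDLL.py and the Mediainfo.DLL library. Download Mediainfo.dll from their site (the CLI package, to get the DLL), or get both files, including the Python script, from https://github.com/MediaArea/MediaInfoLib/releases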

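Works in Python 3.6: you create a dict of the parameters you want. The keys must be exact, but the values are filled in later; the ones shown here only make clear what the values might look like.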

    from MediaInfoDLL import *   # MediaInfoDLL3 on some Linux setups (see the comments below)

    class VideoInfo:  # the class name is only illustrative
        def __init__(self, file):
            self.file = file
            # keys must be exact MediaInfo parameter names; the values are just examples of what to expect
            self.video = {'Format': 'AVC', 'Width': '1920', 'Height': '1080', 'ScanType': 'Progressive', 'ScanOrder': 'None', 'FrameRate': '29.970',
                          'FrameRate_Num': '', 'FrameRate_Den': '', 'FrameRate_Mode': '', 'FrameRate_Minimum': '', 'FrameRate_Maximum': '',
                          'DisplayAspectRatio/String': '16:9', 'ColorSpace': 'YUV', 'ChromaSubsampling': '4:2:0', 'BitDepth': '8',
                          'Duration': '', 'Duration/String3': ''}
            self.audio = {'Format': 'AAC', 'BitRate': '320000', 'BitRate_Mode': 'CBR', 'Channel(s)': '2', 'SamplingRate': '48000', 'BitDepth': '16'}

        def mediainfo(self, file):
            MI = MediaInfo()
            MI.Open(file)
            for key in self.video:
                # 0 means track 0, i.e. the first video stream
                self.video[key] = MI.Get(Stream.Video, 0, key)
            for key in self.audio:
                self.audio[key] = MI.Get(Stream.Audio, 0, key)
            MI.Close()

        # ...
        # call it from another method:
        #     self.mediainfo(self.file)
        # ...

    # Afterwards the dicts hold the correct values ('' where a value is missing),
    # for example, to get the frame rate out of the dictionary:
    # fps = self.video['FrameRate']
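MediaInfo's `Get()` returns strings, and an empty string when a value is missing, so numeric fields need converting before use. A sketch, assuming the `self.video` dict filled in above and that MediaInfo reports `Duration` in milliseconds; `summarize_video` and `_to_float` are hypothetical names.

```python
def _to_float(text):
    """Convert a MediaInfo string to float; '' (missing) or None gives None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def summarize_video(video):
    """Pick width, height, fps and duration (seconds) out of a MediaInfo video dict."""
    width = _to_float(video.get("Width"))
    height = _to_float(video.get("Height"))
    duration_ms = _to_float(video.get("Duration"))
    return {
        "width": int(width) if width is not None else None,
        "height": int(height) if height is not None else None,
        "fps": _to_float(video.get("FrameRate")),
        "duration_s": duration_ms / 1000.0 if duration_ms is not None else None,
    }
```

Missing values come back as `None` instead of raising, so callers can test for them explicitly.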

[Comments]:

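Thanks. On Ubuntu Linux Mate 16.04 you have to install python-mediainfodll (Py2) or python3-mediainfodll (Py3), and then import MediaInfoDLL (Py2) or MediaInfoDLL3 (Py3). Give it a try.
Very strange: although the import works and MI.Open is fine, in Py2 every "value" is empty, and in Py3 the MI.Get command always raises an error. Tried with mp4 and mov files.
Haven't played with it on Linux yet, but will soon. On Windows I imported MediaInfoDLL.py (MediaInfoDLL3.py is identical and not needed), and I put Mediainfo.DLL in the working directory.
On (Ubuntu) Linux you must import MediaInfoDLL on Py2 and MediaInfoDLL3 on Py3, otherwise you get an import error. But it doesn't work, even though MediaInfo is installed.
However, pymediainfo ("from pymediainfo import MediaInfo"), which I understand to be a Python wrapper for MediaInfo, does work, which shows that MediaInfo is accessible. But for getting duration, width and height it takes more than 4 times as long as the ffprobe approach shown in the original post, and it still requires an external installation of MediaInfo.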