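How to retrieve raw data from YUV2 streaming

【Question title】: How to retrieve raw data from YUV2 streaming
【Posted】: 2022-01-15 04:32:51
【Question description】:

I am streaming yuv2 format data from a QVGA sensor connected over USB to a host application on Windows. How can I stream or capture the raw data from the yuv2 format using an opencv-python sample application?

How can I do that? Is there a test example that does this?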

# opencv-python (host application)
import cv2
import numpy as np
    
# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)
# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
# set fps
cap.set(cv2.CAP_PROP_FPS, 30)
while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        # Stop if the camera did not deliver a frame
        break
    # Display the resulting frame
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()

Code sample for fetching video frames without decoding:

import cv2
import numpy as np

# open video0
# -------> Try replacing cv2.CAP_MSMF with cv2.CAP_FFMPEG:
cap = cv2.VideoCapture(0, cv2.CAP_FFMPEG)

# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
# set fps
cap.set(cv2.CAP_PROP_FPS, 30)

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

for i in range(10):
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    print('frame.shape = {}    frame.dtype = {}'.format(frame.shape, frame.dtype))

cap.release()

If cv2.CAP_FFMPEG does not work, try the following code sample:

import cv2
import numpy as np

# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)

# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
# set fps
cap.set(cv2.CAP_PROP_FPS, 30)

# -----> Try setting FOURCC and disable RGB conversion:
#########################################################
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' ')) 
cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)    
#########################################################

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

for i in range(10):
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    print('frame.shape = {}    frame.dtype = {}'.format(frame.shape, frame.dtype))

cap.release()

Reshaping the uint8 frame to 680x240 and saving it as img.png:

import cv2
import numpy as np

# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)

# set width and height
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 340)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 240)
cap.set(cv2.CAP_PROP_FPS, 30) # set fps

# Disable the conversion to BGR by setting FOURCC to Y16 and `CAP_PROP_CONVERT_RGB` to 0.
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' ')) 
cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)    

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

for i in range(10):
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    cols = 340*2
    rows = 240

    img = frame.reshape(rows, cols)

    cv2.imwrite('img.png', img)

cap.release()

//680x240 img.png

//With a hot object present (img1.png)

//Processed image (hot object)

//Using little endian (test)

//Test image using CAP_DSHOW (capture)

//Test image using CAP_DSHOW (save)

//680x240 (hand.png)

//680x240 (hand1.png)

//fing preview

//fing.png

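【Question comments】:

Please suggest.
Any luck? YUV2 streaming is not a known format...
@Rotem Not yet. I get the video data from the sensor, convert it to YUV2 format and send it over USB (UVC). The images posted above show what I get in my host application on Windows. The yuv2 format is described here: fourcc.org/pixel-format/yuv-yuy2 stackoverflow.com/questions/36228232/yuy2-vs-yuv-422
YUY2 and YUV2 are not the same thing... I suspect your camera is a grayscale camera rather than a color camera. Can you add some details about the camera (or sensor) model?
Yes, I see. I am getting the YUV2 format (according to the images). The actual output of the sensor is raw16, which is mapped internally (in the UVC descriptor) to the YUV2 GUID and so produces the yuv2 output format. This IR sensor is research oriented, so it has no model number. Sorry. Could you help me understand this data: what it actually is, and what else needs to be checked to get an image? Thanks.

Tags: python opencv yuv raw uvc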

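【Solution 1】:

The true format of the pixels in the video is int16 grayscale, but it is labeled as YUV2 format (probably for compatibility with grabbers that don't support 16 bits).

I have seen the same technique used by the RAVI format.

The default behavior of OpenCV is to convert the frame from YUV2 to BGR format.
Since the format has no color (it is only labeled as YUV2), the conversion messes up your data.

I could be wrong here... but it looks like the format is "big endian" and signed 16 bits.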
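To make the byte-order question concrete, here is a minimal sketch on a synthetic buffer (no camera involved). The helper name `decode_raw16` and the sample bytes are made up for illustration; it only shows how the same byte pairs decode under the two signed 16-bit interpretations, using NumPy dtype strings.

```python
import numpy as np

def decode_raw16(raw_bytes, rows, cols, big_endian=True):
    """Interpret a flat uint8 buffer as rows x cols signed 16-bit pixels.

    Hypothetical helper for illustration; the byte order is the caller's guess.
    """
    flat = np.asarray(raw_bytes, dtype=np.uint8).reshape(-1)
    if flat.size != rows * cols * 2:
        raise ValueError('expected {} bytes, got {}'.format(rows * cols * 2, flat.size))
    dtype = '>i2' if big_endian else '<i2'
    # frombuffer reads the bytes with the requested byte order; astype then
    # converts the values to native int16 so later arithmetic behaves normally.
    return np.frombuffer(flat.tobytes(), dtype=dtype).reshape(rows, cols).astype(np.int16)

# Two pixels, four bytes: (0x01, 0x02) and (0xFF, 0xFE).
raw = np.array([0x01, 0x02, 0xFF, 0xFE], dtype=np.uint8)
as_big = decode_raw16(raw, rows=1, cols=2, big_endian=True)      # 0x0102, 0xFFFE
as_little = decode_raw16(raw, rows=1, cols=2, big_endian=False)  # 0x0201, 0xFEFF
print(as_big.tolist())     # values 258 and -2
print(as_little.tolist())  # values 513 and -257
```

If one interpretation gives a smooth image and the other gives noise, that is a reasonable hint about the byte order, although it is not proof.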

Here is a complete code sample for grabbing and displaying the video:

import cv2
import numpy as np

# open video0
cap = cv2.VideoCapture(0, cv2.CAP_MSMF)

# set width and height
cols, rows = 340, 240
cap.set(cv2.CAP_PROP_FRAME_WIDTH, cols)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, rows)
cap.set(cv2.CAP_PROP_FPS, 30) # set fps

# Disable the conversion to BGR by setting FOURCC to Y16 and `CAP_PROP_CONVERT_RGB` to 0.
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter.fourcc('Y','1','6',' ')) 
cap.set(cv2.CAP_PROP_CONVERT_RGB, 0)    

# Fetch undecoded RAW video streams
cap.set(cv2.CAP_PROP_FORMAT, -1)  # Format of the Mat objects. Set value -1 to fetch undecoded RAW video streams (as Mat 8UC1)

while True:
    # Capture frame-by-frame
    ret, frame = cap.read()

    if not ret:
        break

    # Convert the frame from uint8 elements to big-endian signed int16 format.
    frame = frame.reshape(rows, cols*2) # Reshape to 680x240
    frame = frame.astype(np.uint16) # Convert uint8 elements to uint16 elements
    frame = (frame[:, 0::2] << 8) + frame[:, 1::2]  # Combine each byte pair as big endian (first byte is the high byte), the result is 340x240.
    frame = frame.view(np.int16)  # The data is actually signed 16 bits - view it as int16 (16 bits signed).

    # Apply some processing for display (this part is just "cosmetics"):
    frame_roi = frame[:, 10:-10]  # Crop 320x240 (the left and right parts are not meant to be displayed).
    # frame_roi = cv2.medianBlur(frame_roi, 3)  # Clean the dead pixels (just for better viewing the image).
    frame_roi = frame_roi << 3  # Shift out the 3 leftmost bits ???
    normed = cv2.normalize(frame_roi, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)  # Convert to uint8 with normalizing (just for viewing the image).

    cv2.imshow('normed', normed)  # Show the normalized video frame

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    # cv2.imwrite('normed.png', normed)

cap.release()
cv2.destroyAllWindows()

Shifting each pixel left by 3 (frame_roi = frame_roi << 3) fixes most of the issues.

Maybe the upper 3 bits are not in place, or have some different meaning?
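One way to investigate this, offered as a sketch rather than a diagnosis of this particular sensor: OR all pixel values together to see which bit positions are ever set in a frame. The function name `used_bits` and the synthetic 13-bit test frame are assumptions for illustration.

```python
import numpy as np

def used_bits(frame_u16):
    """Return the bit positions (0 = least significant) set in at least one pixel."""
    combined = int(np.bitwise_or.reduce(frame_u16.astype(np.uint16), axis=None))
    return [bit for bit in range(16) if (combined >> bit) & 1]

# Synthetic frame whose values fit in 13 bits (0..8191), standing in for real data.
rng = np.random.default_rng(0)
frame = rng.integers(0, 2**13, size=(240, 340), dtype=np.uint16)

bits = used_bits(frame)
print('highest bit in use:', max(bits))
print('unused high bits:', 15 - max(bits))
```

If the top bits are never set across many frames, the left shift only rescales the data. If they do toggle, they may carry something other than intensity, or the byte-order guess may be off.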

The ROI cropping and the normalization are just "cosmetics", so that you can see something.
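For reference, the crop and min-max stretch can also be written in plain NumPy, which makes it easy to see what the display conversion does. This is a sketch (the name `normalize_for_display` is made up here) and is not guaranteed to match the rounding of `cv2.normalize` exactly.

```python
import numpy as np

def normalize_for_display(frame, margin=10):
    """Crop `margin` columns from each side and stretch the values to 0..255."""
    roi = frame[:, margin:-margin].astype(np.float32)
    lo, hi = float(roi.min()), float(roi.max())
    if hi == lo:
        # Flat image: avoid dividing by zero and return an all-black frame.
        return np.zeros(roi.shape, dtype=np.uint8)
    scaled = (roi - lo) * (255.0 / (hi - lo))
    return np.round(scaled).astype(np.uint8)

# Synthetic 240x340 signed 16-bit frame: a horizontal ramp.
frame = np.tile(np.arange(340, dtype=np.int16) * 10 - 1000, (240, 1))
view = normalize_for_display(frame)
print(view.shape, view.dtype, view.min(), view.max())
```

Because the stretch uses the per-frame minimum and maximum, absolute pixel values are lost. That is fine for viewing but not for any measurement on the data.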

Here is the processed version of the image you posted (with the hot object):

For little endian, replace the following lines:

frame = frame.reshape(rows, cols*2) # Reshape to 680x240
frame = frame.astype(np.uint16) # Convert uint8 elements to uint16 elements
frame = (frame[:, 0::2] << 8) + frame[:, 1::2]  # Combine each byte pair as big endian (first byte is the high byte), the result is 340x240.
frame = frame.view(np.int16)  # The data is actually signed 16 bits - view it as int16 (16 bits signed).

with:

frame = frame.view(np.int16).reshape(rows, cols)

If the values are all positive (uint16 type), try:

frame = frame.view(np.uint16).reshape(rows, cols)
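As a sanity check on the two variants, the following sketch uses a small random uint8 array in place of a real frame. It compares the shift-and-add form with an explicit big-endian dtype view, and shows that the little-endian reading is the byte-swapped big-endian one. The explicit `'<i2'` and `'>i2'` dtype strings are a choice made here so the result does not depend on the host byte order, since a plain `np.int16` view uses the native order.

```python
import numpy as np

rows, cols = 2, 3
rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(rows, cols * 2), dtype=np.uint8)

# Shift-and-add form: the first byte of each pair is the high byte.
wide = frame.astype(np.uint16)
manual_big = ((wide[:, 0::2] << 8) + wide[:, 1::2]).view(np.int16)

# Same interpretation via an explicit big-endian dtype, converted to native int16.
view_big = frame.view('>i2').astype(np.int16)

# Little-endian interpretation, independent of the host byte order.
view_little = frame.view('<i2').astype(np.int16)

print(np.array_equal(manual_big, view_big))
print(np.array_equal(view_little, view_big.byteswap()))
```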

Sketch code for processing the image for display:

import cv2
import numpy as np

cols, rows = 340, 240  # Frame size in pixels (each pixel is 2 bytes)

frame = cv2.imread('hand1.png', cv2.IMREAD_UNCHANGED)  # Read input image (grayscale uint8, 680x240)

# create a CLAHE object (Arguments are optional).
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

# Convert the frame from uint8 elements to big-endian signed int16 format.
frame = frame.reshape(rows, cols * 2)  # Reshape to 680x240
frame = frame.astype(np.uint16)  # Convert uint8 elements to uint16 elements
frame = (frame[:, 0::2] << 8) + frame[:, 1::2]  # Combine each byte pair as big endian (first byte is the high byte), the result is 340x240.
frame = frame.view(np.int16)  # The data is actually signed 16 bits - view it as int16 (16 bits signed).

# Apply some processing for display (this part is just "cosmetics"):
frame_roi = frame[:, 10:-10]  # Crop 320x240 (the left and right parts are not meant to be displayed).
# frame_roi = cv2.medianBlur(frame_roi, 3)  # Clean the dead pixels (just for better viewing the image).

#frame_roi = frame_roi << 3  # Shift out the 3 leftmost bits ???
frame_roi = frame_roi << 1  # Shift out the leftmost bit ???

# Fix the offset difference between the odd and even columns (note: this is not a good solution).
#frame_as_uint16 = (frame_roi.astype(np.int32) + 32768).astype(np.uint16)
frame_as_uint16 = frame_roi.view(np.uint16)  # Try to interpret the data as unsigned
frame_as_float = frame_as_uint16.astype(np.float32) / 2  # Divide by 2 for avoiding overflow
med_odd = np.median(frame_as_float[:, 0::2])
med_evn = np.median(frame_as_float[:, 1::2])
med_dif = med_odd - med_evn
frame_as_float[:, 0::2] -= med_dif/2
frame_as_float[:, 1::2] += med_dif/2
frame_as_uint16 = np.round(frame_as_float).clip(0, 2**16-1).astype(np.uint16)

cl1 = clahe.apply(frame_as_uint16)  # Apply contrast enhancement.
normed = cv2.normalize(cl1, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)  # Convert to uint8 with normalizing (just for viewing the image).

cv2.imwrite('normed.png', normed)

cv2.imshow('normed', normed)
cv2.waitKey()
cv2.destroyAllWindows()
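The column correction in the sketch above can be isolated into a small function and checked on synthetic data. This assumes the artifact is a constant offset between alternating columns; the name `balance_columns` and the test values are illustrative only. Columns are named here by zero-based index, so `[:, 0::2]` is "even".

```python
import numpy as np

def balance_columns(img):
    """Move even- and odd-indexed columns toward each other by half the
    difference of their medians. Returns a float32 copy."""
    out = img.astype(np.float32)
    med_even = np.median(out[:, 0::2])
    med_odd = np.median(out[:, 1::2])
    diff = med_even - med_odd
    out[:, 0::2] -= diff / 2
    out[:, 1::2] += diff / 2
    return out

# Synthetic image: flat background of 1000 with +40 added to columns 0, 2, 4.
img = np.full((4, 6), 1000, dtype=np.uint16)
img[:, 0::2] += 40

fixed = balance_columns(img)
print(np.median(fixed[:, 0::2]), np.median(fixed[:, 1::2]))
```

A single global offset like this cannot fix a mismatch that varies with signal level or position, which is one reason the original comment calls it "not a good solution".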

【Comments】:

Thanks for the detailed explanation. You are right. The real pixels are 2 bytes each and are labeled as YUV2 format for compatibility. The format is "little endian". Can I test with "little endian"? I have added the processed image (with the hot object). Please check. Thanks.
I understand that it is supposed to be little endian, but the result looks like noise. I added an example of the little-endian conversion to the post. There may be other issues (a camera configuration issue?). Why are you using cv2.CAP_MSMF? Does it work with cv::CAP_DSHOW? Do you know whether the pixels are supposed to have negative values?
Yes, it looks noisy. I posted the tested images (with little endian). Tested with "np.int16" and "np.uint16". I checked with "cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)", but it gives me an error: "frame = frame.view(np.int16).reshape(rows, cols) ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array."
OK, check frame.shape and frame.dtype
Using cap = cv2.VideoCapture(0, cv2.CAP_DSHOW) gives me... frame.shape = (240, 340, 3) frame.dtype = uint8 (the same line is printed for all 10 frames)
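The reported shape accounts for the ValueError: a (240, 340, 3) uint8 array has a last axis of 3 bytes, which cannot be regrouped into 2-byte elements. It also suggests that this backend is still delivering converted 3-channel frames rather than the raw buffer. A small guard, sketched below with a made-up name `as_raw16` and an assumed flat layout for the undecoded buffer, makes that failure easier to read than the NumPy message.

```python
import numpy as np

def as_raw16(frame, rows, cols):
    """Reinterpret a raw uint8 frame as rows x cols little-endian int16,
    or raise a descriptive error if the buffer has the wrong size."""
    expected = rows * cols * 2
    if frame.dtype != np.uint8 or frame.size != expected:
        raise ValueError(
            'expected {} uint8 values for raw 16-bit data, got shape {} dtype {}'
            .format(expected, frame.shape, frame.dtype))
    return np.ascontiguousarray(frame).reshape(rows, cols * 2).view('<i2')

rows, cols = 240, 340

raw_like = np.zeros((1, rows * cols * 2), dtype=np.uint8)  # stand-in for an undecoded buffer
bgr_like = np.zeros((rows, cols, 3), dtype=np.uint8)       # stand-in for a converted frame

print(as_raw16(raw_like, rows, cols).shape)
try:
    as_raw16(bgr_like, rows, cols)
except ValueError as err:
    print('rejected:', err)
```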