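VHDL RGB to YUV444 implementation mismatch: answers

[Question title]: VHDL RGB to YUV444 implementation mismatch
[Posted]: 2021-05-14 09:30:24
[Question]:

Design

I am trying to implement an RGB to YUV444 conversion algorithm in hardware, based on the following approximation, which I already have working in a C program: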

#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : X)
#define RGB2Y(R, G, B) CLIP(( (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
#define RGB2U(R, G, B) CLIP(( ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
#define RGB2V(R, G, B) CLIP(( ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)

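Simulation/verification done so far

I simulated and verified the VHDL code using a self-checking approach. Using several images, I compared the VHDL output against "golden" reference YUV444 values generated by the known-working C algorithm, and all simulations passed.

Problem

When I implement it in hardware and insert it into the video pipeline, the video output looks fine in terms of sync, with no obvious problems with frame rate, flicker, and so on, but the colors are wrong: there is a magenta/purple cast. For example, yellow appears light magenta, red appears darker, etc.

I think the UV values are most likely being clipped/saturated while Y (luma) works fine, and that is what I see in the resulting video.

Code

Note that, for simplicity, I am only posting the part that does the conversion, along with the relevant signal declarations, types, and functions. The rest of the code is just an AXI video stream signal wrapper, which works fine in terms of video sync and shows no video problems in that respect. If you think it would help, let me know and I will post it too.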

-- Functions:

  -- Absolute operation of: "op1 - op2 + op3" for the UV components
  function uv_op_abs(op1 : unsigned(15 downto 0); op2 : unsigned(15 downto 0); op3 : unsigned(15 downto 0))
  return unsigned is
    variable res1 : unsigned(15 downto 0);
  begin
    if op2 > op1 then
      res1 := (op2 - op1);
      if res1 > op3 then
        return res1 - op3;
      else 
        return op3 - res1;
      end if;
    else
      return (op1 - op2) + op3;
    end if;
  end uv_op_abs;

 function clip(mult_in : unsigned(15 downto 0))
  return unsigned is
  begin
    if to_integer(unsigned(mult_in)) > 240 then
      return unsigned(to_unsigned(240,8));
    else
      return unsigned(mult_in(7 downto 0));
    end if;
  end clip;

-- Signal/constant declarations:

  --Constants
  constant coeff_0 : unsigned(7 downto 0) := "01000010";  --66
  constant coeff_1 : unsigned(7 downto 0) := "10000001";  --129  
  constant coeff_2 : unsigned(7 downto 0) := "00011001";  --25 
  constant coeff_3 : unsigned(7 downto 0) := "00100110";  --38
  constant coeff_4 : unsigned(7 downto 0) := "01001010";  --74
  constant coeff_5 : unsigned(7 downto 0) := "01110000";  --112
  constant coeff_6 : unsigned(7 downto 0) := "01011110";  --94
  constant coeff_7 : unsigned(7 downto 0) := "00010010";  --18
  constant coeff_8 : unsigned(7 downto 0) := "10000000";  --128
  constant coeff_9 : unsigned(7 downto 0) := "00010000";  --16

  --Pipeline registers
  signal red_reg : unsigned(7 downto 0);
  signal green_reg : unsigned(7 downto 0);
  signal blue_reg : unsigned(7 downto 0);

  signal y_red_reg_op1 : unsigned(15 downto 0);
  signal y_green_reg_op1 : unsigned(15 downto 0);
  signal y_blue_reg_op1 : unsigned(15 downto 0);

  signal u_red_reg_op1 : unsigned(15 downto 0);
  signal u_green_reg_op1 : unsigned(15 downto 0);
  signal u_blue_reg_op1 : unsigned(15 downto 0);

  signal v_red_reg_op1 : unsigned(15 downto 0);
  signal v_green_reg_op1 : unsigned(15 downto 0);
  signal v_blue_reg_op1 : unsigned(15 downto 0);

  signal y_reg_op2 : unsigned(15 downto 0);
  signal u_reg_op2 : unsigned(15 downto 0);
  signal v_reg_op2 : unsigned(15 downto 0);

  signal y_reg_op3 : unsigned(7 downto 0);
  signal u_reg_op3 : unsigned(7 downto 0);
  signal v_reg_op3 : unsigned(7 downto 0);

-- YUV444 conversion process:

  RGB_YUV_PROC : process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        red_reg <= (others => '0');
        green_reg <= (others => '0');
        blue_reg <= (others => '0');
        y_red_reg_op1 <= (others => '0');
        y_green_reg_op1 <= (others => '0');
        y_blue_reg_op1 <= (others => '0');
        u_red_reg_op1 <= (others => '0');
        u_green_reg_op1 <= (others => '0');
        u_blue_reg_op1 <= (others => '0');
        v_red_reg_op1 <= (others => '0');
        v_green_reg_op1 <= (others => '0');
        v_blue_reg_op1 <= (others => '0');
        y_reg_op2 <= (others => '0');
        u_reg_op2 <= (others => '0');
        v_reg_op2 <= (others => '0');
        y_reg_op3 <= (others => '0');
        u_reg_op3 <= (others => '0');
        v_reg_op3 <= (others => '0');
        yuv444_out <= (others => '0');
        soff_sync <= '0';
      else

        --Sync with first video frame with the tuser (sof) input signal
        if rgb_sof_in = '1' then
          soff_sync <= '1';
        end if;

        --Fetch a pixel
        if (rgb_sof_in = '1' or soff_sync = '1') and rgb_valid_in = '1' and yuv444_ready_out = '1' and bypass = '0' then
          green_reg <= unsigned(rgb_in(7 downto 0));
          blue_reg <= unsigned(rgb_in(15 downto 8));
          red_reg <= unsigned(rgb_in(23 downto 16));
        end if;

        -- RGB to YUV conversion
        -- Y--> CLIP(( (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
        -- U--> CLIP(( ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
        -- V--> CLIP(( ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
        if (rgb_sof_in = '1' or soff_sync = '1') and (valid_delay = '1' or validff1 = '1') and yuv444_ready_out = '1' and bypass = '0' then
          --Y calc (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
          y_red_reg_op1 <= coeff_0 * red_reg;
          y_green_reg_op1 <= coeff_1 * green_reg;
          y_blue_reg_op1 <= coeff_2 * blue_reg; 
          y_reg_op2 <=  y_red_reg_op1 + y_green_reg_op1 + y_blue_reg_op1 + (X"00" & coeff_8);
          y_reg_op3 <= (y_reg_op2(15 downto 8) + coeff_9);

          --U calc ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
          u_red_reg_op1 <= coeff_3 * red_reg;
          u_green_reg_op1 <= coeff_4 * green_reg;
          u_blue_reg_op1 <= coeff_5 * blue_reg;
          u_reg_op2 <= uv_op_abs(u_blue_reg_op1, (u_red_reg_op1 + u_green_reg_op1), (X"00" & coeff_8));
          u_reg_op3 <= (u_reg_op2(15 downto 8) + coeff_8);

          --V calc ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
          v_red_reg_op1 <= coeff_5 * red_reg;
          v_green_reg_op1 <= coeff_6 * green_reg;
          v_blue_reg_op1 <= coeff_7 * blue_reg;
          v_reg_op2 <= uv_op_abs(v_red_reg_op1, (v_blue_reg_op1 + v_green_reg_op1), (X"00" & coeff_8));
          v_reg_op3 <= (v_reg_op2(15 downto 8) + coeff_8);

          --Output data
          yuv444_out <= std_logic_vector(v_reg_op3) & std_logic_vector(u_reg_op3) & std_logic_vector(y_reg_op3);
        elsif yuv444_ready_out = '1' and rgb_valid_in = '1' and bypass = '1' then
          yuv444_out <= rgb_in;
        end if;

      end if;
    end if;
  end process; -- RGB_YUV_PROC

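I also tried adding a 'clip' function to control overflow, thinking it would help the synthesis tool with the clipping case, but it did not help and the problem remains: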

if (rgb_sof_in = '1' or soff_sync = '1') and (valid_delay = '1' or validff1 = '1') and yuv444_ready_out = '1' and bypass = '0' then
  --Y calc (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
  y_red_reg_op1 <= coeff_0 * red_reg;
  y_green_reg_op1 <= coeff_1 * green_reg;
  y_blue_reg_op1 <= coeff_2 * blue_reg; 
  y_reg_op2 <=  y_red_reg_op1 + y_green_reg_op1 + y_blue_reg_op1 + (X"00" & coeff_8);
  y_reg_op3 <= clip( X"00" & (y_reg_op2(15 downto 8) + coeff_9));

  --U calc ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= coeff_3 * red_reg;
  u_green_reg_op1 <= coeff_4 * green_reg;
  u_blue_reg_op1 <= coeff_5 * blue_reg;
  u_reg_op2 <= uv_op_abs(u_blue_reg_op1, (u_red_reg_op1 + u_green_reg_op1), (X"00" & coeff_8));
  u_reg_op3 <= clip( X"00" & (u_reg_op2(15 downto 8) + coeff_8));

  --V calc ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
  v_red_reg_op1 <= coeff_5 * red_reg;
  v_green_reg_op1 <= coeff_6 * green_reg;
  v_blue_reg_op1 <= coeff_7 * blue_reg;
  v_reg_op2 <= uv_op_abs(v_red_reg_op1, (v_blue_reg_op1 + v_green_reg_op1), (X"00" & coeff_8));
  v_reg_op3 <= clip( X"00"& (v_reg_op2(15 downto 8) + coeff_8));

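Question

I know that in hardware design a successful simulation does not necessarily mean the design will behave the same way after synthesis, and I am sure there is a lot in the code that could be improved. There must be something fundamentally wrong with this approach, but so far I cannot see it. Does anyone know what might be wrong, and why?

[Comments]:

Did the simulation test the full range of YCbCr? (I assume you mean YCbCr, since YUV is analog video.) When you check the full range, does saturation occur in simulation?
Yes, that's right, sorry, I meant YCbCr... The thing is, it always happens regardless of lighting conditions. One of the experiments was to feed the testbench with RGB values from a still image taken with the camera and compare the results with the software-converted "golden" values. That verification completed successfully, but testing the hardware implementation with the camera pointed at the same scene produces the color cast.
How do you know the problem is not also present in the golden reference? Did you convert the test results to a bitmap (or similar) to inspect them visually?

Tags: vhdl video-processing yuv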

[Answer 1]:

First things first: check whether you are using YUV or YCbCr. These are often confused, and they are not the same! Do not mix them.

Then I see:

#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : X)

and

function clip(mult_in : unsigned(15 downto 0))
  return unsigned is
  begin
    if to_integer(unsigned(mult_in)) > 240 then
      return unsigned(to_unsigned(240,8));
    else
      return unsigned(mult_in(7 downto 0));
    end if;
  end clip;

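These are very different functions. The first uses a signed data type and clips to the range 0 to 255. The second only clips positive values at 240, and it will not correctly handle the arithmetic overflow (wrap-around) that subtraction can cause. For some reason you use unsigned arithmetic throughout your code! (Why? What is wrong with signed?)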
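To make the difference concrete, here is a small C model of the two functions. It is only a sketch: `clip_ref` mirrors the C macro `CLIP`, and `clip_hw_model` is my reading of the posted VHDL `clip` applied to a 16-bit unsigned input. Both names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Mirrors the C macro CLIP: signed input, saturates to [0, 255]. */
int clip_ref(int x)
{
    return x > 255 ? 255 : (x < 0 ? 0 : x);
}

/* Model of the posted VHDL clip (assumption: my reading of it):
 * unsigned 16-bit input, anything above 240 becomes 240,
 * otherwise the low 8 bits are passed through unchanged. */
uint8_t clip_hw_model(uint16_t x)
{
    return x > 240 ? 240 : (uint8_t)(x & 0xFFu);
}
```

In this model a result of -5, whose 16-bit pattern is 0xFFFB, is clipped to 0 by `clip_ref` but comes out as 240 from `clip_hw_model`, because an unsigned comparison cannot tell a wrapped negative value from a large positive one. A legal value such as 250 is also changed (250 versus 240).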

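So you are already comparing apples and oranges.

Next, you seem to be using an absolute-value function?! Why? That is not part of the original code at all. It will of course produce artifacts. You cannot just flip the sign of negative values and expect them to be correct.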
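A quick way to see the effect is to model the U path in C and compare it with the reference formula. The sketch below is my reading of the posted VHDL (without the optional `clip`, and ignoring pipeline latency), not the asker's actual testbench. The names `floor_shift8`, `u_ref`, `uv_op_abs_model`, and `u_hw_model` are hypothetical.

```c
#include <stdint.h>

/* Floor division by 256 for signed values, written so it does not
 * depend on how the compiler right-shifts negative numbers. */
int floor_shift8(int x)
{
    return x >= 0 ? x >> 8 : -((-x + 255) >> 8);
}

/* Reference U: RGB2U without CLIP (for 8-bit inputs the result
 * stays within 16..240, so the clip never triggers). */
int u_ref(int r, int g, int b)
{
    return floor_shift8(-38 * r - 74 * g + 112 * b + 128) + 128;
}

/* Model of the question's uv_op_abs on 16-bit unsigned operands. */
uint16_t uv_op_abs_model(uint16_t op1, uint16_t op2, uint16_t op3)
{
    if (op2 > op1) {
        uint16_t res1 = (uint16_t)(op2 - op1);
        return (uint16_t)(res1 > op3 ? res1 - op3 : op3 - res1);
    }
    return (uint16_t)((op1 - op2) + op3);
}

/* U as the posted process appears to compute it: top byte of the
 * 16-bit result plus 128, wrapped to 8 bits. */
uint8_t u_hw_model(uint8_t r, uint8_t g, uint8_t b)
{
    uint16_t pos = (uint16_t)(112 * b);
    uint16_t neg = (uint16_t)(38 * r + 74 * g);
    return (uint8_t)((uv_op_abs_model(pos, neg, 128) >> 8) + 128);
}
```

Under these assumptions the two agree whenever the intermediate sum is non-negative (pure blue gives 240 in both), but they diverge as soon as it goes negative: yellow (255, 255, 0) gives 16 in the reference and 239 in the hardware model, and red (255, 0, 0) gives 90 versus 165. The magnitude of the negative sum ends up added to the 128 offset instead of subtracted from it.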

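Also, please use proper naming. The constant value 16 should not be named coeff_9; it makes the code hard to read and maintain. If you want flexibility, use a different structure entirely. Names like coeff_X tell you nothing: sure, it may be a coefficient, but what is it used for?

In fact you could write (note that I am already assuming signed arithmetic)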

y_red_reg_op1 <= red_reg * to_signed(66, 8);

or, since red_reg'length is already 8, even

y_red_reg_op1 <= red_reg * 66;

which is much easier to read.

The code could then become something like this (again assuming you will use signed):

  --U calc (( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= -38 * red_reg;
  u_green_reg_op1 <= 74 * green_reg;
  u_blue_reg_op1 <= 112 * blue_reg;
  u_reg_op2 <= u_red_reg_op1 - u_green_reg_op1 + u_blue_reg_op1 + 128;
  u_reg_op3 <= clip(shift_right(u_reg_op2, 8) + 128);

clip should of course be

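function clip(value : signed(15 downto 0))
  return unsigned is  -- 8-bit result in 0..255, so unsigned rather than signed
begin
  if value > 255 then
    return to_unsigned(255, 8);
  elsif value < 0 then
    return to_unsigned(0, 8);
  else
    -- value is known to be in 0..255 here, so the low byte is exact
    return unsigned(value(7 downto 0));
  end if;
end clip;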

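P.S. I hope you are using numeric_std.

If all this still produces artifacts, check that you have not mixed up the RGB or YCbCr component order. That is a common mistake.

Final P.S.: VHDL actually has a fixed-point library with saturation logic. It is supported by the major FPGA vendors. You might even consider using it to write a "better" solution than the C one.

Edit: I just read Wikipedia, and the whole algorithm needs no clipping, absolute values, or anything of the sort.

1. Basic transform from 8-bit RGB to 16-bit values (Y': unsigned, U/V: signed; the matrix values are rounded so that the later desired Y'UV range of [0..255] each is reached while no overflow can happen).
2. Scale down (">>8") to 8-bit values with rounding ("+128") (Y': unsigned, U/V: signed).
3. Add an offset to the values to eliminate any negative values (all results are 8-bit unsigned).

That is how you should implement your algorithm, too.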
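The three steps can be sketched in C with the intermediate widths made explicit. This is an illustration under my own assumptions, not code from Wikipedia or from the question: the function names are hypothetical, and `floor_shift8` is a helper that stands in for an arithmetic right shift so the result does not depend on compiler behavior for negative values.

```c
#include <stdint.h>

/* Floor division by 256, independent of how >> treats negatives. */
int floor_shift8(int x)
{
    return x >= 0 ? x >> 8 : -((-x + 255) >> 8);
}

int rgb_to_y(uint8_t r, uint8_t g, uint8_t b)
{
    /* Step 1: at most 220 * 255 = 56100, fits in unsigned 16 bits. */
    uint16_t y16 = (uint16_t)(66 * r + 129 * g + 25 * b);
    /* Steps 2 and 3: round, scale down, add the offset. */
    return ((y16 + 128) >> 8) + 16;
}

int rgb_to_u(uint8_t r, uint8_t g, uint8_t b)
{
    /* Step 1: between -28560 and 28560, fits in signed 16 bits. */
    int16_t u16 = (int16_t)(-38 * r - 74 * g + 112 * b);
    return floor_shift8(u16 + 128) + 128;
}

int rgb_to_v(uint8_t r, uint8_t g, uint8_t b)
{
    int16_t v16 = (int16_t)(112 * r - 94 * g - 18 * b);
    return floor_shift8(v16 + 128) + 128;
}

/* Returns 1 if every 8-bit RGB input gives Y in 16..235 and
 * U, V in 16..240, i.e. no clipping is ever needed. */
int yuv_range_ok(void)
{
    for (int r = 0; r < 256; r++)
        for (int g = 0; g < 256; g++)
            for (int b = 0; b < 256; b++) {
                int y = rgb_to_y((uint8_t)r, (uint8_t)g, (uint8_t)b);
                int u = rgb_to_u((uint8_t)r, (uint8_t)g, (uint8_t)b);
                int v = rgb_to_v((uint8_t)r, (uint8_t)g, (uint8_t)b);
                if (y < 16 || y > 235 || u < 16 || u > 240 || v < 16 || v > 240)
                    return 0;
            }
    return 1;
}
```

Note that the Y sum does not fit in a signed 16-bit value, while U and V do. As a side remark of my own, the 8-bit channels are unsigned, so in a signed VHDL implementation they would presumably need to be zero-extended by one bit before being treated as signed, otherwise inputs of 128 and above would be read as negative.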

In VHDL, it could become something like this:

  --U calc ((- 38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= -38 * red_reg; -- 16 bit signed
  u_green_reg_op1 <= -74 * green_reg; -- 16 bit signed
  u_blue_reg_op1 <= 112 * blue_reg; -- 16 bit signed
  u_reg_op2 <= u_red_reg_op1 + u_green_reg_op1 + u_blue_reg_op1 + 128; -- 16 bit signed
  u_reg_op3 <= unsigned(resize(shift_right(u_reg_op2, 8), 8) + 128); -- 8 bit unsigned

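(Check this!)

Last but not least: you need a better testbench, one that actually checks your implementation's output against the "golden" reference results.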
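One possible way to strengthen the testbench is to generate golden vectors that include the corners of the RGB cube (black, white, and the pure primaries), since those are where the signed intermediate sums reach their extremes. The sketch below assumes the bit layout used in the posted process (`rgb_in` as R[23:16], B[15:8], G[7:0] and `yuv444_out` as V[23:16], U[15:8], Y[7:0]). `golden_yuv444` and `floor_shift8` are hypothetical helper names.

```c
#include <stdint.h>

/* Floor division by 256, independent of how >> treats negatives. */
int floor_shift8(int x)
{
    return x >= 0 ? x >> 8 : -((-x + 255) >> 8);
}

/* Expected yuv444_out word for one rgb_in word, using the bit layout
 * of the posted process (an assumption taken from the question):
 *   rgb_in     = R[23:16] B[15:8] G[7:0]
 *   yuv444_out = V[23:16] U[15:8] Y[7:0] */
uint32_t golden_yuv444(uint32_t rgb_in)
{
    int g = (int)(rgb_in & 0xFFu);
    int b = (int)((rgb_in >> 8) & 0xFFu);
    int r = (int)((rgb_in >> 16) & 0xFFu);

    int y = floor_shift8( 66 * r + 129 * g +  25 * b + 128) +  16;
    int u = floor_shift8(-38 * r -  74 * g + 112 * b + 128) + 128;
    int v = floor_shift8(112 * r -  94 * g -  18 * b + 128) + 128;

    return ((uint32_t)v << 16) | ((uint32_t)u << 8) | (uint32_t)y;
}
```

For example, pure red (`rgb_in` = 0xFF0000) should give 0xF05A52 and white (0xFFFFFF) should give 0x8080EB. Feeding such words through the simulation and comparing the packed outputs also exercises the component order, which a comparison of separately computed Y, U, and V values can miss.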

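[Comments]:

Thanks for your answer and suggestions. To be honest, I was undecided between signed and unsigned. My mistake was going with unsigned, thinking it would be simpler (it ended up more complicated), and trusting a testbench that was not very good :) I think this explains, and should resolve, the problem I am seeing and the mismatch with simulation. Agreed on naming: I usually try to keep things readable, but this was written as a quick prototype, and I certainly planned to rewrite it with better names once it worked... I will apply the changes, improve the testbench, and see whether it works.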