jaffhan

Original article: https://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/

Aliasing problem

Before I address temporal supersampling, just a quick reminder on what aliasing is.

Aliasing is a problem that is very well defined in signal theory. According to the general sampling theorem, our signal spectrum must contain only frequencies lower than the Nyquist frequency. If it doesn’t (and when rasterizing triangles it always will, as a triangle edge has an infinite frequency spectrum – a step-like response), some frequencies will appear in the final signal (reconstructed from the samples) that were not present in the original signal. Visual aliasing can have different appearances: regular patterns (so-called moiré), noise or flickering.
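
To make the Nyquist argument concrete, here is a small self-contained Python sketch (mine, not from the original post): a 9 Hz sine sampled at 10 Hz produces exactly the same samples as a -1 Hz sine, so the reconstructed signal contains a low frequency that was never in the input.

```python
import math

def sample(freq_hz, fs_hz, n_samples):
    # Sample a unit-amplitude sine of the given frequency at rate fs_hz.
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

fs = 10.0                     # Nyquist limit is fs / 2 = 5 Hz
high = sample(9.0, fs, 20)    # 9 Hz: above Nyquist, will alias
low = sample(-1.0, fs, 20)    # the -1 Hz alias predicted by theory

# Sample-for-sample, the two signals are indistinguishable.
assert all(abs(a - b) < 1e-9 for a, b in zip(high, low))
```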

Classic supersampling

Classic supersampling is a technique that is extremely widely used by the CGI industry. For every target image fragment we perform sampling multiple times at much higher frequencies (for example by tracing multiple rays per single pixel, or by shading fragments multiple times at various positions that cover the same on-screen pixel) and then perform signal downsampling/filtering – for example by averaging. There are various approaches to even the simplest supersampling (I talked about this in one of my previous blog posts), but the main problem with it is the associated cost – N-times supersampling usually means N times the basic shading cost (at least for some pipeline stages) and sometimes additionally N times the basic memory cost. Even simple, hardware-accelerated techniques like MSAA, which evaluate only some parts of the pipeline (pixel coverage) at higher frequency and don’t provide results that are as good, have quite a big cost on consoles.
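
As an illustration of the sample-then-average idea (a hedged sketch, not code from the article – the edge position 0.3 and the uniform sample placement are my assumptions), the fragment below box-filters N shading samples of a step-like edge inside one pixel:

```python
def shade(x):
    # A step-like "triangle edge" inside the pixel: the triangle covers
    # everything at x >= 0.3 (an arbitrary illustrative edge position).
    return 1.0 if x >= 0.3 else 0.0

def supersample_pixel(n):
    # n uniformly spaced sub-pixel sample positions, then a box filter
    # (plain average) to downsample back to one value per pixel.
    samples = [shade((i + 0.5) / n) for i in range(n)]
    return sum(samples) / n

assert supersample_pixel(1) == 1.0   # one center sample: hard-edged, aliased
assert supersample_pixel(8) == 0.75  # 8x supersampling recovers partial coverage
```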

But even if supersampling is often an impractical technique, its temporal variant can be applied at almost zero cost.

Temporal supersampling theory

So what is temporal supersampling? Temporal supersampling techniques are based on a simple observation – from frame to frame, most of the on-screen content does not change. Even with complex animations we see that many fragments just change their position; apart from that, they usually correspond to at least some other fragments in previous and future frames.

Based on this observation, if we know the precise texel position in the previous frame (and we often do! – using the motion vectors that are computed for per-object motion blur, for instance), we can distribute the multiple-fragment-evaluation component of supersampling across multiple frames.
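
A minimal sketch of that reprojection step (illustrative only – the function name and the integer-pixel simplification are my assumptions; real implementations sample the history buffer bilinearly):

```python
def reproject(px, py, motion_vec):
    # The motion vector stores this pixel's screen-space displacement
    # since the last frame, so the previous-frame position is simply
    # the current position minus the motion vector.
    mx, my = motion_vec
    return (px - mx, py - my)

# A fragment at (120, 64) that moved 3 pixels right and 2 pixels up
# since the previous frame is fetched from (117, 66) in the history.
assert reproject(120, 64, (3, -2)) == (117, 66)
```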

What is even more exciting is that this technique can be applied to any pass – to your final image, to AO, to screen-space reflections and others – either to filter the signal or to increase the number of samples taken. I will first describe how it can be used to supersample the final image and achieve much better AA, and then give an example of using it to double or triple the number of samples and improve the quality of effects like SSAO.

Temporal antialiasing

I have no idea which game was the first to use temporal supersampling AA, but Tiago Sousa from Crytek gave a great presentation at SIGGRAPH 2011 on that topic and its usage in Crysis 2 [1]. Crytek proposed applying a sub-pixel jitter to the final MVP transformation matrix that alternates every frame – and combining the two frames in a post-effect-style pass. This way they were able to double the sampling resolution at almost no cost!
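
To see why alternating the jitter and averaging two frames doubles the effective sample count, consider one pixel crossed by a triangle edge (a toy sketch of mine; the edge position and the two sample offsets are illustrative assumptions):

```python
def coverage(edge_x, sample_x):
    # 1.0 if this jittered sub-pixel sample lands inside the triangle,
    # which covers the pixel for x >= edge_x (0.4 is arbitrary).
    return 1.0 if sample_x >= edge_x else 0.0

# Alternating sub-pixel jitter: the same pixel is sampled at two
# different positions over two consecutive frames.
frame_even = coverage(0.4, 0.25)  # frame n:   sample jittered left
frame_odd = coverage(0.4, 0.75)   # frame n-1: sample jittered right

# The post-effect combine pass averages the two 1-sample frames and
# recovers the fractional edge coverage a single frame cannot see.
combined = 0.5 * (frame_even + frame_odd)
assert (frame_even, frame_odd, combined) == (0.0, 1.0, 0.5)
```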

Too good to be true?

Yes, the result of such a simple implementation looks perfect on still screenshots (and you can implement it in just a couple of hours!***), but it breaks in motion. Previous-frame pixels that correspond to the current frame were in different positions. This can easily be fixed by using motion vectors, but sometimes the information you are looking for was occluded in the previous frame. To address that, you cannot rely on depth (as the whole point of this technique is getting extra coverage and edge information from the samples missing in the current frame!), so Crytek proposed relying on a comparison of motion vector magnitudes to reject mismatching pixels.
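
A hedged sketch of that magnitude-based rejection (the threshold value is my assumption, not a number from the presentation):

```python
import math

def accept_history(mv_curr, mv_prev, threshold=0.5):
    # Accept the reprojected history sample only when the motion vector
    # magnitudes of the two frames are similar; a large difference hints
    # at occlusion/disocclusion. The half-pixel threshold is an
    # illustrative assumption.
    diff = abs(math.hypot(*mv_curr) - math.hypot(*mv_prev))
    return diff <= threshold

assert accept_history((1.0, 0.0), (1.2, 0.0))      # similar motion: blend
assert not accept_history((4.0, 0.0), (0.0, 0.0))  # mismatch: reject history
```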

***yeah, I really mean a maximum of one working day if you have a 3D-developer-friendly engine. Multiply your MVP matrix with a simple translation matrix that jitters between (-0.5 / w, -0.5 / h) and (0.5 / w, 0.5 / h) every other frame, plus write a separate pass that combines frame(n) and frame(n-1) together and outputs the result.
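
That recipe could look roughly like this (a Python sketch with row-major matrices; the exact sign and offset convention depends on your projection setup, so treat those as assumptions):

```python
def jitter_matrix(frame_index, w, h):
    # Row-major 4x4 translation that alternates between
    # (0.5 / w, 0.5 / h) and (-0.5 / w, -0.5 / h) every other frame;
    # multiply it into your MVP matrix before rendering.
    s = 0.5 if frame_index % 2 == 0 else -0.5
    return [[1.0, 0.0, 0.0, s / w],
            [0.0, 1.0, 0.0, s / h],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(m, v):
    # Apply a row-major 4x4 matrix to a homogeneous (x, y, z, w) point.
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

p = (0.0, 0.0, 0.0, 1.0)  # a clip-space point with w = 1
even = transform(jitter_matrix(0, 1920, 1080), p)
odd = transform(jitter_matrix(1, 1920, 1080), p)
assert even[0] == 0.5 / 1920 and odd[0] == -0.5 / 1920
```

The separate combine pass from the footnote is then just an equal-weight average of frame(n) and frame(n-1).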

Usage in Assassin’s Creed 4 – motivation

For a long time we relied on FXAA (aided by depth-based edge detection) as a simple AA technique during our game development. This simple technique usually works “ok” on a static image and improves its quality, but breaks in motion – as edge estimations and blurring factors change from frame to frame. While our motion blur (a simple and efficient implementation that used actual motion vectors for every skinned and moving object) helped to smooth the look of edges for objects moving quite fast (a small motion vector dilation helped even more), it didn’t do anything for calm animations and subpixel detail. And our game was full of them – just look at all the ropes tied to sails, the nicely tessellated wooden planks and the dense foliage in the jungles!

Unfortunately, motion blur did nothing to help the antialiasing of such slowly moving objects, and FXAA added some nasty noise during movement, especially on grass. We didn’t really have time to try the so-called “wire AA”, and MSAA was out of our budget, so we decided to try temporal antialiasing techniques.

I would especially like to thank here Benjamin Goldstein, our Technical Lead, with whom I had the great pleasure of trying and prototyping various temporal AA techniques very late in production.

Assassin’s Creed 4 XboxOne / Playstation 4 AA

As a first iteration, we started with a single-frame variant of morphological SMAA by Jimenez et al. [2] Even in its most basic settings it proved a definitely better-quality alternative to FXAA (at a slightly higher cost, but thanks to the much bigger computing power of the next-gen consoles it stayed within almost the same budget as FXAA on current-gen consoles). There was less noise, fewer artifacts and much better morphological edge reconstruction, but obviously it wasn’t able to do anything to reconstruct all this subpixel detail.

So the next step was to try to plug in the temporal AA component. A couple of hours of work and voilà – we had much better AA. Just look at the following pictures.

Pretty amazing, huh?

Sure, but at first this was the result only for a static image – and this is where your AA problems start (not end!).

Getting motion vectors right

Ok, so we had some subtle and, we thought, “precise” motion blur – so getting motion vectors to allow proper reprojection of moving objects should be easy?

Well, it wasn’t. We were doing it right for most of the objects and the motion blur was ok – you can’t really notice a lack of motion blur, or slightly wrong motion blur, on some specific objects. However, for temporal AA you need motion vectors that are proper and pixel-perfect for all of your objects!

Otherwise you will get huge ghosting. If you try to mask out these objects and not apply temporal AA to them at all, you will get visible jittering and shaking from the sub-pixel camera position changes.

Let me list all the problems with motion vectors we faced, with some comments on whether we solved them or not:

  • Cloth and soft-body physical objects. From our physics simulation for cloth and soft bodies, which was very fast and widely used in the game (characters, sails), we got full vertex information in world space. Object matrices were set to just identity. Therefore, such objects had zero motion vectors (and only the motion from the camera was applied to them). We needed to extract this information from the engine and physics – fortunately it was relatively easy, as it was already used for bounding box calculations. We fixed the ghosting from moving soft-body and cloth objects, but didn’t have motion vectors from the movement itself – we didn’t want to completely change the pipeline to GPU indirections and subtracting positions from two vertex buffers. It was ok-ish, as they wouldn’t move very abruptly and we didn’t see artifacts from it.
  • Some “custom” object types that had custom matrices, and the fact that we interpreted the data incorrectly. The same situation as with cloth also existed for other dynamic objects. We got a custom motion vector debugging rendering mode working, and fixing all those bugs was just a matter of a couple of days in total.
  • Ocean. It was not writing to the G-buffer. Instead of seeing motion vectors of the ocean surface, we had proper information, but for the ocean floor or the “sky” behind it (with a very deep ocean there was no bottom surface at all). The fix was to overwrite some G-buffer information like depth and motion vectors. However, we still didn’t store the previous frame’s simulation results and didn’t try to use them, so in theory you could see some ghosting on big, fast waves during a storm. It wasn’t a very big problem for us and no testers ever reported it.
  • Procedurally moving vegetation. We had some vertex-noise-based, artist-authored vegetation movement, and again, the difference between the two frames’ vertex position values wasn’t calculated to produce proper motion vectors. This is the single biggest visible artifact from the temporal AA technique in the game, and we simply didn’t have the time to modify our material shader compiler / generator and couldn’t apply any significant data changes in a patch (we improved AA in our first patch). The proper solution here would be to automatically replicate all the artist-created shader code that calculates the output local vertex position if it relies on any input data that changes between frames, like “time” or the closest character entity position (this one was used to simulate collision with vegetation), pass it through interpolators (perspective correction!), subtract it and get proper motion vectors. Artifacts like over-blurred leaves are sometimes visible in the final game and I’m not very proud of it – although maybe it is the usual programmer obsession.
  • Objects being teleported via skinning. We had some checks for entities and meshes being teleported, but in some single, custom cases objects were teleported using skinning – it would be impractical to analyze the whole skeleton looking for temporal discontinuities. We asked gameplay and animation programmers to mark them on such a frame and quickly fixed all the remaining bugs.
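
The "evaluate the animation twice and subtract" idea from the vegetation bullet can be sketched as follows (a toy Python model; the sway function and its parameters are invented stand-ins for the artist-authored shader code):

```python
import math

def animated_position(base_pos, time):
    # Invented stand-in for the artist-authored vertex animation
    # (e.g. wind sway driven by "time"); not the game's shader code.
    x, y, z = base_pos
    return (x + 0.1 * math.sin(time + y), y, z)

def vegetation_motion_vector(base_pos, t_curr, t_prev):
    # Evaluate the same animation with the previous frame's inputs and
    # subtract the two positions to obtain a proper motion vector.
    cx, cy, cz = animated_position(base_pos, t_curr)
    px, py, pz = animated_position(base_pos, t_prev)
    return (cx - px, cy - py, cz - pz)

# With identical inputs for both frames the motion vector is zero...
assert vegetation_motion_vector((0.0, 0.0, 0.0), 1.0, 1.0) == (0.0, 0.0, 0.0)
# ...and with advancing time the sway yields a non-zero x component.
mv = vegetation_motion_vector((0.0, 0.0, 0.0), 1.0, 0.5)
assert mv[0] != 0.0 and mv[1] == 0.0 and mv[2] == 0.0
```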

Problems with the motion vector based rejection algorithm

Ok, we spent 1-2 weeks on fixing our motion vectors (and the motion blur also got much better!)
