Raytracing On OpenGL Compute Shader

Compute Shader GLSL Variables

uvec3 gl_NumWorkGroups    number of work groups in each dimension, as passed to glDispatchCompute()
uvec3 gl_WorkGroupSize    local work group size, as declared with layout
uvec3 gl_WorkGroupID    index of the current work group within the global work group
uvec3 gl_LocalInvocationID    position of the current invocation within its local work group
uvec3 gl_GlobalInvocationID    unique index of the current invocation within the global work group
uint gl_LocalInvocationIndex    1D (flattened) form of gl_LocalInvocationID
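
The relationship between these built-ins can be sketched on the CPU. The following is a minimal C illustration, assuming a hypothetical `uvec3` struct standing in for the GLSL type; the function names are invented for this sketch and are not part of OpenGL. The two formulas are the ones the GLSL specification gives for gl_GlobalInvocationID and gl_LocalInvocationIndex.

```c
#include <assert.h>

/* Hypothetical stand-in for GLSL's uvec3 (not an OpenGL type). */
typedef struct { unsigned x, y, z; } uvec3;

/* gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID */
uvec3 global_invocation_id(uvec3 group_id, uvec3 group_size, uvec3 local_id)
{
    /* A local ID must lie inside the local work group. */
    assert(local_id.x < group_size.x &&
           local_id.y < group_size.y &&
           local_id.z < group_size.z);
    uvec3 g = { group_id.x * group_size.x + local_id.x,
                group_id.y * group_size.y + local_id.y,
                group_id.z * group_size.z + local_id.z };
    return g;
}

/* gl_LocalInvocationIndex flattens the local ID with x varying fastest. */
unsigned local_invocation_index(uvec3 group_size, uvec3 local_id)
{
    return local_id.z * group_size.x * group_size.y
         + local_id.y * group_size.x
         + local_id.x;
}
```

With the 1*1 local size used later in this post, gl_LocalInvocationID is always (0, 0, 0), so gl_GlobalInvocationID is simply equal to gl_WorkGroupID.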

Execution:

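Rendering works like this: the compute shader writes to a texture, and that texture is then drawn on a full-screen quad, that is, a rectangle that fills NDC.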

Creating the Texture/Image:

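Create a texture with 32-bit floating-point channels (GL_RGBA32F). OpenGL treats "image units" slightly differently to textures, so we call glBindImageTexture() to make this link. Note that we can set this to "write only".

In other words, an image unit is not quite the same as an ordinary texture unit, and the last call in the listing is what lets the shader write to the image.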

// dimensions of the image
int tex_w = 512, tex_h = 512;
GLuint tex_output;
glGenTextures(1, &tex_output);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex_output);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, tex_w, tex_h, 0, GL_RGBA, GL_FLOAT,
 NULL);
glBindImageTexture(0, tex_output, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);

Determining the Work Group Size

The arguments to glDispatchCompute() determine how many compute shader invocations are launched. First, find the maximum size of the total (global) work group.

The following calls return the maximum work group count in x, y and z:

int work_grp_cnt[3];

glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 0, &work_grp_cnt[0]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 1, &work_grp_cnt[1]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 2, &work_grp_cnt[2]);

printf("max global (total) work group counts x:%i y:%i z:%i\n",
  work_grp_cnt[0], work_grp_cnt[1], work_grp_cnt[2]);

Next, query the maximum supported local work group size (a sub-division of the total number of jobs). The local size is declared inside the shader with the layout keyword.

int work_grp_size[3];

glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_SIZE, 0, &work_grp_size[0]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_SIZE, 1, &work_grp_size[1]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_SIZE, 2, &work_grp_size[2]);

printf("max local (in one shader) work group sizes x:%i y:%i z:%i\n",
  work_grp_size[0], work_grp_size[1], work_grp_size[2]);

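Going one step further, there is also a limit on the total number of invocations in one local work group (GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS, queried with glGetIntegerv()). If one local work group processes a 32x32 tile of jobs, the product 32*32 = 1024 must not exceed this value.

The local work group size limits depend on the device. Within those limits, letting the user tune the local work group size can give better performance.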

Output on my machine:

max global (total) work group size x:2147483647 y:65535 z:65535
max local (in one shader) work group sizes x:1024 y:1024 z:64
max computer shader invocations 1024
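
Given limits like the ones printed above, a candidate local work group size can be checked before it is written into the shader. This is a minimal sketch in plain C; `local_size_fits` is a hypothetical helper invented here, not an OpenGL call, and it assumes the three limits have already been queried as shown earlier.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a local work group of size[0]*size[1]*size[2] respects
   both the per-axis limits and the total-invocation limit. */
bool local_size_fits(const int size[3], const int max_size[3], int max_invocations)
{
    assert(size[0] > 0 && size[1] > 0 && size[2] > 0);
    for (int i = 0; i < 3; i++) {
        if (size[i] > max_size[i]) {
            return false;  /* one axis is larger than the device allows */
        }
    }
    /* Use a wide type so the product cannot overflow int. */
    long long total = (long long)size[0] * size[1] * size[2];
    return total <= max_invocations;
}
```

With the values from my machine, 32*32*1 fits (1024 invocations), while 64*32*1 does not, even though each axis on its own is within the per-axis limit.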

Start with a simple setup:

------Set the global work group size to match the texture: 512*512

------Set the local work group size to a single pixel: 1*1

------Set the Z size to 1
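
The setup above generalises to larger local sizes: the number of work groups per axis is the image size divided by the local size, rounded up. A minimal sketch in C, where `groups_for` is a hypothetical helper name and the local size of 16 in the note below is only an example value:

```c
#include <assert.h>

/* Number of work groups needed along one axis so that
   groups * local_size covers at least `pixels` (rounds up). */
unsigned groups_for(unsigned pixels, unsigned local_size)
{
    assert(local_size > 0);
    return (pixels + local_size - 1) / local_size;
}
```

For the 512-pixel texture, a local size of 1 gives 512 groups per axis, which is the setup used here, and a local size of 16 would give 32. When the image size is not a multiple of the local size, the last group overhangs the image, so the shader would need to skip invocations whose coordinates fall outside it.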

Writing the most basic compute shader

#version 450
layout(local_size_x = 1, local_size_y = 1) in;
layout(rgba32f, binding = 0) uniform image2D img_output;

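The first layout line defines the local work group size. It stays in the background: if we later make the local work group larger, the body of the shader should not need to change (the counts passed to glDispatchCompute() shrink accordingly).

Here we use one pixel per work group, i.e. 1*1. Supporting a 1D or 3D work group would require restructuring the shader.

The second layout line declares the image and its internal format. Note that it is uniform image2D XXX, not uniform sampler XXX.

Now write a shader that outputs black: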

void main() {
  // base pixel colour for image
  vec4 pixel = vec4(0.0, 0.0, 0.0, 1.0);
  // get index in global work group i.e x,y position
  ivec2 pixel_coords = ivec2(gl_GlobalInvocationID.xy);
  
  //
  // interesting stuff happens here later
  //
  // output to a specific pixel in the image
  imageStore(img_output, pixel_coords, pixel);
}

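The second statement uses the GLSL built-in gl_GlobalInvocationID.xy to get this invocation's coordinates within the global work group, which here are also the pixel coordinates.

Compile the shader with the type GL_COMPUTE_SHADER: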

GLuint ray_shader = glCreateShader(GL_COMPUTE_SHADER);
glShaderSource(ray_shader, 1, &the_ray_shader_string, NULL);
glCompileShader(ray_shader);
// check for compilation errors as per normal here

GLuint ray_program = glCreateProgram();
glAttachShader(ray_program, ray_shader);
glLinkProgram(ray_program);
// check for linking errors and validate program as per normal here

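We also need to draw the result onto the final quad, whose shader samples the texture that the compute shader wrote as an image2D.

Dispatch the shaders (run them inside the render loop):

Step one: run the compute shader to fill the texture, with the z dimension set to 1: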

glUseProgram(ray_program);
glDispatchCompute((GLuint)tex_w, (GLuint)tex_h, 1);

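// make sure writing to image has finished before read;
// the bits name how the data is read afterwards (here: sampled as a texture)
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT | GL_TEXTURE_FETCH_BARRIER_BIT);

To make sure that the compute shader has completely finished writing to the image before we start sampling it, we put in a memory barrier with glMemoryBarrier(). The image access bit alone covers later image loads and stores; because the quad pass samples the result through a texture sampler, the texture fetch bit is set as well. You can instead use GL_ALL_BARRIER_BITS to be on the safe side for all types of writing.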

Step two: draw to the quad as usual:

// normal drawing pass
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(quad_program);
glBindVertexArray(quad_vao);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex_output);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);

For this test I set the texture output to green.

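NEXT: Raytracing

To begin with, the scene is hard-coded directly into the compute shader.

A ray is defined by an origin and a direction.

We want to distribute the ray origins across the pixels. The original author normalises the pixel coordinates to -1.0..1.0 and then scales them so that X and Y both span -5.0 to 5.0: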

float max_x = 5.0;
float max_y = 5.0;
ivec2 dims = imageSize(img_output); // fetch image dimensions
float x = (float(pixel_coords.x * 2 - dims.x) / dims.x);
float y = (float(pixel_coords.y * 2 - dims.y) / dims.y);
vec3 ray_o = vec3(x * max_x, y * max_y, 0.0);
vec3 ray_d = vec3(0.0, 0.0, -1.0); // ortho
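
To see which values this mapping produces, the same arithmetic can be run on the CPU. This is a minimal C sketch that mirrors the GLSL above; the `vec3` struct and the `ray_origin` function are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for GLSL's vec3. */
typedef struct { float x, y, z; } vec3;

/* Mirrors the GLSL above: normalise the pixel coordinate to [-1, 1),
   then scale by max_x / max_y to get the orthographic ray origin. */
vec3 ray_origin(int px, int py, int width, int height, float max_x, float max_y)
{
    assert(width > 0 && height > 0);
    float x = (float)(px * 2 - width) / (float)width;
    float y = (float)(py * 2 - height) / (float)height;
    vec3 o = { x * max_x, y * max_y, 0.0f };
    return o;
}
```

For a 512*512 image, pixel 0 maps to -5.0 and pixel 256 maps to 0.0. The last pixel, 511, maps to slightly less than 5.0 because the index never reaches the image width.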

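Here I do not map to -5..5. Instead, each pixel simply has size 1, the centre of the whole image is the point (0, 0), and each ray originates at the centre of its pixel.

So the shader becomes: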

#version 450 core
layout(local_size_x = 1, local_size_y = 1) in;
layout(rgba32f, binding = 0) uniform image2D img_output;

void main() {
    const float pixel_size = 1.0;
    ivec2 texsize = imageSize(img_output);  // current image size, e.g. 500 * 500

    // base pixel colour for image
    vec4 pixel = vec4(0.0, 0.0, 0.0, 1.0);
    // get index in global work group i.e x,y position
    ivec2 pixel_coords = ivec2(gl_GlobalInvocationID.xy);
    // pixel-centre coordinates with the image centred on (0, 0), stored in red/green
    pixel.r = pixel_size * (float(pixel_coords.x) - 0.5 * float(texsize.x - 1));
    pixel.g = pixel_size * (float(pixel_coords.y) - 0.5 * float(texsize.y - 1));
    // output to a specific pixel in the image
    imageStore(img_output, pixel_coords, pixel);
}

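With a 500*500 texture, x and y now range from -249.5 to 249.5 (the pixel centres). Adding jittered multi-sampling on top of this covers the full range from -250 to 250.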
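
The pixel-centre formula used in the shader can be checked in isolation. A minimal C sketch, with `pixel_centre` as a hypothetical helper name introduced for this example:

```c
#include <assert.h>

/* Centre of pixel column (or row) `index` when the image is `res` pixels
   across and centred on 0, each pixel being `pixel_size` units wide. */
float pixel_centre(int index, int res, float pixel_size)
{
    assert(index >= 0 && index < res);
    return pixel_size * ((float)index - 0.5f * (float)(res - 1));
}
```

For a 500-pixel axis this gives -249.5 for index 0 and 249.5 for index 499, so the centres are symmetric about 0 and each pixel extends half a unit to either side of its centre.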

Jittered sampling pseudocode:

// for each sample in the pixel
for (int i = 0; i < vp.num_samples; i++) {
    sample_point = vp.sampler_ptr->sample_unit_square();
    pixel_pos[0] = vp.size * (c - 0.5*vp.hres + sample_point[0]);
    pixel_pos[1] = vp.size * (r - 0.5*vp.vres + sample_point[1]);
    ray.o = RT_VEC_3D({ pixel_pos[0],pixel_pos[1],RT_SCALAR(zw) });
    pixel_color += tracer_ptr->trace_ray(ray);
}
pixel_color /= vp.num_samples;
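
The sample positions in the pseudocode above can be generated as follows. This is a minimal C sketch and not the sampler class the pseudocode refers to: it assumes an n*n jitter grid per pixel, uses rand() as a stand-in random source, and the names `rand_unit` and `jittered_pixel_samples` are hypothetical. Tracing the rays and averaging the colours is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform random number in [0, 1); rand() is only a stand-in sampler. */
double rand_unit(void)
{
    return (double)rand() / ((double)RAND_MAX + 1.0);
}

/* Fills xs/ys (each of length n*n) with jittered sample positions for the
   pixel in column c, row r of an hres*vres image centred on (0, 0). */
void jittered_pixel_samples(int c, int r, int hres, int vres,
                            double pixel_size, int n,
                            double *xs, double *ys)
{
    assert(n > 0);
    int k = 0;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            /* One random point inside sub-cell (i, j) of the unit square. */
            double sx = (i + rand_unit()) / n;
            double sy = (j + rand_unit()) / n;
            xs[k] = pixel_size * (c - 0.5 * hres + sx);
            ys[k] = pixel_size * (r - 0.5 * vres + sy);
            k++;
        }
    }
}
```

Because each sample offset lies in [0, 1), the samples of pixel column 0 in a 500-pixel-wide image fall in [-250, -249) and those of column 499 in [249, 250), which is how the sampled image spans -250 to 250.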
