Compute Shader GLSL Variables
uvec3 gl_NumWorkGroups         number of work groups in each dimension, as passed to glDispatchCompute()
uvec3 gl_WorkGroupSize         local work group size, as defined with the layout qualifier
uvec3 gl_WorkGroupID           index of the current work group within the dispatch
uvec3 gl_LocalInvocationID     position of the current invocation within its local work group
uvec3 gl_GlobalInvocationID    unique index of the current invocation across the whole dispatch
uint  gl_LocalInvocationIndex  1D (flattened) representation of gl_LocalInvocationID
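To make the relationship between these variables concrete, here is a minimal CPU-side sketch in C of how the global ID and the flattened local index follow from the other values. The uvec3 struct and both function names are illustrative stand-ins, not part of OpenGL.

```c
#include <stdint.h>

/* Three unsigned components, standing in for GLSL's uvec3 (illustrative type). */
typedef struct { uint32_t x, y, z; } uvec3;

/* gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID */
uvec3 global_invocation_id(uvec3 group_id, uvec3 group_size, uvec3 local_id)
{
    uvec3 g = {
        group_id.x * group_size.x + local_id.x,
        group_id.y * group_size.y + local_id.y,
        group_id.z * group_size.z + local_id.z
    };
    return g;
}

/* gl_LocalInvocationIndex: the local ID flattened with x fastest, then y, then z. */
uint32_t local_invocation_index(uvec3 group_size, uvec3 local_id)
{
    return local_id.z * group_size.x * group_size.y
         + local_id.y * group_size.x
         + local_id.x;
}
```

For example, with a 16x16x1 local size, work group (3, 2, 0) and local ID (5, 7, 0), the global ID is (53, 39, 0) and the local index is 117.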
Execution:
Rendering works by drawing the texture onto a full-screen quad, i.e. a rectangle that fills NDC space.
Creating the Texture/Image:
Create a texture with 32-bit floating-point channels (GL_RGBA32F). The last call matters: OpenGL treats "image units" slightly differently to textures, so we call glBindImageTexture() to make this link. Note that we can set this to "write only".
// dimensions of the image
int tex_w = 512, tex_h = 512;
GLuint tex_output;
glGenTextures(1, &tex_output);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex_output);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, tex_w, tex_h, 0, GL_RGBA, GL_FLOAT, NULL);
// bind level 0 of the texture to image unit 0 for writing
glBindImageTexture(0, tex_output, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);
Determining the Work Group Size
glDispatchCompute() determines how many compute shader invocations we launch. First, find the maximum size of the total (global) work group.
The following calls return the maximum work group count in x, y and z:
int work_grp_cnt[3];
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 0, &work_grp_cnt[0]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 1, &work_grp_cnt[1]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_COUNT, 2, &work_grp_cnt[2]);
printf("max global (total) work group counts x:%i y:%i z:%i\n",
       work_grp_cnt[0], work_grp_cnt[1], work_grp_cnt[2]);
Next, get the maximum supported local work group size (a sub-division of the total number of jobs). The local size is defined inside the shader with the layout qualifier.
int work_grp_size[3];
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_SIZE, 0, &work_grp_size[0]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_SIZE, 1, &work_grp_size[1]);
glGetIntegeri_v(GL_MAX_COMPUTE_WORK_GROUP_SIZE, 2, &work_grp_size[2]);
printf("max local (in one shader) work group sizes x:%i y:%i z:%i\n",
       work_grp_size[0], work_grp_size[1], work_grp_size[2]);
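Going further: there is also a limit on the total number of invocations in one local work group, queried with GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS. If we process a 32x32 tile of jobs in one local work group, the product 32*32 = 1024 must not exceed this value.
The best local work group size depends on the device. Within these limits, letting the user tune the local work group size can give better performance.
Output on my machine: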
max global (total) work group size x:2147483647 y:65535 z:65535
max local (in one shader) work group sizes x:1024 y:1024 z:64
max computer shader invocations 1024
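Given limits like the ones printed above, a candidate local size can be checked before it is written into the shader. The following is only a sketch: local_size_fits is a hypothetical helper, and the limits are passed in as plain integers rather than queried from a live OpenGL context.

```c
#include <stdbool.h>

/* Returns true when a local work group size respects both the per-axis limits
   (as reported by GL_MAX_COMPUTE_WORK_GROUP_SIZE) and the total invocation
   limit (as reported by GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS).
   Hypothetical helper: the caller supplies the queried limits. */
bool local_size_fits(const int size[3], const int max_size[3], int max_invocations)
{
    long long product = 1;
    for (int i = 0; i < 3; i++) {
        if (size[i] < 1 || size[i] > max_size[i]) {
            return false; /* one axis is out of range */
        }
        product *= size[i];
    }
    return product <= max_invocations; /* x * y * z must also fit */
}
```

With the limits above (1024, 1024, 64 and 1024 invocations), a 32x32x1 group fits exactly, while 64x64x1 passes the per-axis check but fails on the product 4096.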
Start with a simple setup:
------Set the global work group size to match the texture, 512*512
------Set the local work group size to one pixel, 1*1
------Set the z size to 1
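With a 1*1 local size the dispatch counts are simply the texture dimensions. For larger local sizes the counts have to be divided and rounded up. A minimal sketch of that arithmetic in C, where group_count is a hypothetical helper name:

```c
/* Number of work groups needed along one axis so that
   groups * local_size covers at least `pixels` invocations
   (integer division rounded up). Hypothetical helper. */
unsigned int group_count(unsigned int pixels, unsigned int local_size)
{
    return (pixels + local_size - 1u) / local_size;
}
```

For a 512-pixel axis this gives 512 groups at local size 1 and 32 groups at local size 16. When the image size is not a multiple of the local size, the last group overshoots the image, so the shader would then need a bounds check before writing.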
Writing the most basic compute shader
#version 450
layout(local_size_x = 1, local_size_y = 1) in;
layout(rgba32f, binding = 0) uniform image2D img_output;
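The first layout qualifier defines the local work group size. The split into local groups happens behind the scenes: if we later make the local work group larger, the body of the shader does not need to change, because it indexes pixels through gl_GlobalInvocationID. Only this qualifier and the dispatch counts change.
Here we use 1*1, one pixel per group. For 1D or 3D work the layout would need to change accordingly.
The second layout qualifier declares the image. Note that it is uniform image2D, not uniform sampler2D.
Now write a shader that outputs black: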
void main() {
  // base pixel colour for image
  vec4 pixel = vec4(0.0, 0.0, 0.0, 1.0);
  // get index in global work group i.e x,y position
  ivec2 pixel_coords = ivec2(gl_GlobalInvocationID.xy);

  //
  // interesting stuff happens here later
  //

  // output to a specific pixel in the image
  imageStore(img_output, pixel_coords, pixel);
}
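The pixel coordinates come from the GLSL built-in gl_GlobalInvocationID.xy, which gives the position of the current invocation in the global work group.
Compile the shader with the type GL_COMPUTE_SHADER: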
GLuint ray_shader = glCreateShader(GL_COMPUTE_SHADER);
glShaderSource(ray_shader, 1, &the_ray_shader_string, NULL);
glCompileShader(ray_shader);
// check for compilation errors as per normal here

GLuint ray_program = glCreateProgram();
glAttachShader(ray_program, ray_shader);
glLinkProgram(ray_program);
// check for linking errors and validate program as per normal here
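We also need a program that renders the image onto the final quad, so that it can read the image2D texture we wrote.
Dispatch the Shaders in the Render Loop:
Step one: dispatch the compute shader to fill the texture, with the z count set to 1: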
glUseProgram(ray_program);
glDispatchCompute((GLuint)tex_w, (GLuint)tex_h, 1);

// make sure writing to image has finished before read
glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
To make sure that the compute shaders have completely finished writing to the image before we start sampling, we put in a memory barrier with glMemoryBarrier() and the image access bit. Strictly, the barrier bit names how the data is accessed after the barrier, so a quad pass that samples through a sampler2D is covered by GL_TEXTURE_FETCH_BARRIER_BIT. You can instead use GL_ALL_BARRIER_BITS to be on the safe side for all types of writing.
Step two: draw to the quad as normal:
// normal drawing pass
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(quad_program);
glBindVertexArray(quad_vao);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, tex_output);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
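Here I set the colour written to the texture to green:
NEXT: Raytracing
To begin with, the scene is hard-coded directly into our compute shader.
A ray is defined by an origin and a direction.
We want to distribute the ray origins across our pixels. The original tutorial's author maps them to the range -5.0 to 5.0 in both x and y: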
float max_x = 5.0;
float max_y = 5.0;
ivec2 dims = imageSize(img_output); // fetch image dimensions
float x = (float(pixel_coords.x * 2 - dims.x) / dims.x);
float y = (float(pixel_coords.y * 2 - dims.y) / dims.y);
vec3 ray_o = vec3(x * max_x, y * max_y, 0.0);
vec3 ray_d = vec3(0.0, 0.0, -1.0); // ortho
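To see what this mapping produces, the same arithmetic can be evaluated on the CPU. This is a sketch in C that mirrors the shader expression above; ray_origin_coord is a hypothetical name.

```c
/* Map a pixel index in [0, dim) to [-max_extent, max_extent), mirroring
   x = float(pixel * 2 - dim) / dim from the shader snippet.
   Hypothetical helper for checking values by hand. */
float ray_origin_coord(int pixel, int dim, float max_extent)
{
    float ndc = (float)(pixel * 2 - dim) / (float)dim; /* in [-1, 1) */
    return ndc * max_extent;
}
```

For a 512-pixel axis and max_extent 5.0, pixel 0 maps to -5.0, pixel 256 maps to 0.0, and pixel 511 maps to just under 5.0, so the range is closed on the left and open on the right.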
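Here I do not map to -5 to 5. Instead each pixel has size 1, the centre of the image is the point (0, 0), and each ray starts at the centre of its pixel.
So the shader becomes: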
#version 450 core
layout(local_size_x = 1, local_size_y = 1) in;
layout(rgba32f, binding = 0) uniform image2D img_output;

void main() {
  const float pixel_size = 1.0f;
  ivec2 texsize = imageSize(img_output); // get current texture size, 500 * 500
  // base pixel colour for image
  vec4 pixel = vec4(0.0, 0.0, 0.0, 1.0);
  // get index in global work group i.e x,y position
  ivec2 pixel_coords = ivec2(gl_GlobalInvocationID.xy);
  // pixel-centre position, with the image centre at (0, 0)
  pixel.r = pixel_size * (float(pixel_coords.x) - 0.5f * float(texsize.x - 1));
  pixel.g = pixel_size * (float(pixel_coords.y) - 0.5f * float(texsize.y - 1));
  // output to a specific pixel in the image
  imageStore(img_output, pixel_coords, pixel);
}
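With this formula, x and y run from -249.5 to 249.5 for a 500*500 image, since each ray sits at a pixel centre. If the offset is 0.5 * res instead, as in the sampling loop that follows, the pixel corners run from -250 to 249, and adding a jitter offset in [0, 1) per sample covers the full range -250 to 250.
Jittered sampling pseudocode: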
// for samples
for (int i = 0; i < vp.num_samples; i++) {
    // random offset inside the unit square
    sample_point = vp.sampler_ptr->sample_unit_square();
    // c = pixel column, r = pixel row
    pixel_pos[0] = vp.size * (c - 0.5*vp.hres + sample_point[0]);
    pixel_pos[1] = vp.size * (r - 0.5*vp.vres + sample_point[1]);
    ray.o = RT_VEC_3D({ pixel_pos[0], pixel_pos[1], RT_SCALAR(zw) });
    pixel_color += tracer_ptr->trace_ray(ray);
}
// average over all samples
pixel_color /= vp.num_samples;
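The pseudocode depends on types from the author's own ray tracer, so as a rough, self-contained illustration of the coordinate arithmetic only, here is a sketch in C. The small random generator next_unit is an assumed stand-in for sample_unit_square(), and the function names are hypothetical.

```c
#include <stdint.h>

/* Tiny linear congruential generator returning a float in [0, 1).
   Assumed stand-in for the sampler's sample_unit_square(). */
static float next_unit(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (float)(*state >> 8) / 16777216.0f; /* top 24 bits / 2^24 */
}

/* Jittered coordinate of pixel index c along an axis with `res` pixels:
   pixel_size * (c - 0.5 * res + random offset). */
float jittered_coord(int c, int res, float pixel_size, uint32_t *state)
{
    return pixel_size * ((float)c - 0.5f * (float)res + next_unit(state));
}

/* Centre of pixel c: the fixed case, equal to a constant offset of 0.5. */
float pixel_centre(int c, int res, float pixel_size)
{
    return pixel_size * ((float)c - 0.5f * (float)(res - 1));
}
```

For a 500-pixel axis with pixel size 1, pixel_centre gives -249.5 for pixel 0 and 249.5 for pixel 499, while jittered_coord spreads samples for pixel 0 from -250 upward and samples for pixel 499 up to 250.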