1. For a standard 3×3 mean filter, the kernel code is as follows.

Versions using a buffer object and an image object:

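/* Buffer version. The signature, the index setup and the outer j loop were lost from
   the original listing and are reconstructed here (kernel name and parameters assumed).
   There is no boundary handling: border pixels read outside the buffer. */
__kernel void meanFilterBuffer(__global const uchar4 *inputImage, __global uchar4 *outputImage, int width)
{
    int x = get_global_id(0), y = get_global_id(1);
    int k = 1; uint n = (2 * k + 1) * (2 * k + 1);   /* 3*3 window: radius 1, 9 pixels */
    uint4 finalcolor = (uint4)(0);
    for(int j = y - k; j <= y + k; j++)
    {
        for(int i = x - k; i <= x + k; i++)
        {
            finalcolor = finalcolor + convert_uint4(inputImage[i + j * width]);
        }
    }
    outputImage[x + y * width] = convert_uchar4(finalcolor/n);
}
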
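/* Image version. The signature, the sampler and the loop setup were lost from the
   original listing and are reconstructed here (names and sampler flags assumed). */
__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE |
                                    CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

__kernel void meanFilterImage(__read_only image2d_t inputImage, __write_only image2d_t outputImage)
{
    int x = get_global_id(0), y = get_global_id(1);
    int k = 1; uint n = (2 * k + 1) * (2 * k + 1);   /* 3*3 window: radius 1, 9 pixels */
    uint4 finalcolor = (uint4)(0);
    for(int j = y - k; j <= y + k; j++)
    {
        for(int i = x - k; i <= x + k; i++)
        {
            /* reads outside the image are resolved by the sampler's addressing mode */
            finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(i,j));
        }
    }

    finalcolor = finalcolor/n;

    write_imageui(outputImage, (int2)(x,y), finalcolor);
}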
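To check either kernel's output, it can help to have a CPU reference. The sketch below is not from the original post; it assumes RGBA8 data (4 bytes per pixel, matching `uchar4`) and clamp-to-edge handling of out-of-range neighbours, which is what a sampler with `CLK_ADDRESS_CLAMP_TO_EDGE` would return. The names `mean3x3_ref` and `clampi` are hypothetical. Because the buffer kernel has no boundary handling, only interior pixels are comparable for that version.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp v into [lo, hi]. */
static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* CPU reference for the 3x3 mean filter on RGBA8 data (4 bytes per pixel,
   matching uchar4 in the kernels). Neighbours outside the image are clamped
   to the nearest edge pixel, mirroring an assumed CLK_ADDRESS_CLAMP_TO_EDGE
   sampler. */
void mean3x3_ref(const uint8_t *in, uint8_t *out, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            unsigned sum[4] = {0, 0, 0, 0};
            for (int j = y - 1; j <= y + 1; j++) {
                for (int i = x - 1; i <= x + 1; i++) {
                    const uint8_t *p = in + 4 * (clampi(i, 0, width - 1) +
                                                 clampi(j, 0, height - 1) * width);
                    for (int c = 0; c < 4; c++)
                        sum[c] += p[c];
                }
            }
            /* integer division truncates, like finalcolor/n in the kernels */
            for (int c = 0; c < 4; c++)
                out[4 * (x + y * width) + c] = (uint8_t)(sum[c] / 9);
        }
    }
}
```

In use, the GPU result would be read back with `clEnqueueReadBuffer` or `clEnqueueReadImage` and compared byte by byte against `mean3x3_ref` run on the same input.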

Running the filter on a 2048×2048 image:

[Figures: OpenCL mean filter performance]

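The global work size is {2048, 2048, 1} and the work-group size is {16, 16}, i.e. 256 work-items per group. In general the work-group size should be a multiple of 64, because on AMD GPUs the wavefront (typically 64 work-items) is the basic unit of hardware thread scheduling.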
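The arithmetic behind that rule can be sketched as follows. This is an illustration, not code from the post: `WAVEFRONT_SIZE = 64` is an assumption matching the AMD GPUs discussed (other hardware uses other widths, e.g. 32), and `describe_launch` / `launch_info` are hypothetical names. A group whose size is not a multiple of the wavefront still occupies whole wavefronts, so some lanes sit idle.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed wavefront width for the AMD GPUs discussed in the text. */
#define WAVEFRONT_SIZE 64

typedef struct {
    size_t groups;           /* total number of work-groups */
    size_t items_per_group;  /* work-items in one work-group */
    size_t waves_per_group;  /* wavefronts occupied per group (rounded up) */
    int    fills_wavefronts; /* 1 if no lanes are left idle in the last wavefront */
} launch_info;

launch_info describe_launch(const size_t global[2], const size_t local[2])
{
    launch_info info;
    /* OpenCL 1.x requires the global size to be divisible by the local size */
    assert(global[0] % local[0] == 0 && global[1] % local[1] == 0);
    info.groups = (global[0] / local[0]) * (global[1] / local[1]);
    info.items_per_group = local[0] * local[1];
    info.waves_per_group = (info.items_per_group + WAVEFRONT_SIZE - 1) / WAVEFRONT_SIZE;
    info.fills_wavefronts = (info.items_per_group % WAVEFRONT_SIZE) == 0;
    return info;
}
```

For the configuration above, {2048, 2048} with {16, 16} gives 128 × 128 = 16384 groups of 256 work-items, i.e. 4 full wavefronts per group. A {8, 4} group (32 work-items) would still occupy one wavefront with half its lanes unused.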

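The kernel uses 6 GPRs (general-purpose registers) and no ScratchRegs. ScratchRegs are registers emulated in video memory when the GPRs run out; they slow execution down considerably, so their number should be kept as low as possible.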

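As the results show, the kernel using the image object has the shorter execution time. Oddly, though, the buffer kernel leads on every other performance counter except ALU busy and the number of ALU instructions.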

Rewriting the kernel as follows, with the 3×3 loop fully unrolled, improves performance somewhat:


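/* Unrolled image version. The signature, the sampler and the x/y setup were lost
   from the original listing and are reconstructed here (names and sampler flags assumed). */
__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE |
                                    CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

__kernel void meanFilterImage3x3(__read_only image2d_t inputImage, __write_only image2d_t outputImage)
{
    int x = get_global_id(0), y = get_global_id(1);

    /* 3*3 area */
    uint4 finalcolor = (uint4)(0);

    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x-1,y-1));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x,y-1));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x+1,y-1));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x-1,y));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x,y));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x+1,y));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x-1,y+1));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x,y+1));
    finalcolor = finalcolor + read_imageui(inputImage, imageSampler, (int2)(x+1,y+1));

    finalcolor = finalcolor/9;

    write_imageui(outputImage, (int2)(x,y), finalcolor);
}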
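The unrolled kernel computes exactly the same sum as the loop version; it only replaces the loop counters, compares and branches with nine explicit reads. The C sketch below (not from the post) makes that equivalence checkable on the CPU. It assumes a single-channel 8-bit image and clamp-to-edge addressing; `fetch`, `mean3x3_loop` and `mean3x3_unrolled` are hypothetical names. Removing loop-control overhead is one plausible reason for the speed-up reported above, but the post does not measure that directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read one pixel of a single-channel w x h image,
   clamping out-of-range coordinates to the nearest edge (mimics the
   assumed CLK_ADDRESS_CLAMP_TO_EDGE sampler). */
static unsigned fetch(const uint8_t *img, int w, int h, int x, int y)
{
    assert(w > 0 && h > 0);
    if (x < 0) x = 0; else if (x >= w) x = w - 1;
    if (y < 0) y = 0; else if (y >= h) y = h - 1;
    return img[x + y * w];
}

/* Loop form, as in the first image kernel with k = 1. */
unsigned mean3x3_loop(const uint8_t *img, int w, int h, int x, int y)
{
    unsigned sum = 0;
    for (int j = y - 1; j <= y + 1; j++)
        for (int i = x - 1; i <= x + 1; i++)
            sum += fetch(img, w, h, i, j);
    return sum / 9;
}

/* Fully unrolled form, as in the rewritten kernel: nine explicit reads,
   no loop counters, compares or branches. */
unsigned mean3x3_unrolled(const uint8_t *img, int w, int h, int x, int y)
{
    unsigned sum =
        fetch(img, w, h, x - 1, y - 1) + fetch(img, w, h, x, y - 1) + fetch(img, w, h, x + 1, y - 1) +
        fetch(img, w, h, x - 1, y)     + fetch(img, w, h, x, y)     + fetch(img, w, h, x + 1, y) +
        fetch(img, w, h, x - 1, y + 1) + fetch(img, w, h, x, y + 1) + fetch(img, w, h, x + 1, y + 1);
    return sum / 9;
}
```

Many OpenCL compilers unroll loops with a constant trip count on their own, so whether manual unrolling helps depends on the compiler and hardware.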

[Figures: OpenCL mean filter performance]
