Convolution

Overview

This example shows three ways to implement a 3x3 filter:
1) Non-separable implementation
2) Separable implementation, two stages
3) Separable implementation, one stage, using shared memory
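
To illustrate the difference between the non-separable and the two-stage separable variants, here is a minimal CPU sketch in plain Python. It is not the Quasar code from the sample. The function names, the clamped border handling, and the list-of-lists image layout are all assumptions made for illustration. The sketch only shows that a 3x3 kernel formed as the outer product of two 3-tap filters can be applied as a row pass followed by a column pass.

```python
def clamp(i, n):
    """Clamp index i to the valid range [0, n-1] (assumed border handling)."""
    return min(max(i, 0), n - 1)

def filter3x3(img, kernel):
    """Non-separable variant: 9 multiply-adds per output pixel."""
    h, w = len(img), len(img[0])
    return [[sum(kernel[dy + 1][dx + 1] * img[clamp(y + dy, h)][clamp(x + dx, w)]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

def filter3_rows(img, taps):
    """Horizontal 3-tap pass (first stage of the separable variant)."""
    h, w = len(img), len(img[0])
    return [[sum(taps[d + 1] * img[y][clamp(x + d, w)] for d in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

def filter3_cols(img, taps):
    """Vertical 3-tap pass (second stage of the separable variant)."""
    h, w = len(img), len(img[0])
    return [[sum(taps[d + 1] * img[clamp(y + d, h)][x] for d in (-1, 0, 1))
             for x in range(w)] for y in range(h)]

def filter3x3_separable(img, col_taps, row_taps):
    """Two-stage separable variant: 3 + 3 multiply-adds per output pixel."""
    return filter3_cols(filter3_rows(img, row_taps), col_taps)

def outer(col_taps, row_taps):
    """Build the 3x3 kernel equivalent to the two 1D filters."""
    return [[c * r for r in row_taps] for c in col_taps]
```

For a separable kernel, `filter3x3(img, outer(t, t))` and `filter3x3_separable(img, t, t)` give the same result up to floating-point rounding. The two-stage form does fewer multiplications per pixel but needs an intermediate image.
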
In general, the shared-memory approach is the fastest (up to 30% faster on most GPUs). This benefit is only obtained when the out-of-bounds checking compilation option is turned off.
It is also important to specify upper bounds for the shared memory usage. This can be done with the assertion system; the compiler can then compute the maximum amount of shared memory that the kernel function will need.
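
The following Python sketch shows why such an upper bound matters. It is a back-of-the-envelope calculation, not the compiler's actual method. It assumes that each thread block loads its own tile plus a one-pixel border (halo) into shared memory, that elements are 4-byte floats, and that the per-block limit is 48 KB. The function names and the limit are hypothetical. Since the footprint grows with the block dimensions, bounding those dimensions also bounds the shared memory that is needed.

```python
# Assumed per-block shared memory limit in bytes (hypothetical; device-dependent).
ASSUMED_SHARED_LIMIT = 48 * 1024

def shared_tile_bytes(block_w, block_h, radius=1, bytes_per_elem=4):
    """Shared memory for one block's tile plus a halo of `radius` pixels."""
    tile_w = block_w + 2 * radius
    tile_h = block_h + 2 * radius
    return tile_w * tile_h * bytes_per_elem

def fits_in_shared(max_block_w, max_block_h, limit=ASSUMED_SHARED_LIMIT):
    """Check the worst case implied by the upper bounds on the block size."""
    return shared_tile_bytes(max_block_w, max_block_h) <= limit
```

For example, with an upper bound of 16x16 threads per block, a 3x3 filter needs an 18x18 tile, so `shared_tile_bytes(16, 16)` returns 1296 bytes.
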

See filter3x3_demo.q