Web proceedings papers

Authors

Leonid Djinevski, Sime Arsenovski, Sasko Ristov and Marjan Gusev

Abstract

GPU devices offer high performance for algorithms that demand intensive computation. On the latest GPU architecture, Kepler, a developer can configure the L1 cache of each Streaming Multiprocessor (SM) with different cache sizes and set associativities. These cache parameters can affect the performance of computation-intensive algorithms. In this paper, we evaluate how every possible configuration of L1 cache size and associativity influences the performance of the dense matrix-matrix multiplication algorithm for various problem sizes. The results show that the L1 cache configuration has only a small impact on the overall performance of the algorithm.

Keywords

Cache Memory, SIMD, GPGPU.
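
Illustrative sketch

The abstract refers to configuring the L1 cache size for a kernel. The following is a minimal sketch of how that might look with the CUDA runtime call cudaFuncSetCacheConfig, which requests a preferred split between L1 cache and shared memory. It is not the authors' benchmark code. The naive kernel matmul, the helper run_with_config, the matrix size 256 and the 16x16 block shape are all assumptions made for illustration. The sketch covers only the size preference, not associativity, and the preference is a hint that the runtime may override. Whether global loads actually pass through L1 can also depend on the device and compiler options.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Naive dense multiplication C = A * B for square n x n row-major matrices.
// (Hypothetical kernel for illustration; it uses no shared memory.)
__global__ void matmul(const float* A, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Runs matmul under one cache preference and stores the elapsed kernel time.
// Returns true if every CUDA call succeeded and the result is correct.
bool run_with_config(cudaFuncCache pref, int n, float* ms) {
    size_t bytes = (size_t)n * n * sizeof(float);
    std::vector<float> hA((size_t)n * n, 1.0f), hB((size_t)n * n, 2.0f), hC((size_t)n * n, 0.0f);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaEvent_t start, stop;
    bool ok = true;

    ok = ok && cudaMalloc(&dA, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&dB, bytes) == cudaSuccess;
    ok = ok && cudaMalloc(&dC, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    // Request the preferred L1 / shared memory split for this kernel (a hint only).
    ok = ok && cudaFuncSetCacheConfig(matmul, pref) == cudaSuccess;

    ok = ok && cudaEventCreate(&start) == cudaSuccess;
    ok = ok && cudaEventCreate(&stop) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
        cudaEventRecord(start);
        matmul<<<grid, block>>>(dA, dB, dC, n);
        cudaEventRecord(stop);
        ok = ok && cudaEventSynchronize(stop) == cudaSuccess;
        ok = ok && cudaGetLastError() == cudaSuccess;
        ok = ok && cudaEventElapsedTime(ms, start, stop) == cudaSuccess;
        ok = ok && cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    // With A filled with 1 and B filled with 2, every element of C equals 2 * n.
    return ok && hC[0] == 2.0f * n && hC[(size_t)n * n - 1] == 2.0f * n;
}

int main() {
    const int n = 256;  // assumed problem size for the sketch
    const cudaFuncCache prefs[] = {cudaFuncCachePreferShared,
                                   cudaFuncCachePreferEqual,
                                   cudaFuncCachePreferL1};
    const char* names[] = {"prefer shared", "prefer equal", "prefer L1"};
    for (int i = 0; i < 3; ++i) {
        float ms = 0.0f;
        bool ok = run_with_config(prefs[i], n, &ms);
        std::printf("%-14s ok=%d time=%.3f ms\n", names[i], ok ? 1 : 0, ms);
    }
    return 0;
}
```

Timing the same kernel under each preference, over a range of matrix sizes, is one way to observe how much the L1 size setting matters for a given device.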