On April 4th, Gabriell Alves de Araujo successfully defended his master’s thesis “Data and Stream Parallelism Optimization on GPUs” at the School of Technology of the Pontifical Catholic University of Rio Grande do Sul (PUCRS). Gabriell is currently a Ph.D. candidate at GMAP. The defense was held via video conference due to the Covid-19 pandemic.
Nowadays, most computers are equipped with Graphics Processing Units (GPUs) to provide massive-scale parallelism at a low cost. Parallel programming is necessary to fully exploit this architectural capacity. However, it represents a challenge for programmers since it requires refactoring algorithms, designing parallelism techniques, and hardware-specific knowledge. Moreover, GPU parallelism is even more challenging since GPUs have peculiar hardware characteristics and employ a parallelism paradigm called many-core programming. In this sense, parallel computing research has focused on the study of efficient programming techniques for GPUs and the development of abstractions that reduce the effort of writing parallel code.

SPar is a domain-specific language (DSL) that goes in this direction. It can be used to express stream parallelism in a simpler way without significantly impacting performance. SPar offers high-level abstractions via code annotations, while the SPar compiler generates the parallel code. SPar recently received an extension that enables parallel code generation for CPUs and GPUs in stream applications. In the generated code, the CPU cores control the flow of data while the GPU applies massive parallelism to the computation of each stream element. To this end, SPar generates code for an intermediate library called GSParLib, a pattern-oriented parallel API that provides a unified programming model targeting the CUDA and OpenCL runtimes, enabling parallelism exploitation on GPUs from different vendors. However, the GPU support in both SPar and GSParLib is still in its early stages; only basic features are provided, and no studies have comprehensively evaluated SPar and GSParLib’s performance. This work contributes by parallelizing representative high-performance computing (HPC) benchmarks and implementing new features and optimizations for GPUs.
Our set of improvements covers most of the critical limitations of GSParLib regarding performance and programmability. In our experiments, the optimized version of GSParLib achieved a speedup improvement of up to 54,500.00% over the original version of GSParLib on data parallelism benchmarks and a throughput improvement of up to 718.43% on stream parallelism benchmarks.
Prof. Ph.D. Horacio Gonzalez-Velez (Cloud Competency Centre/NCI)
Prof. Ph.D. Tiago Coelho Ferreto (PUCRS)
Prof. Ph.D. Luiz Gustavo Fernandes (PUCRS – Advisor)
Prof. Ph.D. Dalvan Griebler (PUCRS – Co-Advisor)
By: Gabriella Andrade