On Wednesday, September 25, 2020, Ricardo Luis Pieper (GMAP master’s student) passed his master’s thesis defense with distinction at the School of Technology of the Pontifical Catholic University of Rio Grande do Sul (PUCRS). The thesis was advised by Dr. Luiz Gustavo Fernandes and Dr. Dalvan Griebler (PNPD/PUCRS). The members of the examining board were Dr. Fernando Luís Dotti (PPGCC/PUCRS) and Dr. Marco Danelutto (Univ. of Pisa). The defense was held by video conference due to the COVID-19 pandemic.
Ricardo describes his experience in the master’s program:
Regarding the thesis, it was challenging as expected, particularly testing, benchmarks, and writing the thesis itself. I feel lucky to have Luiz Gustavo and Dalvan as advisors. I am convinced that Dalvan is the most patient man on the entire planet, as he had to deal with my mistakes, as well as motivate me to get through it, all while having to do the same for many other students. Having Dalvan’s moral support was crucial to finishing it. Presenting the thesis in English was interesting, and not as difficult as I thought it would be. During the master’s degree, I learned things that I would probably have never touched in my day job, including data visualization, computer graphics, image processing, and of course, stream processing paradigms. Now that I’ve finished it, I can look back on all that work and realize it was worth it.
Ricardo Luis Pieper
The following are the title and abstract of the master’s thesis:
Title: High-level Programming Abstractions for Distributed Stream Processing
Abstract:
Stream processing applications represent a significant part of today’s software. An increasing amount of streaming data is generated every day from various sources (computing devices and applications) and must be processed in a timely manner. Shared-memory architectures cannot cope with these large-scale processing demands. In High-Performance Computing (HPC), the Message Passing Interface (MPI) is the state-of-the-art parallel API (Application Programming Interface) for implementing parallel C/C++ programs. However, exploiting stream parallelism with MPI is difficult and error-prone for application developers because it exposes low-level details of computer architectures and operating systems. Programmers have to deal with implementation mechanisms for data serialization, process communication and synchronization, fault tolerance, work scheduling, load balancing, and parallelism strategies. Our research work addresses a subset of these challenges by providing two high-level programming abstractions for distributed stream processing. First, we created a distributed stream parallelism library called DSParlib. It was built as a skeleton library equipped with the Farm and Pipeline parallel patterns to provide programming abstractions on top of MPI. Second, we extended the SPar language and compiler roles to support distributed-memory architectures, since SPar is a Domain-Specific Language (DSL) for expressing stream parallelism using C++11 annotations that has proven to be productive on shared-memory architectures. We managed to make it work without significantly changing the easy-to-use language syntax and semantics, automatically generating parallel code with SPar’s compiler using DSParlib as the parallel runtime. The experiments were conducted using real-world stream processing applications and different cluster configurations. We demonstrated that DSParlib provides a simpler API than MPI and competitive performance. Also, SPar’s compiler was able to generate parallel code automatically without performance penalties compared to handwritten code in DSParlib. Finally, with all these high-level programming abstractions implemented, SPar becomes the first annotation-based language for expressing stream parallelism in C++ programs to support distributed-memory architectures, avoiding significant sequential code refactoring to enable parallel execution on clusters.