2021
Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz G.: High-Level Stream and Data Parallelism in C++ for Multi-Cores. Inproceedings: XXV Brazilian Symposium on Programming Languages (SBLP'21), ACM, Joinville, Brazil, 2021.
Abstract: Stream processing applications have seen increasing demand with the growing availability of sensors, IoT devices, and user data. Modern systems can generate millions of data items per day that must be processed in a timely manner. To meet this demand, application programmers must exploit parallelism to obtain the maximum performance of the underlying hardware resources. However, parallel programming is often difficult and error-prone, because programmers must deal with low-level system and architecture details. In this work, we introduce a new strategy for automatic data-parallel code generation in C++ targeting multi-core architectures. This strategy was integrated with an annotation-based parallel programming abstraction named SPar. We increased SPar's expressiveness to support stream parallelism, data parallelism, and their arbitrary composition. To this end, we added two new attributes to its language and improved the compiler's parallel code generation. We conducted a set of experiments on different stream and data-parallel applications to assess the efficiency of our solution. The results showed that the new SPar version obtains performance similar to handwritten parallelizations and, thanks to this work, achieves up to 74.9x better performance than the versions generated by the original SPar.
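For readers unfamiliar with SPar, the sketch below illustrates its annotation style for a simple three-stage stream pipeline, using only the attributes documented in earlier SPar publications (spar::ToStream, spar::Stage, spar::Input, spar::Output, spar::Replicate); the two new attributes introduced in this paper are not shown, since the abstract does not name them. The helpers read_item(), compute(), and write_item() are hypothetical, and the annotations only produce parallel code when the source is processed by the SPar compiler.

    #include <string>

    std::string read_item();                  // hypothetical stream source
    std::string compute(std::string item);    // hypothetical per-item computation
    void write_item(const std::string &item); // hypothetical stream sink

    void stream_app(int workers) {
        [[spar::ToStream]] while (true) {
            std::string item = read_item();
            if (item.empty()) break;          // end of stream
            [[spar::Stage, spar::Input(item), spar::Output(item), spar::Replicate(workers)]]
            { item = compute(item); }         // replicated middle stage
            [[spar::Stage, spar::Input(item)]]
            { write_item(item); }             // sink stage
        }
    }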
Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Danelutto, Marco; Fernandes, Luiz Gustavo: Assessing Coding Metrics for Parallel Programming of Stream Processing Programs on Multi-cores. Inproceedings: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA'21), pp. 291-295, IEEE, Pavia, Italy, 2021. ISBN: 978-1-6654-2705-0. doi: 10.1109/SEAA53835.2021.00044
Abstract: Since the popularization of multi-core architectures, several parallel APIs have emerged, helping to abstract programming complexity and increase productivity in application development. Unfortunately, only a few research efforts in this direction have managed to show the usability pay-back of the programming abstractions created, because doing so is not easy and poses many challenges for empirical software engineering. We believe that coding metrics commonly used in software engineering code measurements can give useful indicators of the programming effort of parallel applications and APIs. However, these metrics were designed for general purposes, without considering the evaluation of applications from a specific domain. In this study, we evaluate the suitability of seven coding metrics for the parallel programming domain. To do so, we considered five stream processing applications implemented with different parallel APIs for multi-cores. Our experiments show that COCOMO II is a suitable model for evaluating the productivity of different parallel APIs targeting multi-cores on stream processing applications, while the other metrics are restricted to code size.
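As background (not a result of this paper), the COCOMO II post-architecture model that the authors found suitable estimates development effort in person-months from code size; its published nominal calibration is roughly:

    Effort = A * Size^E * (EM_1 * ... * EM_n),   with   E = B + 0.01 * (SF_1 + ... + SF_5)

where Size is measured in thousands of source lines of code (KSLOC), A ≈ 2.94 and B ≈ 0.91 are calibration constants, EM_i are effort multipliers, and SF_j are scale factors. When the same application is written with different parallel APIs, the size term is presumably the input that changes, which is how a code-size-driven model can rank the programming effort of the APIs.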
Pieper, Ricardo; Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo: High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores. Journal Article: Journal of Computer Languages, Elsevier, 2021.
Abstract: This work contributes a structured parallel programming abstraction for Rust that provides ready-to-use parallel patterns, abstracting low-level and architecture-dependent details away from application programmers. We focus on stream processing applications running on shared-memory multi-core architectures (e.g., video processing, compression, and others). We provide a new high-level and efficient parallel programming abstraction for expressing stream parallelism, named Rust-SSP. We also created a new stream benchmark suite for Rust that represents real-world scenarios with different application characteristics and workloads. Our benchmark suite is an initiative to assess existing parallelism abstractions for this domain, as parallel implementations using these abstractions are provided. The results revealed that Rust-SSP achieved up to 41.1% better performance than other solutions. In terms of programmability, the results revealed that Rust-SSP requires the smallest number of extra lines of code to enable stream parallelism.
Löff, Júnior; Griebler, Dalvan; Mencagli, Gabriele; Araujo, Gabriell; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo: The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures. Journal Article: Future Generation Computer Systems, Elsevier, 2021. doi: 10.1016/j.future.2021.07.021
Abstract: The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communication, complex data dependencies, different memory-access patterns, and hardware component/sub-system overload. Parallel programming APIs, libraries, and frameworks written in C++, as well as new optimizations and parallel processing techniques, can benefit if NPB is made fully available in this programming language. In this paper we present NPB-CPP, a fully C++ translated version of NPB consisting of all the NPB kernels and pseudo-applications, developed using the OpenMP, Intel TBB, and FastFlow parallel frameworks for multi-cores. The design of NPB-CPP leverages the structured parallel programming methodology (essentially based on parallel design patterns). We show the structure of each benchmark application in terms of the composition of a few patterns (notably Map and MapReduce constructs) provided by the selected C++ frameworks. The experimental evaluation shows the accuracy of NPB-CPP with respect to the original NPB source code. Furthermore, we carefully evaluate the parallel performance on three multi-core systems (Intel, IBM Power, and AMD) with different C++ compilers (gcc, icc, and clang), discussing the performance differences in order to give researchers useful insights for choosing the best parallel programming framework for a given type of problem.
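To make the pattern vocabulary concrete, the sketch below (not code from NPB-CPP) shows the Map and MapReduce structures mentioned in the abstract, written with OpenMP, one of the three frameworks the suite targets; the vectors and the residual-norm computation are hypothetical stand-ins for an NPB kernel.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Map pattern: an independent element-wise update, parallelized over the loop.
    void scale(std::vector<double> &x, double alpha) {
        #pragma omp parallel for
        for (std::size_t i = 0; i < x.size(); ++i)
            x[i] *= alpha;
    }

    // MapReduce pattern: an element-wise map fused with a sum reduction.
    double residual_norm(const std::vector<double> &a, const std::vector<double> &b) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+ : sum)
        for (std::size_t i = 0; i < a.size(); ++i) {
            const double d = a[i] - b[i];
            sum += d * d;
        }
        return std::sqrt(sum);
    }

Compiled with g++ -fopenmp, the same structure maps directly onto TBB's parallel_for/parallel_reduce or FastFlow's ParallelFor/ParallelForReduce.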
Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo: Melhorando a Geração Automática de Código Paralelo para o Paradigma de Processamento de Stream em Multi-cores (Improving Automatic Parallel Code Generation for the Stream Processing Paradigm on Multi-cores). Journal Article: Revista Eletrônica de Iniciação Científica em Computação, 19 (2), pp. 2083, Sociedade Brasileira de Computação (SBC), Porto Alegre, 2021. https://sol.sbc.org.br/journals/index.php/reic/article/view/2083
Abstract (translated from Portuguese): Parallel programming is still a challenge for developers, because it exposes too many low-level and operating-system details. Programmers must deal with issues such as scheduling, load balancing, and synchronization. This work contributes optimizations to a parallel programming abstraction for expressing stream parallelism on multi-cores. The work extended SPar by adding two new attributes to its language and implemented improvements in its compiler to deliver better performance in the automatically generated parallel code. The experiments revealed that the new SPar version is able to abstract parallelism details while achieving performance similar to manually parallelized versions.
Parallel Applications Modelling Group
GMAP is a research group at the Pontifical Catholic University of Rio Grande do Sul (PUCRS). Historically, the group has conducted research on modeling and adapting robust, real-world applications from different domains (physics, mathematics, geology, image processing, biology, aerospace, and many others) to run efficiently on High-Performance Computing (HPC) architectures, such as clusters.
In the last decade, new parallelism abstractions have been created through domain-specific languages (DSLs), libraries, and frameworks for the next generation of computer algorithms and architectures, such as embedded hardware and servers with accelerators like Graphics Processing Units (GPUs) or Field-Programmable Gate Arrays (FPGAs). These abstractions have been applied to stream processing and data-science-oriented applications. Concomitantly, since 2018 the group has also conducted research that applies artificial intelligence to optimize applications in areas such as medicine, ecology, industry, agriculture, education, and smart cities.
Research Lines
Applied Data Science
Parallelism Abstractions
The HSPA (High-level and Structured Parallelism Abstractions) research line aims to create programming interfaces for users and programmers who are not experienced with the parallel programming paradigm. The idea is to offer a higher level of abstraction without compromising application performance. The interfaces developed in this research line target specific domains first and can later be extended to other areas. The scope of the work is broad with respect to the technologies used to implement both the interfaces and the underlying parallelism.
Parallel Application Modeling
Team

Prof. Dr. Luiz Gustavo Leão Fernandes
General Coordinator

Prof. Dr. Dalvan Griebler
Research Coordinator
Latest Papers
2021
High-Level Stream and Data Parallelism in C++ for Multi-Cores. Inproceedings: XXV Brazilian Symposium on Programming Languages (SBLP), ACM, Joinville, Brazil, 2021.
Assessing Coding Metrics for Parallel Programming of Stream Processing Programs on Multi-cores. Inproceedings: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 291-295, IEEE, Pavia, Italy, 2021. ISBN: 978-1-6654-2705-0.
High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores. Journal Article: Journal of Computer Languages, Elsevier, 2021.
The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures. Journal Article: Future Generation Computer Systems, Elsevier, 2021.
Melhorando a Geração Automática de Código Paralelo para o Paradigma de Processamento de Stream em Multi-cores. Journal Article: Revista Eletrônica de Iniciação Científica em Computação, 19 (2), pp. 2083, 2021.
Projects
Software
Latest News
Contact us!