2020
Rockenbach, Dinei A High-Level Programming Abstractions for Stream Parallelism on GPUs Masters Thesis School of Technology - PPGCC - PUCRS, 2020. @mastersthesis{ROCKENBACH:DM:20, title = {High-Level Programming Abstractions for Stream Parallelism on GPUs}, author = {Dinei A. Rockenbach}, url = {https://tede2.pucrs.br/tede2/handle/tede/9592}, year = {2020}, date = {2020-11-27}, address = {Porto Alegre, Brazil}, school = {School of Technology - PPGCC - PUCRS}, abstract = {The growth and spread of parallel architectures have driven the pursuit of greater computing power with massively parallel hardware such as the Graphics Processing Units (GPUs). This new heterogeneous computer architecture composed of multi-core Central Processing Units (CPUs) and many-core GPUs became usual, enabling novel software applications such as self-driving cars, real-time ray tracing, deep learning, and Virtual Reality (VR), which are characterized as stream processing applications. However, this heterogeneous environment poses an additional challenge to software development, which is still in the process of adapting to the parallel processing paradigm on multi-core systems, where programmers are supported by several Application Programming Interfaces (APIs) that offer different abstraction levels. The parallelism exploitation in GPU is done using both CUDA and OpenCL for academia and industry, whose developers have to deal with low-level architecture concepts to efficiently exploit GPU parallelism in their applications. There is still a lack of parallel programming abstractions when: 1) parallelizing code on GPUs, and 2) needing higher-level programming abstractions that deal with both CPU and GPU parallelism. Unfortunately, developers still have to be expert programmers on system and architecture to enable efficient hardware parallelism exploitation in this architectural environment. 
To contribute to the first problem, we created GSPARLIB, a novel structured parallel programming library for exploiting GPU parallelism that provides a unified programming API and driver-agnostic runtime. It offers Map and Reduce parallel patterns on top of CUDA and OpenCL drivers. We evaluate its performance comparing with state-of-the-art APIs, where the experiments revealed a comparable and efficient performance. For contributing to the second problem, we extended the SPar Domain-Specific Language (DSL), which has been proved to be high-level and productive for expressing stream parallelism with C++ annotations in multi-core CPUs. In this work, we propose and implement new annotations that increase expressiveness to combine the current stream parallelism on CPUs and data parallelism on GPUs. We also provide new pattern-based transformation rules that were implemented in the compiler targeting automatic source-to-source code transformations using GSPARLIB for GPU parallelism exploitation. Our experiments demonstrate that SPar compiler is able to generate stream and data parallel patterns without significant performance penalty compared to handwritten code. Thanks to these advances in SPar, our work is the first on providing high-level C++11 annotations as an API that does not require significant code refactoring in sequential programs while enabling multi-core CPU and many-core GPU parallelism exploitation for stream processing applications.}, keywords = {}, pubstate = {published}, tppubtype = {mastersthesis} }
Hoffmann, Renato B; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Stream Parallelism Annotations for Multi-Core Frameworks Inproceedings doi XXIV Brazilian Symposium on Programming Languages (SBLP), pp. 48-55, ACM, Natal, Brazil, 2020. @inproceedings{HOFFMANN:SBLP:20, title = {Stream Parallelism Annotations for Multi-Core Frameworks}, author = {Renato B. Hoffmann and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1145/3427081.3427088}, doi = {10.1145/3427081.3427088}, year = {2020}, date = {2020-10-01}, booktitle = {XXIV Brazilian Symposium on Programming Languages (SBLP)}, pages = {48-55}, publisher = {ACM}, address = {Natal, Brazil}, series = {SBLP'20}, abstract = {Data generation, collection, and processing is an important workload of modern computer architectures. Stream or high-intensity data flow applications are commonly employed in extracting and interpreting the information contained in this data. Due to the computational complexity of these applications, high-performance ought to be achieved using parallel computing. Indeed, the efficient exploitation of available parallel resources from the architecture remains a challenging task for the programmers. Techniques and methodologies are required to help shift the efforts from the complexity of parallelism exploitation to specific algorithmic solutions. To tackle this problem, we propose a methodology that provides the developer with a suitable abstraction layer between a clean and effective parallel programming interface targeting different multi-core parallel programming frameworks. We used standard C++ code annotations that may be inserted in the source code by the programmer. Then, a compiler parses C++ code with the annotations and generates calls to the desired parallel runtime API. 
Our experiments demonstrate the feasibility of our methodology and the performance of the abstraction layer, where the difference is negligible in four applications with respect to the state-of-the-art C++ parallel programming frameworks. Additionally, our methodology allows improving the application performance since the developers can choose the runtime that best performs in their system.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Garcia, Adriano Marques; Serpa, Matheus; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo; Navaux, Philippe O A The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures Inproceedings doi International Conference on Computational Science and its Applications (ICCSA), pp. 142-157, Springer, Cagliari, Italy, 2020. @inproceedings{GARCIA:ICCSA:20, title = {The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures}, author = {Adriano Marques Garcia and Matheus Serpa and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes and Philippe O A Navaux}, url = {https://doi.org/10.1007/978-3-030-58817-5_12}, doi = {10.1007/978-3-030-58817-5_12}, year = {2020}, date = {2020-07-01}, booktitle = {International Conference on Computational Science and its Applications (ICCSA)}, volume = {12254}, pages = {142-157}, publisher = {Springer}, address = {Cagliari, Italy}, series = {ICCSA'20}, abstract = {Since the demand for computing power increases, new architectures emerged to obtain better performance. Reducing the power and energy consumption of these architectures is one of the main challenges to achieving high-performance computing. Current research trends aim at developing new software and hardware techniques to achieve the best performance and energy trade-offs. In this work, we investigate the impact of different CPU frequency scaling techniques such as ondemand, performance, and powersave on the power and energy consumption of multi-core based computer infrastructure. We apply these techniques in PAMPAR, a parallel benchmark suite implemented in PThreads, OpenMP, MPI-1, and MPI-2 (spawn). We measure the energy and execution time of 10 benchmarks, varying the number of threads. Our results show that although powersave consumes up to 43.1% less power than performance and ondemand governors, it consumes the triple of energy due to the high execution time. 
Our experiments also show that the performance governor consumes up to 9.8% more energy than ondemand for CPU-bound benchmarks. Finally, our results show that PThreads has the lowest power consumption, consuming less than the sequential version for memory-bound benchmarks. Regarding performance, the performance governor achieved 3% of performance over the ondemand.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Stein, Charles Michael; Rockenbach, Dinei A; Griebler, Dalvan; Torquati, Massimo; Mencagli, Gabriele; Danelutto, Marco; Fernandes, Luiz Gustavo Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units Journal Article doi Concurrency and Computation: Practice and Experience, na (na), pp. e5786, 2020. @article{STEIN:CCPE:20, title = {Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units}, author = {Charles Michael Stein and Dinei A. Rockenbach and Dalvan Griebler and Massimo Torquati and Gabriele Mencagli and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1002/cpe.5786}, doi = {10.1002/cpe.5786}, year = {2020}, date = {2020-05-01}, journal = {Concurrency and Computation: Practice and Experience}, volume = {na}, number = {na}, pages = {e5786}, publisher = {Wiley Online Library}, abstract = {Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires to batch input elements in microbatches, whose computation is offloaded on the GPU leveraging data parallelism within the same batch of data. Since data elements are continuously received based on the input speed, the bigger the microbatch size the higher the latency to completely buffer it and to start the processing on the device. Unfortunately, stream processing applications often have strict latency requirements that need to find the best size of the microbatches and to adapt it dynamically based on the workload conditions as well as according to the characteristics of the underlying device and network. In this work, we aim at implementing latency‐aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. 
The evaluation is conducted using the Lempel‐Ziv‐Storer‐Szymanski compression application considering different input workloads. As a general result of our work, we noticed that algorithms with elastic adaptation factors respond better for stable workloads, while algorithms with narrower targets respond better for highly unbalanced workloads.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Löff, Junior; Griebler, Dalvan; Fernandes, Luiz Gustavo Implementação Paralela do LU no NPB C++ Utilizando um Pipeline Implícito Inproceedings doi XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 37-40, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020. @inproceedings{LOFF:ERAD:20, title = {Implementação Paralela do LU no NPB C++ Utilizando um Pipeline Implícito}, author = {Junior Löff and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.5753/eradrs.2020.10750}, doi = {10.5753/eradrs.2020.10750}, year = {2020}, date = {2020-04-01}, booktitle = {XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS)}, pages = {37-40}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Santa Maria, BR}, abstract = {In this work, an implicit pipeline with the map pattern was implemented in the LU application of the NAS Parallel Benchmarks in C++. LU has a data dependency over time, which makes parallelism exploitation difficult. It was converted from Fortran to C++ in order to be parallelized with different multi-core libraries. Using this strategy with the libraries yielded performance gains of up to 10.6% over the original version.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Parallel Applications Modelling Group
GMAP is a research group at the Pontifical Catholic University of Rio Grande do Sul (PUCRS). Historically, the group has conducted several lines of research on modeling and adapting robust, real-world applications from different domains (physics, mathematics, geology, image processing, biology, aerospace, and many others) to run efficiently on High-Performance Computing (HPC) architectures, such as clusters.
In the last decade, new parallelism abstractions have been created through domain-specific languages (DSLs), libraries, and frameworks for the next generation of computer algorithms and architectures, such as embedded hardware and servers with accelerators like Graphics Processing Units (GPUs) or Field-Programmable Gate Arrays (FPGAs). These abstractions have been applied to stream processing and data-science-oriented applications. Concomitantly, since 2018, the group has conducted research using artificial intelligence to optimize applications in Medicine, Ecology, Industry, Agriculture, Education, Smart Cities, and other areas.
Research Lines
Applied Data Science
Parallelism Abstractions
The research line HSPA (High-level and Structured Parallelism Abstractions) aims to create programming interfaces for users/programmers who are not versed in the parallel programming paradigm. The idea is to offer a higher level of abstraction without compromising application performance. The interfaces developed in this research line target specific domains and can later be extended to other areas. The scope of the work is broad with respect to the technologies used to develop the interfaces and to exploit parallelism.
Parallel Application Modeling
Team
Prof. Dr. Luiz Gustavo Leão Fernandes
General Coordinator
Prof. Dr. Dalvan Griebler
Research Coordinator