Publications

174 entries « ‹ 1 of 4 › »

2023

Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo

Micro-batch and data frequency for stream processing on multi-cores Journal Article doi

The Journal of Supercomputing, In press (In press), pp. 1-39, 2023.

Abstract | Links | BibTeX

2022

Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo

SPBench: a framework for creating benchmarks of stream processing applications Journal Article doi

Computing, In press (In press), pp. 1-23, 2022.

Abstract | Links | BibTeX

Hoffmann, Renato Barreto; Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo

OpenMP as runtime for providing high-level stream parallelism on multi-cores Journal Article doi

The Journal of Supercomputing, 1 (1), pp. 7655–7676, 2022.

Abstract | Links | BibTeX

Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo

Self-adaptation on Parallel Stream Processing: A Systematic Review Journal Article doi

Concurrency and Computation: Practice and Experience, 34 (6), pp. e6759, 2022.

Abstract | Links | BibTeX

Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo

Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores Inproceedings doi

30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 10-17, IEEE, Valladolid, Spain, 2022.

Abstract | Links | BibTeX

Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo

Um Framework para Criar Benchmarks de Aplicações Paralelas de Stream Inproceedings doi

Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 97–98, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022.

Abstract | Links | BibTeX

Andrade, Gabriella; Griebler, Dalvan; Fernandes, Luiz Gustavo

Avaliação do Esforço de Programação em GPU: Estudo Piloto Inproceedings doi

Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 95–96, Sociedade Brasileira de Computação, Curitiba, Brazil, 2022.

Abstract | Links | BibTeX

Müller, Caetano; Löff, Junior; Griebler, Dalvan; Eizirik, Eduardo

Avaliação da aplicação de paralelismo em classificadores taxonômicos usando Qiime2 Inproceedings doi

Anais da XXII Escola Regional de Alto Desempenho da Região Sul, pp. 25–28, Sociedade Brasileira de Computação (SBC), Porto Alegre, RS, Brasil, 2022.

Abstract | Links | BibTeX

Rockenbach, Dinei A; Löff, Júnior; Araujo, Gabriell; Griebler, Dalvan; Fernandes, Luiz G

High-Level Stream and Data Parallelism in C++ for GPUs Inproceedings doi

XXVI Brazilian Symposium on Programming Languages (SBLP), pp. 41-49, ACM, Uberlândia, Brazil, 2022.

Abstract | Links | BibTeX

Vogel, Adriano

Self-adaptive abstractions for efficient high-level parallel computing in multi-cores PhD Thesis

School of Technology - PUCRS, 2022.

Abstract | Links | BibTeX

@phdthesis{VOGEL:PHD_PUCRS:22,
title = {Self-adaptive abstractions for efficient high-level parallel computing in multi-cores},
author = {Adriano Vogel},
url = {https://tede2.pucrs.br/tede2/handle/tede/10232},
year = {2022},
date = {2022-05-01},
address = {Porto Alegre, Brazil},
school = {School of Technology - PUCRS},
abstract = {Nowadays, a significant part of computing systems and real-world applications demand parallelism to accelerate their executions. Although high-level and structured parallel programming aims to facilitate parallelism exploitation, there are still issues to be addressed to improve existing parallel programming abstractions. Usually, application developers still have to set non-intuitive or complex parallelism configurations. In this context, self-adaptation is a potential alternative to provide a higher-level of autonomic abstractions and runtime responsiveness in parallel executions. However, a recurrent problem is that self-adaptation is still limited in terms of flexibility, efficiency, and abstractions. For instance, there is a lack of mechanisms to apply adaptation actions and efficient decision-making strategies to decide which configurations to be enforced at run-time. In this work, we are interested in abstractions achievable with self-adaptation transparently managing the executions while the parallel programs are running (at run-time). Our main goals are to increase the adaptation space to be more representative of real-world applications and make self-adaptation more efficient with comprehensive evaluation methodologies, which can provide use-cases demonstrating the true potentials of self-adaptation. Therefore, this doctoral dissertation provides the following scientific contributions: I) An Systematic Literature Review (SLR) providing a taxonomy of the state-of-the-art. II) A conceptual framework to support designing and abstracting the decision-making process within self-adaptive solutions, such a conceptual framework is then employed in the technical contributions to assist in making the solutions more modular and potentially generalizable. III) Mechanisms and strategies for self-adaptive replicas in applications with single and multiple parallel stages, supporting multiple customizable non-functional requirements. IV) Mechanism, strategy, and optimizations for self-adaptation of Parallel Patterns/applications’ graphs topologies. We apply the proposed solutions to the context of stream processing applications, a representative paradigm present in several real-world applications that compute data flowing in the form of streams (e.g., video feeds, image, and data analytics). A part of the proposed solutions is evaluated with SPar and another part with the FastFlow programming framework. The results demonstrate that self-adaptation can provide efficient parallelism abstractions and autonomous responsiveness at run-time, yet achieve a competitive performance w.r.t. the best static executions. Moreover, when appropriate, we compare state-of-the-art solutions and demonstrate that our highly optimized decision-making strategies achieve significant performance and efficiency gains.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}

Nowadays, a significant part of computing systems and real-world applications demand parallelism to accelerate their executions. Although high-level and structured parallel programming aims to facilitate parallelism exploitation, there are still issues to be addressed to improve existing parallel programming abstractions. Usually, application developers still have to set non-intuitive or complex parallelism configurations. In this context, self-adaptation is a potential alternative to provide a higher-level of autonomic abstractions and runtime responsiveness in parallel executions. However, a recurrent problem is that self-adaptation is still limited in terms of flexibility, efficiency, and abstractions. For instance, there is a lack of mechanisms to apply adaptation actions and efficient decision-making strategies to decide which configurations to be enforced at run-time. In this work, we are interested in abstractions achievable with self-adaptation transparently managing the executions while the parallel programs are running (at run-time). Our main goals are to increase the adaptation space to be more representative of real-world applications and make self-adaptation more efficient with comprehensive evaluation methodologies, which can provide use-cases demonstrating the true potentials of self-adaptation. Therefore, this doctoral dissertation provides the following scientific contributions: I) An Systematic Literature Review (SLR) providing a taxonomy of the state-of-the-art. II) A conceptual framework to support designing and abstracting the decision-making process within self-adaptive solutions, such a conceptual framework is then employed in the technical contributions to assist in making the solutions more modular and potentially generalizable. III) Mechanisms and strategies for self-adaptive replicas in applications with single and multiple parallel stages, supporting multiple customizable non-functional requirements. IV) Mechanism, strategy, and optimizations for self-adaptation of Parallel Patterns/applications’ graphs topologies. We apply the proposed solutions to the context of stream processing applications, a representative paradigm present in several real-world applications that compute data flowing in the form of streams (e.g., video feeds, image, and data analytics). A part of the proposed solutions is evaluated with SPar and another part with the FastFlow programming framework. The results demonstrate that self-adaptation can provide efficient parallelism abstractions and autonomous responsiveness at run-time, yet achieve a competitive performance w.r.t. the best static executions. Moreover, when appropriate, we compare state-of-the-art solutions and demonstrate that our highly optimized decision-making strategies achieve significant performance and efficiency gains.

Vogel, Adriano

Self-adaptive abstractions for efficient high-level parallel computing in multi-cores PhD Thesis

Computer Science Department - University of Pisa, 2022.

Abstract | Links | BibTeX

@phdthesis{VOGEL:PHD_PISA:22,
title = {Self-adaptive abstractions for efficient high-level parallel computing in multi-cores},
author = {Adriano Vogel},
url = {https://etd.adm.unipi.it/theses/available/etd-04142022-142258/unrestricted/Vogel_PhD_Dissertation_UNIPI.pdf},
year = {2022},
date = {2022-05-01},
address = {Pisa, Italy},
school = {Computer Science Department - University of Pisa},
abstract = {Nowadays, a significant part of computing systems and real-world applications demand parallelism to accelerate their executions. Although high-level and structured parallel programming aims to facilitate parallelism exploitation, there are still issues to be addressed to improve existing parallel programming abstractions. Usually, application developers still have to set non-intuitive or complex parallelism configurations. In this context, self-adaptation is a potential alternative to provide a higher-level of autonomic abstractions and runtime responsiveness in parallel executions. However, a recurrent problem is that self-adaptation is still limited in terms of flexibility, efficiency, and abstractions. For instance, there is a lack of mechanisms to apply adaptation actions and efficient decision-making strategies to decide which configurations to be enforced at run-time. In this work, we are interested in abstractions achievable with self-adaptation transparently managing the executions while the parallel programs are running (at run-time). Our main goals are to increase the adaptation space to be more representative of real-world applications and make self-adaptation more efficient with comprehensive evaluation methodologies, which can provide use-cases demonstrating the true potentials of self-adaptation. Therefore, this doctoral dissertation provides the following scientific contributions: I) An Systematic Literature Review (SLR) providing a taxonomy of the state-of-the-art. II) A conceptual framework to support designing and abstracting the decision-making process within self-adaptive solutions, such a conceptual framework is then employed in the technical contributions to assist in making the solutions more modular and potentially generalizable. III) Mechanisms and strategies for self-adaptive replicas in applications with single and multiple parallel stages, supporting multiple customizable non-functional requirements. IV) Mechanism, strategy, and optimizations for self-adaptation of Parallel Patterns/applications’ graphs topologies. We apply the proposed solutions to the context of stream processing applications, a representative paradigm present in several real-world applications that compute data flowing in the form of streams (e.g., video feeds, image, and data analytics). A part of the proposed solutions is evaluated with SPar and another part with the FastFlow programming framework. The results demonstrate that self-adaptation can provide efficient parallelism abstractions and autonomous responsiveness at run-time, yet achieve a competitive performance w.r.t. the best static executions. Moreover, when appropriate, we compare state-of-the-art solutions and demonstrate that our highly optimized decision-making strategies achieve significant performance and efficiency gains.},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}

2021

Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz G

Providing High‐Level Self‐Adaptive Abstractions for Stream Parallelism on Multicores Journal Article doi

Software: Practice and Experience, 51 (6), pp. 1194-1217, 2021.

Abstract | Links | BibTeX

Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo

Melhorando a Geração Automática de Código Paralelo para o Paradigma de Processamento de Stream em Multi-cores Journal Article

Revista Eletrônica de Iniciação Científica em Computação, 19 (2), pp. 2083, 2021.

Abstract | Links | BibTeX

Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo

Geração de Código OpenMP para o Paralelismo de Stream Journal Article

Revista Eletrônica de Iniciação Científica em Computação, 19 (2), pp. 2082, 2021.

Abstract | Links | BibTeX

Pieper, Ricardo; Löff, Júnior; Hoffmann, Renato Berreto; Griebler, Dalvan; Fernandes, Luiz Gustavo

High-level and Efficient Structured Stream Parallelism for Rust on Multi-cores Journal Article doi

Journal of Computer Languages, 65 (na), pp. 101054, 2021, ISSN: 2590-1184.

Abstract | Links | BibTeX

Löff, Júnior; Griebler, Dalvan; Mencagli, Gabriele; Araujo, Gabriell; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo

The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures Journal Article doi

Future Generation Computer Systems, na (na), pp. na, 2021.

Abstract | Links | BibTeX

@article{LOFF:FGCS:21,
title = {The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures},
author = {Júnior Löff and Dalvan Griebler and Gabriele Mencagli and Gabriell Araujo and Massimo Torquati and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1016/j.future.2021.07.021},
doi = {10.1016/j.future.2021.07.021},
year = {2021},
date = {2021-07-01},
journal = {Future Generation Computer Systems},
volume = {na},
number = {na},
pages = {na},
publisher = {Elsevier},
abstract = {The NAS Parallel Benchmarks (NPB), originally implemented mostly in Fortran, is a consolidated suite containing several benchmarks extracted from Computational Fluid Dynamics (CFD) models. The benchmark suite has important characteristics such as intensive memory communications, complex data dependencies, different memory access patterns, and hardware components/sub-systems overload. Parallel programming APIs, libraries, and frameworks that are written in C++ as well as new optimizations and parallel processing techniques can benefit if NPB is made fully available in this programming language. In this paper we present NPB-CPP, a fully C++ translated version of NPB consisting of all the NPB kernels and pseudo-applications developed using OpenMP, Intel TBB, and FastFlow parallel frameworks for multicores. The design of NPB-CPP leverages the Structured Parallel Programming methodology (essentially based on parallel design patterns). We show the structure of each benchmark application in terms of composition of few patterns (notably Map and MapReduce constructs) provided by the selected C++ frameworks. The experimental evaluation shows the accuracy of NPB-CPP with respect to the original NPB source code. Furthermore, we carefully evaluate the parallel performance on three multi-core systems (Intel, IBM Power and AMD) with different C++ compilers (gcc, icc and clang) by discussing the performance differences in order to give to the researchers useful insights to choose the best parallel programming framework for a given type of problem.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo

Introducing a Stream Processing Framework for Assessing Parallel Programming Interfaces Inproceedings doi

29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 84-88, IEEE, Valladolid, Spain, 2021.

Abstract | Links | BibTeX

Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Danelutto, Marco; Fernandes, Luiz Gustavo

Assessing Coding Metrics for Parallel Programming of Stream Processing Programs on Multi-cores Inproceedings doi

2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 291-295, IEEE, Pavia, Italy, 2021, ISBN: 978-1-6654-2705-0.

Abstract | Links | BibTeX

Scheer, Claudio; Griebler, Dalvan; Fernandes, Luiz Gustavo

Proposta de Otimização do Tamanho de Batch em Aplicações de Stream para Multicores usando Aprendizado de Máquina Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 127-128, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Garcia, Adriano Marques; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo

Proposta de um Framework para Avaliar Interfaces de Programação Paralela em Aplicações de Stream Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 119-120, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Rockenbach, Dinei A; Griebler, Dalvan; Fernandes, Luiz Gustavo

Provendo Abstrações de Alto Nível para GPUs na SPar Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 109-110, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Araujo, Gabriell; Griebler, Dalvan; Fernandes, Luiz Gustavo

Proposta de Suporte à Parametrização no NPB com CUDA Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 103-104, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo

Proposta de Adaptação Dinâmica de Padrões Paralelos Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 101-102, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Andrade, Gabriella; Griebler, Dalvan; Santos, Rodrigo; Fernandes, Luiz Gustavo

Uso de Métricas de Codificação para Avaliar a Programação Paralela nas Aplicações de Stream em Sistemas Multi-core Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 93-94, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Mello, Fernanda; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz Gustavo

Compressão de Dados em Multicores com Flink ou SPar? Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 77-80, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo

Abstraindo o OpenMP no Desenvolvimento de Aplicações de Fluxo de Dados Contínuo Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 69-72, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Löff, Júnior; Griebler, Dalvan; Fernandes, Luiz Gustavo

Melhorando a Geração Automática de Código Paralelo em Arquiteturas Multi-core na SPar Inproceedings doi

21th Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 65-68, Sociedade Brasileira de Computação, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Löff, Júnior; Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz G

High-Level Stream and Data Parallelism in C++ for Multi-Cores Inproceedings doi

XXV Brazilian Symposium on Programming Languages (SBLP), pp. 41-48, ACM, Joinville, Brazil, 2021.

Abstract | Links | BibTeX

Vogel, Adriano; Mencagli, Gabriele; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo

Towards On-the-fly Self-Adaptation of Stream Parallel Patterns Inproceedings doi

29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 889-93, IEEE, Valladolid, Spain, 2021.

Abstract | Links | BibTeX

2020

Bordin, Maycon Viana; Griebler, Dalvan; Mencagli, Gabriele; Geyer, Claudio F R; Fernandes, Luiz Gustavo

DSPBench: a Suite of Benchmark Applications for Distributed Data Stream Processing Systems Journal Article doi

IEEE Access, 8 (na), pp. 222900-222917, 2020.

Abstract | Links | BibTeX

Stein, Charles Michael; Rockenbach, Dinei A; Griebler, Dalvan; Torquati, Massimo; Mencagli, Gabriele; Danelutto, Marco; Fernandes, Luiz Gustavo

Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units Journal Article doi

Concurrency and Computation: Practice and Experience, na (na), pp. e5786, 2020.

Abstract | Links | BibTeX

@article{STEIN:CCPE:20,
title = {Latency‐aware adaptive micro‐batching techniques for streamed data compression on graphics processing units},
author = {Charles Michael Stein and Dinei A. Rockenbach and Dalvan Griebler and Massimo Torquati and Gabriele Mencagli and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1002/cpe.5786},
doi = {10.1002/cpe.5786},
year = {2020},
date = {2020-05-01},
journal = {Concurrency and Computation: Practice and Experience},
volume = {na},
number = {na},
pages = {e5786},
publisher = {Wiley Online Library},
abstract = {Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their usage in stream processing applications has increased as well. The efficient utilization of GPU accelerators in streaming scenarios requires to batch input elements in microbatches, whose computation is offloaded on the GPU leveraging data parallelism within the same batch of data. Since data elements are continuously received based on the input speed, the bigger the microbatch size the higher the latency to completely buffer it and to start the processing on the device. Unfortunately, stream processing applications often have strict latency requirements that need to find the best size of the microbatches and to adapt it dynamically based on the workload conditions as well as according to the characteristics of the underlying device and network. In this work, we aim at implementing latency‐aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel‐Ziv‐Storer‐Szymanski compression application considering different input workloads. As a general result of our work, we noticed that algorithms with elastic adaptation factors respond better for stable workloads, while algorithms with narrower targets respond better for highly unbalanced workloads.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Andrade, Gabriella; Griebler, Dalvan; Fernandes, Luiz Gustavo

Avaliação da Usabilidade de Interfaces de Programação Paralela para Sistemas Multi-Core em Aplicação de Vídeo Inproceedings doi

XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 149-150, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020.

Abstract | Links | BibTeX

Justo, Gabriel; Hoffmann, Renato Barreto; Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo

Acelerando uma Aplicação de Detecção de Pistas com MPI Inproceedings doi

XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 117-120, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020.

Abstract | Links | BibTeX

Hoffmann, Renato Barreto; Griebler, Dalvan; Fernandes, Luiz Gustavo

Geração Automática de Código TBB na SPar Inproceedings doi

XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 97-100, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020.

Abstract | Links | BibTeX

de Araújo, Gabriell Alves; Griebler, Dalvan; Fernandes, Luiz Gustavo

Implementação CUDA dos Kernels NPB Inproceedings doi

XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 85-88, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020.

Abstract | Links | BibTeX

Löff, Junior; Griebler, Dalvan; Fernandes, Luiz Gustavo

Implementação Paralela do LU no NPB C++ Utilizando um Pipeline Implícito Inproceedings doi

XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 37-40, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020.

Abstract | Links | BibTeX

Hoffmann, Renato B; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo

Stream Parallelism Annotations for Multi-Core Frameworks Inproceedings doi

XXIV Brazilian Symposium on Programming Languages (SBLP), pp. 48-55, ACM, Natal, Brazil, 2020.

Abstract | Links | BibTeX

@inproceedings{HOFFMANN:SBLP:20,
title = {Stream Parallelism Annotations for Multi-Core Frameworks},
author = {Renato B. Hoffmann and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1145/3427081.3427088},
doi = {10.1145/3427081.3427088},
year = {2020},
date = {2020-10-01},
booktitle = {XXIV Brazilian Symposium on Programming Languages (SBLP)},
pages = {48-55},
publisher = {ACM},
address = {Natal, Brazil},
series = {SBLP'20},
abstract = {Data generation, collection, and processing is an important workload of modern computer architectures. Stream or high-intensity data flow applications are commonly employed in extracting and interpreting the information contained in this data. Due to the computational complexity of these applications, high-performance ought to be achieved using parallel computing. Indeed, the efficient exploitation of available parallel resources from the architecture remains a challenging task for the programmers. Techniques and methodologies are required to help shift the efforts from the complexity of parallelism exploitation to specific algorithmic solutions. To tackle this problem, we propose a methodology that provides the developer with a suitable abstraction layer between a clean and effective parallel programming interface targeting different multi-core parallel programming frameworks. We used standard C++ code annotations that may be inserted in the source code by the programmer. Then, a compiler parses C++ code with the annotations and generates calls to the desired parallel runtime API. Our experiments demonstrate the feasibility of our methodology and the performance of the abstraction layer, where the difference is negligible in four applications with respect to the state-of-the-art C++ parallel programming frameworks. Additionally, our methodology allows improving the application performance since the developers can choose the runtime that best performs in their system.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Garcia, Adriano Marques; Serpa, Matheus; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo; Navaux, Philippe O A

The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures Inproceedings doi

International Conference on Computational Science and its Applications (ICCSA), pp. 142-157, Springer, Cagliari, Italy, 2020.

Abstract | Links | BibTeX

@inproceedings{GARCIA:ICCSA:20,
title = {The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures},
author = {Adriano Marques Garcia and Matheus Serpa and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes and Philippe O A Navaux},
url = {https://doi.org/10.1007/978-3-030-58817-5_12},
doi = {10.1007/978-3-030-58817-5_12},
year = {2020},
date = {2020-07-01},
booktitle = {International Conference on Computational Science and its Applications (ICCSA)},
volume = {12254},
pages = {142-157},
publisher = {Springer},
address = {Cagliari, Italy},
series = {ICCSA'20},
abstract = {Since the demand for computing power increases, new architectures emerged to obtain better performance. Reducing the power and energy consumption of these architectures is one of the main challenges to achieving high-performance computing. Current research trends aim at developing new software and hardware techniques to achieve the best performance and energy trade-offs. In this work, we investigate the impact of different CPU frequency scaling techniques such as ondemand, performance, and powersave on the power and energy consumption of multi-core based computer infrastructure. We apply these techniques in PAMPAR, a parallel benchmark suite implemented in PThreads, OpenMP, MPI-1, and MPI-2 (spawn). We measure the energy and execution time of 10 benchmarks, varying the number of threads. Our results show that although powersave consumes up to 43.1% less power than performance and ondemand governors, it consumes the triple of energy due to the high execution time. Our experiments also show that the performance governor consumes up to 9.8% more energy than ondemand for CPU-bound benchmarks. Finally, our results show that PThreads has the lowest power consumption, consuming less than the sequential version for memory-bound benchmarks. Regarding performance, the performance governor achieved 3% of performance over the ondemand.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Garcia, Adriano Marques; Griebler, Dalvan; Fernandes, Luiz Gustavo

Proposta de uma Suíte de Benchmarks para Processamento de Stream em Sistemas Multi-Core Inproceedings doi

XX Escola Regional de Alto Desempenho da Região Sul (ERAD-RS), pp. 167-168, Sociedade Brasileira de Computação (SBC), Santa Maria, BR, 2020.

Abstract | Links | BibTeX

Vogel, Adriano; Rista, Cassiano; Justo, Gabriel; Ewald, Endrius; Griebler, Dalvan; Mencagli, Gabriele; Fernandes, Luiz Gustavo

Parallel Stream Processing with MPI for Video Analytics and Data Visualization Inproceedings doi

High Performance Computing Systems, pp. 102-116, Springer, Cham, 2020.

Abstract | Links | BibTeX

de Araujo, Gabriell Alves; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo

Efficient NAS Parallel Benchmark Kernels with CUDA Inproceedings doi

28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 9-16, IEEE, Västerås, Sweden, Sweden, 2020.

Abstract | Links | BibTeX

Rockenbach, Dinei A

High-Level Programming Abstractions for Stream Parallelism on GPUs Masters Thesis

School of Technology - PPGCC - PUCRS, 2020.

Abstract | Links | BibTeX

@mastersthesis{ROCKENBACH:DM:20,
title = {High-Level Programming Abstractions for Stream Parallelism on GPUs},
author = {Dinei A. Rockenbach},
url = {https://tede2.pucrs.br/tede2/handle/tede/9592},
year = {2020},
date = {2020-11-27},
address = {Porto Alegre, Brazil},
school = {School of Technology - PPGCC - PUCRS},
abstract = {The growth and spread of parallel architectures have driven the pursuit of greater computing power with massively parallel hardware such as the Graphics Processing Units (GPUs). This new heterogeneous computer architecture composed of multi-core Central Processing Units (CPUs) and many-core GPUs became usual, enabling novel software applications such as self-driving cars, real-time ray tracing, deep learning, and Virtual Reality (VR), which are characterized as stream processing applications. However, this heterogeneous environment poses an additional challenge to software development, which is still in the process of adapting to the parallel processing paradigm on multi-core systems, where programmers are supported by several Application Programming Interfaces (APIs) that offer different abstraction levels. The parallelism exploitation in GPU is done using both CUDA and OpenCL for academia and industry, whose developers have to deal with low-level architecture concepts to efficiently exploit GPU parallelism in their applications. There is still a lack of parallel programming abstractions when: 1) parallelizing code on GPUs, and 2) needing higher-level programming abstractions that deal with both CPU and GPU parallelism. Unfortunately, developers still have to be expert programmers on system and architecture to enable efficient hardware parallelism exploitation in this architectural environment. To contribute to the first problem, we created GSPARLIB, a novel structured parallel programming library for exploiting GPU parallelism that provides a unified programming API and driver-agnostic runtime. It offers Map and Reduce parallel patterns on top of CUDA and OpenCL drivers. We evaluate its performance comparing with state-of-the-art APIs, where the experiments revealed a comparable and efficient performance. For contributing to the second problem, we extended the SPar Domain-Specific Language (DSL), which has been proved to be high-level and productive for expressing stream parallelism with C++ annotations in multi-core CPUs. In this work, we propose and implement new annotations that increase expressiveness to combine the current stream parallelism on CPUs and data parallelism on GPUs. We also provide new pattern-based transformation rules that were implemented in the compiler targeting automatic source-to-source code transformations using GSPARLIB for GPU parallelism exploitation. Our experiments demonstrate that SPar compiler is able to generate stream and data parallel patterns without significant performance penalty compared to handwritten code. Thanks to these advances in SPar, our work is the first on providing high-level C++11 annotations as an API that does not require significant code refactoring in sequential programs while enabling multi-core CPU and many-core GPU parallelism exploitation for stream processing applications.},
keywords = {},
pubstate = {published},
tppubtype = {mastersthesis}
}

The growth and spread of parallel architectures have driven the pursuit of greater computing power with massively parallel hardware such as the Graphics Processing Units (GPUs). This new heterogeneous computer architecture composed of multi-core Central Processing Units (CPUs) and many-core GPUs became usual, enabling novel software applications such as self-driving cars, real-time ray tracing, deep learning, and Virtual Reality (VR), which are characterized as stream processing applications. However, this heterogeneous environment poses an additional challenge to software development, which is still in the process of adapting to the parallel processing paradigm on multi-core systems, where programmers are supported by several Application Programming Interfaces (APIs) that offer different abstraction levels. The parallelism exploitation in GPU is done using both CUDA and OpenCL for academia and industry, whose developers have to deal with low-level architecture concepts to efficiently exploit GPU parallelism in their applications. There is still a lack of parallel programming abstractions when: 1) parallelizing code on GPUs, and 2) needing higher-level programming abstractions that deal with both CPU and GPU parallelism. Unfortunately, developers still have to be expert programmers on system and architecture to enable efficient hardware parallelism exploitation in this architectural environment. To contribute to the first problem, we created GSPARLIB, a novel structured parallel programming library for exploiting GPU parallelism that provides a unified programming API and driver-agnostic runtime. It offers Map and Reduce parallel patterns on top of CUDA and OpenCL drivers. We evaluate its performance comparing with state-of-the-art APIs, where the experiments revealed a comparable and efficient performance. For contributing to the second problem, we extended the SPar Domain-Specific Language (DSL), which has been proved to be high-level and productive for expressing stream parallelism with C++ annotations in multi-core CPUs. In this work, we propose and implement new annotations that increase expressiveness to combine the current stream parallelism on CPUs and data parallelism on GPUs. We also provide new pattern-based transformation rules that were implemented in the compiler targeting automatic source-to-source code transformations using GSPARLIB for GPU parallelism exploitation. Our experiments demonstrate that SPar compiler is able to generate stream and data parallel patterns without significant performance penalty compared to handwritten code. Thanks to these advances in SPar, our work is the first on providing high-level C++11 annotations as an API that does not require significant code refactoring in sequential programs while enabling multi-core CPU and many-core GPU parallelism exploitation for stream processing applications.

2019

Mencagli, Gabriele; Torquati, Massimo; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo

Raising the Parallel Abstraction Level for Streaming Analytics Applications Journal Article doi

IEEE Access, 7 , pp. 131944 - 131961, 2019.

Abstract | Links | BibTeX

@article{MENCAGLI:IEEEAccess:19,
title = {Raising the Parallel Abstraction Level for Streaming Analytics Applications},
author = {Gabriele Mencagli and Massimo Torquati and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/ACCESS.2019.2941183},
doi = {10.1109/ACCESS.2019.2941183},
year = {2019},
date = {2019-09-01},
journal = {IEEE Access},
volume = {7},
pages = {131944 - 131961},
publisher = {IEEE},
abstract = {In the stream processing domain, applications are represented by graphs of operators arbitrarily connected and filled with their business logic code. The APIs of existing Stream Processing Systems (SPSs) ease the development of transformations that recur in the streaming practice (e.g., filtering, aggregation and joins). In contrast, their parallelism abstractions are quite limited since they provide support to stateless operators only, or when the state is organized in a set of key-value pairs. This paper presents how the parallel patterns methodology can be revisited for sliding-window streaming analytics. Our vision fosters a design process of the application as composition and nesting of ready-to-use patterns provided through a C++17 fluent interface. Our prototype implements the run-time system of the patterns in the FastFlow parallel library expressing thread-based parallelism. The experimental analysis shows interesting outcomes. First, our pattern-based approach allows easy prototyping of different versions of the application, and the programmer can leverage nesting of patterns to increase performance (up to 37% in one of the two considered test-bed cases). Second, our FastFlow implementation outperforms (three times faster) the handmade porting of our patterns in popular JVM-based SPSs. Finally, in the concluding part of this paper, we explore the use of a task-based run-time system, by deriving interesting insights into how to make our patterns library suitable for multi backends.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Griebler, Dalvan; Vogel, Adriano; Sensi, Daniele De; Danelutto, Marco; Fernandes, Luiz Gustavo

Simplifying and implementing service level objectives for stream parallelism Journal Article doi

Journal of Supercomputing, pp. 1-26, 2019, ISSN: 0920-8542.

Abstract | Links | BibTeX

@article{GRIEBLER:JS:19,
title = {Simplifying and implementing service level objectives for stream parallelism},
author = {Dalvan Griebler and Adriano Vogel and Daniele De Sensi and Marco Danelutto and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1007/s11227-019-02914-6},
doi = {10.1007/s11227-019-02914-6},
issn = {0920-8542},
year = {2019},
date = {2019-06-01},
journal = {Journal of Supercomputing},
pages = {1-26},
publisher = {Springer},
abstract = {An increasing attention has been given to provide service level objectives (SLOs) in stream processing applications due to the performance and energy requirements, and because of the need to impose limits in terms of resource usage while improving the system utilization. Since the current and next-generation computing systems are intrinsically offering parallel architectures, the software has to naturally exploit the architecture parallelism. Implement and meet SLOs on existing applications is not a trivial task for application programmers, since the software development process, besides the parallelism exploitation, requires the implementation of autonomic algorithms or strategies. This is a system-oriented programming approach and requires the management of multiple knobs and sensors (e.g., the number of threads to use, the clock frequency of the cores, etc.) so that the system can self-adapt at runtime. In this work, we introduce a new and simpler way to define SLO in the application’s source code, by abstracting from the programmer all the details relative to self-adaptive system implementation. The application programmer specifies which parts of the code to parallelize and the related SLOs that should be enforced. To reach this goal, source-to-source code transformation rules are implemented in our compiler, which automatically generates self-adaptive strategies to enforce, at runtime, the user-expressed objectives. The experiments highlighted promising results with simpler, effective, and efficient SLO implementations for real-world applications.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}

Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo

Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores Inproceedings doi

Euro-Par 2019: Parallel Processing Workshops, pp. 12, Springer, Göttingen, Germany, 2019.

Abstract | Links | BibTeX

Rockenbach, Dinei A; Stein, Charles Michael; Griebler, Dalvan; Mencagli, Gabriele; Torquati, Massimo; Danelutto, Marco; Fernandes, Luiz Gustavo

Stream Processing on Multi-cores with GPUs: Parallel Programming Models' Challenges Inproceedings doi

International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 834-841, IEEE, Rio de Janeiro, Brazil, 2019.

Abstract | Links | BibTeX

Stein, Charles Michael; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo

Stream Parallelism on the LZSS Data Compression Application for Multi-Cores with GPUs Inproceedings doi

27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 247-251, IEEE, Pavia, Italy, 2019.

Abstract | Links | BibTeX

Maron, Carlos A F; Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo

Should PARSEC Benchmarks be More Parametric? A Case Study with Dedup Inproceedings doi

27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 217-221, IEEE, Pavia, Italy, 2019.

Abstract | Links | BibTeX

@inproceedings{MARON:parametric-parsec:PDP:19,
title = {Should PARSEC Benchmarks be More Parametric? A Case Study with Dedup},
author = {Carlos A. F. Maron and Adriano Vogel and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/EMPDP.2019.8671592},
doi = {10.1109/EMPDP.2019.8671592},
year = {2019},
date = {2019-02-01},
booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {217-221},
publisher = {IEEE},
address = {Pavia, Italy},
series = {PDP'19},
abstract = {Parallel applications of the same domain can present similar patterns of behavior and characteristics. Characterizing common application behaviors can help for understanding performance aspects in the real-world scenario. One way to better understand and evaluate applications' characteristics is by using customizable/parametric benchmarks that enable users to represent important characteristics at run-time. We observed that parameterization techniques should be better exploited in the available benchmarks, especially on stream processing domain. For instance, although widely used, the stream processing benchmarks available in PARSEC do not support the simulation and evaluation of relevant and modern characteristics. Therefore, our goal is to identify the stream parallelism characteristics present in PARSEC. We also implemented a ready to use parameterization support and evaluated the application behaviors considering relevant performance metrics for stream parallelism (service time, throughput, latency). We choose Dedup to be our case study. The experimental results have shown performance improvements in our parameterization support for Dedup. Moreover, this support increased the customization space for benchmark users, which is simple to use. In the future, our solution can be potentially explored on different parallel architectures and parallel programming frameworks.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Serpa, Matheus S; Moreira, Francis B; Navaux, Philippe O A; Cruz, Eduardo H M; Diener, Matthias; Griebler, Dalvan; Fernandes, Luiz Gustavo

Memory Performance and Bottlenecks in Multicore and GPU Architectures Inproceedings doi

27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 233-236, IEEE, Pavia, Italy, 2019.

Abstract | Links | BibTeX

@inproceedings{SERPA:memory-gpu-multicore:PDP:19,
title = {Memory Performance and Bottlenecks in Multicore and GPU Architectures},
author = {Matheus S. Serpa and Francis B. Moreira and Philippe O. A. Navaux and Eduardo H. M. Cruz and Matthias Diener and Dalvan Griebler and Luiz Gustavo Fernandes},
url = {https://doi.org/10.1109/EMPDP.2019.8671628},
doi = {10.1109/EMPDP.2019.8671628},
year = {2019},
date = {2019-02-01},
booktitle = {27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {233-236},
publisher = {IEEE},
address = {Pavia, Italy},
series = {PDP'19},
abstract = {Nowadays, there are several different architectures available not only for the industry, but also for normal consumers. Traditional multicore processors, GPUs, accelerators such as the Sunway SW26010, or even energy efficiency-driven processors such as the ARM family, present very different architectural characteristics. This wide range of characteristics presents a challenge for the developers of applications. Developers must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. Therefore, the same application can perform well when executing on one architecture, but poorly on another architecture. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. The related work in this area mostly focuses on a limited analysis encompassing execution time and energy. In this paper, we perform a detailed investigation on the impact of the memory subsystem of different architectures, which is one of the most important aspects to be considered. For this study, we performed experiments in the Broadwell CPU and Pascal GPU, using applications from the Rodinia benchmark suite. In this way, we were able to understand why an application performs well on one architecture and poorly on others.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}

Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz Gustavo

Structured Stream Parallelism for Rust Inproceedings doi

XXIII Brazilian Symposium on Programming Languages (SBLP), pp. 54-61, ACM, Salvador, Brazil, 2019.

Abstract | Links | BibTeX

174 entries « ‹ 1 of 4 › »