2019
Maron, Carlos A F; Griebler, Dalvan; Fernandes, Luiz G Benchmark Paramétrico para o Domínio do Paralelismo de Stream: Um Estudo de Caso com o Ferret da Suíte PARSEC Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. @inproceedings{MARON:ERAD:19, title = {Benchmark Paramétrico para o Domínio do Paralelismo de Stream: Um Estudo de Caso com o Ferret da Suíte PARSEC}, author = {Carlos A. F. Maron and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2019/CR_ERAD_PG_Maron_2019.pdf}, year = {2019}, date = {2019-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Três de Maio, BR}, abstract = {Benchmarks are synthetic applications used to evaluate and compare the performance of computer systems. Making them parameterizable can produce differentiated execution conditions. However, this technique is little explored in traditional and current benchmarks. Therefore, this work evaluates the impact of parameterizing stream-domain characteristics in Ferret.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Rista, Cassiano; Griebler, Dalvan; Fernandes, Luiz G Proposta de Grau de Paralelismo Autoadaptativo com MPI-2 para a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. @inproceedings{RISTA:ERAD:19, title = {Proposta de Grau de Paralelismo Autoadaptativo com MPI-2 para a DSL SPar}, author = {Cassiano Rista and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2019/CR_ERAD_PG_Rista_2019.pdf}, year = {2019}, date = {2019-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Três de Maio, BR}, abstract = {This paper presents the design of a self-adaptive module for controlling the degree of parallelism, to be integrated into the SPar DSL. Targeting distributed parallel stream applications, the module supports process creation at run-time, scheduling policy selection, load balancing, ordering, and serialization, adapting the degree of parallelism autonomously without requiring the user to define thresholds.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
de Araujo, Gabriell A; Griebler, Dalvan; Fernandes, Luiz G Avaliando o Paralelismo de Stream com Pthreads, OpenMP e SPar em Aplicações de Vídeo Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. @inproceedings{ARAUJO:stream:ERAD:19, title = {Avaliando o Paralelismo de Stream com Pthreads, OpenMP e SPar em Aplicações de Vídeo}, author = {Gabriell A. de Araujo and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2019/CR_ERAD_IC_Araujo_2019.pdf}, year = {2019}, date = {2019-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Três de Maio, BR}, abstract = {Aiming to extend the evaluation studies of SPar, we performed a comparative analysis of SPar, Pthreads, and OpenMP in stream applications. The results reveal that the performance of the parallel code generated by SPar matches the robust implementations built on the well-established Pthreads and OpenMP libraries. Nevertheless, we also identified opportunities for improvement in SPar.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
de Araujo, Gabriell A; Griebler, Dalvan; Fernandes, Luiz G Revisando a Programação Paralela com CUDA nos Benchmarks EP e FT Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. @inproceedings{ARAUJO:gpu:ERAD:19, title = {Revisando a Programação Paralela com CUDA nos Benchmarks EP e FT}, author = {Gabriell A. de Araujo and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2019/CR_ERAD_IC_Gabriell_2019.pdf}, year = {2019}, date = {2019-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Três de Maio, BR}, abstract = {This work aims to extend the studies on the NAS Parallel Benchmarks (NPB), which have relevant gaps in the context of GPUs. The main works in the literature consist of old implementations, leaving room for questioning. In this direction, new GPU parallelization studies were carried out for the EP and FT applications. The results were similar to or better than the state-of-the-art.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Justo, Gabriel B; Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz G Acelerando o Reconhecimento de Pessoas em Vídeos com MPI Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. @inproceedings{JUSTO:ERAD:19, title = {Acelerando o Reconhecimento de Pessoas em Vídeos com MPI}, author = {Gabriel B. Justo and Adriano Vogel and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2019/CR_ERAD_IC_Justo_2019.pdf}, year = {2019}, date = {2019-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Três de Maio, BR}, abstract = {Several video processing applications demand parallelism to increase performance. The goal of this work is to implement and test distributed-processing versions of facial recognition applications for videos. The implementations were evaluated for performance. The results showed that these applications can achieve significant speedup in distributed environments.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Rockenbach, Dinei A; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo High-Level Stream Parallelism Abstractions with SPar Targeting GPUs Inproceedings doi Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 543-552, IOS Press, Prague, Czech Republic, 2019. @inproceedings{ROCKENBACH:PARCO:19, title = {High-Level Stream Parallelism Abstractions with SPar Targeting GPUs}, author = {Dinei A. Rockenbach and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/APC200083}, doi = {10.3233/APC200083}, year = {2019}, date = {2019-09-01}, booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo)}, volume = {36}, pages = {543-552}, publisher = {IOS Press}, address = {Prague, Czech Republic}, series = {ParCo'19}, abstract = {The combined exploitation of stream and data parallelism is demonstrating encouraging performance results in the literature for heterogeneous architectures, which are present in every computer system today. However, providing parallel software that efficiently targets those architectures requires significant programming effort and expertise. The SPar domain-specific language already represents a solution to this problem, providing proven high-level programming abstractions for multi-core architectures. In this paper, we enrich the SPar language by adding support for GPUs. New transformation rules are designed for generating parallel code using stream and data parallel patterns. Our experiments revealed that these transformation rules are able to improve performance while the high-level programming abstractions are maintained.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Seamless Parallelism Management for Multi-core Stream Processing Inproceedings doi Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 533-542, IOS Press, Prague, Czech Republic, 2019. @inproceedings{VOGEL:PARCO:19, title = {Seamless Parallelism Management for Multi-core Stream Processing}, author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/APC200082}, doi = {10.3233/APC200082}, year = {2019}, date = {2019-09-01}, booktitle = {Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo)}, volume = {36}, pages = {533-542}, publisher = {IOS Press}, address = {Prague, Czech Republic}, series = {ParCo'19}, abstract = {Video streaming applications have critical performance requirements for dealing with fluctuating workloads and providing results in real-time. As a consequence, the majority of these applications demand parallelism for delivering quality of service to users. Although high-level and structured parallel programming aims at facilitating parallelism exploitation, there are still several issues to be addressed for improving existing parallel programming abstractions. In this paper, we aim at employing self-adaptivity for stream processing in order to seamlessly manage the application parallelism configurations at run-time, with a new strategy that relieves application programmers of the need to set time-consuming and error-prone parallelism parameters. The new strategy was implemented and validated on SPar. The results have shown that the proposed solution increases the level of abstraction and achieves competitive performance.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Pieper, Ricardo; Griebler, Dalvan; Fernandes, Luiz Gustavo Structured Stream Parallelism for Rust Inproceedings doi XXIII Brazilian Symposium on Programming Languages (SBLP), pp. 54-61, ACM, Salvador, Brazil, 2019. @inproceedings{PIEPER:SBLP:19b, title = {Structured Stream Parallelism for Rust}, author = {Ricardo Pieper and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1145/3355378.3355384}, doi = {10.1145/3355378.3355384}, year = {2019}, date = {2019-10-01}, booktitle = {XXIII Brazilian Symposium on Programming Languages (SBLP)}, pages = {54-61}, publisher = {ACM}, address = {Salvador, Brazil}, series = {SBLP'19}, abstract = {Structured parallel programming has been studied and applied in several programming languages. This approach has proven to be suitable for abstracting low-level and architecture-dependent parallelism implementations. Our goal is to provide a structured and high-level library for the Rust language, targeting parallel stream processing applications for multi-core servers. Rust is an emerging programming language developed by the Mozilla Research group, focusing on performance, memory safety, and thread safety. However, it lacks parallel programming abstractions, especially for stream processing applications. This paper contributes a new API based on the structured parallel programming approach to simplify parallel software development. Our experiments highlight that our solution provides higher-level parallel programming abstractions for stream processing applications in Rust. We also show that the throughput and speedup are comparable to the state-of-the-art for certain workloads.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Vogel, Adriano; Griebler, Dalvan; Danelutto, Marco; Fernandes, Luiz Gustavo Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores Inproceedings doi Euro-Par 2019: Parallel Processing Workshops, pp. 12, Springer, Göttingen, Germany, 2019. @inproceedings{VOGEL:adaptive-overhead:AutoDaSP:19, title = {Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores}, author = {Adriano Vogel and Dalvan Griebler and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/978-3-030-48340-1_3}, doi = {10.1007/978-3-030-48340-1_3}, year = {2019}, date = {2019-08-01}, booktitle = {Euro-Par 2019: Parallel Processing Workshops}, volume = {11997}, pages = {12}, publisher = {Springer}, address = {Göttingen, Germany}, series = {Lecture Notes in Computer Science}, abstract = {The stream processing paradigm is present in several applications that apply computations over continuous data flowing in the form of streams (e.g., video feeds, image, and data analytics). Employing self-adaptivity in stream processing applications can provide higher-level programming abstractions and autonomic resource management. However, there are cases where the performance is suboptimal. In this paper, the goal is to optimize parallelism adaptations in terms of stability and accuracy, which can improve the performance of parallel stream processing applications. Therefore, we present a new optimized self-adaptive strategy that is experimentally evaluated. The proposed solution provided high-level programming abstractions, reduced the adaptation overhead, and achieved a competitive performance with the best static executions.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2018
Griebler, Dalvan; Hoffmann, Renato B; Danelutto, Marco; Fernandes, Luiz Gustavo High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2 Journal Article doi International Journal of Parallel Programming, 47 (1), pp. 253-271, 2018, ISSN: 1573-7640. @article{GRIEBLER:IJPP:18, title = {High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2}, author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s10766-018-0558-x}, doi = {10.1007/s10766-018-0558-x}, issn = {1573-7640}, year = {2018}, date = {2018-02-01}, journal = {International Journal of Parallel Programming}, volume = {47}, number = {1}, pages = {253-271}, publisher = {Springer}, abstract = {Parallel programming has been a challenging task for application programmers. Stream processing is an application domain present in several scientific, enterprise, and financial areas that lacks suitable abstractions to exploit parallelism. Our goal is to assess the feasibility of state-of-the-art frameworks/libraries (Pthreads, TBB, and FastFlow) and the SPar domain-specific language for real-world streaming applications (Dedup, Ferret, and Bzip2) targeting multi-core architectures. SPar was specially designed to provide high-level and productive stream parallelism abstractions, supporting programmers with standard C++11 annotations. For the experiments, we implemented three streaming applications. We discussed SPar's programmability advantages compared to the frameworks in terms of productivity and structured parallel programming. The results demonstrate that SPar improves productivity and provides the necessary features to achieve performance similar to the state-of-the-art.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Griebler, Dalvan; Hoffmann, Renato B; Danelutto, Marco; Fernandes, Luiz Gustavo Stream Parallelism with Ordered Data Constraints on Multi-Core Systems Journal Article doi Journal of Supercomputing, 75 (8), pp. 4042-4061, 2018, ISSN: 0920-8542. @article{GRIEBLER:JS:18, title = {Stream Parallelism with Ordered Data Constraints on Multi-Core Systems}, author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1007/s11227-018-2482-7}, doi = {10.1007/s11227-018-2482-7}, issn = {0920-8542}, year = {2018}, date = {2018-07-01}, journal = {Journal of Supercomputing}, volume = {75}, number = {8}, pages = {4042-4061}, publisher = {Springer}, abstract = {It is often a challenge to keep input/output tasks/results in order for parallel computations over data streams, particularly when stateless task operators are replicated to increase parallelism in the presence of irregular tasks. Maintaining input/output order requires additional coding effort and may significantly impact the application's actual throughput. Thus, we propose a new implementation technique designed to be easily integrated with any of the existing C++ parallel programming frameworks that support stream parallelism. In this paper, it is first implemented and studied using SPar, our high-level domain-specific language for stream parallelism. We discuss the results of a set of experiments with real-world applications revealing how significant performance improvements may be achieved when our proposed solution is integrated within SPar, especially for data compression applications. Also, we show the results of experiments performed after integrating our solution within FastFlow and TBB, revealing no significant overheads.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Griebler, Dalvan; Loff, Junior; Mencagli, Gabriele; Danelutto, Marco; Fernandes, Luiz Gustavo Efficient NAS Benchmark Kernels with C++ Parallel Programming Inproceedings doi 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 733-740, IEEE, Cambridge, UK, 2018. @inproceedings{GRIEBLER:NAS-CPP:PDP:18, title = {Efficient NAS Benchmark Kernels with C++ Parallel Programming}, author = {Dalvan Griebler and Junior Loff and Gabriele Mencagli and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/PDP2018.2018.00120}, doi = {10.1109/PDP2018.2018.00120}, year = {2018}, date = {2018-03-01}, booktitle = {26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {733-740}, publisher = {IEEE}, address = {Cambridge, UK}, series = {PDP'18}, abstract = {Benchmarking is a way to study the performance of new architectures and parallel programming frameworks. Well-established benchmark suites such as the NAS Parallel Benchmarks (NPB) comprise legacy codes that still lack portability to the C++ language. As a consequence, a set of high-level and easy-to-use C++ parallel programming frameworks cannot be tested in NPB. Our goal is to describe a C++ porting of the NPB kernels and to analyze the performance achieved by different parallel implementations written using the Intel TBB, OpenMP, and FastFlow frameworks for multi-cores. The experiments show an efficient code porting from Fortran to C++ and an efficient parallelization on average.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Griebler, Dalvan; Sensi, Daniele De; Vogel, Adriano; Danelutto, Marco; Fernandes, Luiz Gustavo Service Level Objectives via C++11 Attributes Inproceedings doi Euro-Par 2018: Parallel Processing Workshops, pp. 745-756, Springer, Turin, Italy, 2018. @inproceedings{GRIEBLER:SLO-SPar-Nornir:REPARA:18, title = {Service Level Objectives via C++11 Attributes}, author = {Dalvan Griebler and Daniele De Sensi and Adriano Vogel and Marco Danelutto and Luiz Gustavo Fernandes}, url = {http://dx.doi.org/10.1007/978-3-030-10549-5_58}, doi = {10.1007/978-3-030-10549-5_58}, year = {2018}, date = {2018-08-01}, booktitle = {Euro-Par 2018: Parallel Processing Workshops}, pages = {745-756}, publisher = {Springer}, address = {Turin, Italy}, series = {Lecture Notes in Computer Science}, abstract = {In recent years, increasing attention has been given to the possibility of guaranteeing Service Level Objectives (SLOs) to users about their applications, either regarding performance or power consumption. SLOs can be implemented for parallel applications, since these applications provide many control knobs (e.g., the number of threads to use, the clock frequency of the cores, etc.) to tune their performance and power consumption. Different from most of the existing approaches, we target sequential stream processing applications by proposing a solution based on C++ annotations. The user specifies which parts of the code to parallelize and what type of requirements should be enforced on that part of the code. Our solution first automatically parallelizes the annotated code and then applies self-adaptation approaches at run-time to enforce the user-expressed objectives. We ran experiments on different real-world applications, showing its simplicity and effectiveness.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Vogel, Adriano; Griebler, Dalvan; Sensi, Daniele De; Danelutto, Marco; Fernandes, Luiz Gustavo Autonomic and Latency-Aware Degree of Parallelism Management in SPar Inproceedings doi Euro-Par 2018: Parallel Processing Workshops, pp. 28-39, Springer, Turin, Italy, 2018. @inproceedings{VOGEL:Adaptive-Latency-SPar:AutoDaSP:18, title = {Autonomic and Latency-Aware Degree of Parallelism Management in SPar}, author = {Adriano Vogel and Dalvan Griebler and Daniele De Sensi and Marco Danelutto and Luiz Gustavo Fernandes}, url = {http://dx.doi.org/10.1007/978-3-030-10549-5_3}, doi = {10.1007/978-3-030-10549-5_3}, year = {2018}, date = {2018-08-01}, booktitle = {Euro-Par 2018: Parallel Processing Workshops}, pages = {28-39}, publisher = {Springer}, address = {Turin, Italy}, series = {Lecture Notes in Computer Science}, abstract = {Stream processing applications have become a representative workload in current computing systems. A significant part of these applications demands parallelism to increase performance. However, programmers often face a trade-off between coding productivity and performance when introducing parallelism. SPar was created to balance this trade-off for application programmers by using the C++11 attribute annotation mechanism. In SPar and other programming frameworks for stream processing applications, the manual definition of the number of replicas to be used for the stream operators is a challenge. In addition to that, low latency is required by several stream processing applications. We noted that explicit latency requirements are poorly considered in state-of-the-art parallel programming frameworks. Since there is a direct relationship between the number of replicas and the latency of the application, in this work we propose an autonomic and adaptive strategy to choose the proper number of replicas in SPar to address latency constraints. We experimentally evaluated our implemented strategy on a real-world application, showing that our adaptive strategy can provide higher abstraction levels while automatically managing the latency.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
Ewald, Endrius; Vogel, Adriano; Rista, Cassiano; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz G Parallel and Distributed Processing Support for a Geospatial Data Visualization DSL Inproceedings doi Symposium on High Performance Computing Systems (WSCAD), pp. 221-228, IEEE, São Paulo, Brazil, 2018. @inproceedings{EWALD:WSCAD:18, title = {Parallel and Distributed Processing Support for a Geospatial Data Visualization DSL}, author = {Endrius Ewald and Adriano Vogel and Cassiano Rista and Dalvan Griebler and Isabel Manssour and Luiz G. Fernandes}, url = {https://doi.org/10.1109/WSCAD.2018.00042}, doi = {10.1109/WSCAD.2018.00042}, year = {2018}, date = {2018-10-01}, booktitle = {Symposium on High Performance Computing Systems (WSCAD)}, pages = {221-228}, publisher = {IEEE}, address = {São Paulo, Brazil}, abstract = {The amount of data generated worldwide related to geolocalization has exponentially increased. However, the fast processing of this amount of data is a challenge from the programming perspective, and many available solutions require learning a variety of tools and programming languages. This paper introduces the support for parallel and distributed processing in a DSL for Geospatial Data Visualization to speed up the data pre-processing phase. The results have shown the MPI version with dynamic data distribution performing better under medium and large data set files, while MPI-I/O version achieved the best performance with small data set files.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The amount of data generated worldwide related to geolocalization has exponentially increased. However, the fast processing of this amount of data is a challenge from the programming perspective, and many available solutions require learning a variety of tools and programming languages. 
This paper introduces the support for parallel and distributed processing in a DSL for Geospatial Data Visualization to speed up the data pre-processing phase. The results have shown the MPI version with dynamic data distribution performing better under medium and large data set files, while MPI-I/O version achieved the best performance with small data set files. |
Vogel, Adriano; Fernandes, Luiz G Grau de Paralelismo Adaptativo na DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. @inproceedings{VOGEL:ERAD:18, title = {Grau de Paralelismo Adaptativo na DSL SPar}, author = {Adriano Vogel and Luiz G. Fernandes}, url = {https://sol.sbc.org.br/index.php/eradrs/article/view/4698/4615}, year = {2018}, date = {2018-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {2}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Porto Alegre, BR}, abstract = {As aplicações de stream apresentam características que as diferenciam de outras classes de aplicações, como variação nas entradas/saídas e execuções por períodos indefinidos de tempo. Uma das formas de responder à natureza dinâmica dessas aplicações é adaptando continuamente o grau de paralelismo. Nesse estudo é apresentado o suporte ao grau de paralelismo adaptativo na DSL (Domain-Specific Language) SPar.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } As aplicações de stream apresentam características que as diferenciam de outras classes de aplicações, como variação nas entradas/saídas e execuções por períodos indefinidos de tempo. Uma das formas de responder à natureza dinâmica dessas aplicações é adaptando continuamente o grau de paralelismo. Nesse estudo é apresentado o suporte ao grau de paralelismo adaptativo na DSL (Domain-Specific Language) SPar. |
Maron, Carlos A F; Fernandes, Luiz G Uma Suíte de Benchmarks Parametrizáveis para o Domínio de Processamento de Stream em Sistemas Multi-Core Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. @inproceedings{MARON:ERAD:18, title = {Uma Suíte de Benchmarks Parametrizáveis para o Domínio de Processamento de Stream em Sistemas Multi-Core}, author = {Carlos A. F. Maron and Luiz G. Fernandes}, url = {https://sol.sbc.org.br/index.php/eradrs/article/view/4723/4640}, year = {2018}, date = {2018-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {2}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Porto Alegre, BR}, abstract = {Avaliar o desempenho é importante para computação. Porém, assim como o hardware, o software também deve ser avaliado quando características podem influenciar no seu comportamento. Nestes casos, a suíte de benchmarks parametrizáveis para o processamento de stream serve como uma ferramenta de apoio ao usuário e até programadores.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Avaliar o desempenho é importante para computação. Porém, assim como o hardware, o software também deve ser avaliado quando características podem influenciar no seu comportamento. Nestes casos, a suíte de benchmarks parametrizáveis para o processamento de stream serve como uma ferramenta de apoio ao usuário e até programadores. |
Rista, Cassiano; Fernandes, Luiz G Proposta de Provisionamento Elástico de Recursos com MPI-2 para a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. @inproceedings{RISTA:ERAD:18, title = {Proposta de Provisionamento Elástico de Recursos com MPI-2 para a DSL SPar}, author = {Cassiano Rista and Luiz G. Fernandes}, url = {https://sol.sbc.org.br/index.php/eradrs/article/view/4709/4626}, year = {2018}, date = {2018-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {2}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Porto Alegre, BR}, abstract = {Este artigo apresenta uma proposta para desenvolvimento de um módulo de provisionamento elástico e autônomo a ser integrado em uma linguagem específica de domínio (DSL) voltada para o paralelismo de stream. O módulo deverá explorar a elasticidade com o uso de MPI-2 em um ambiente de cluster de computadores, permitindo a criação de processos em tempo de execução, serialização, ordenamento e balanceamento de carga.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Este artigo apresenta uma proposta para desenvolvimento de um módulo de provisionamento elástico e autônomo a ser integrado em uma linguagem específica de domínio (DSL) voltada para o paralelismo de stream. O módulo deverá explorar a elasticidade com o uso de MPI-2 em um ambiente de cluster de computadores, permitindo a criação de processos em tempo de execução, serialização, ordenamento e balanceamento de carga. |
Bairros, Gildomiro; Fernandes, Luiz G Suporte para Computação Autonômica com Elasticidade Vertical para a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. @inproceedings{BAIRROS:ERAD:18, title = {Suporte para Computação Autonômica com Elasticidade Vertical para a DSL SPar}, author = {Gildomiro Bairros and Luiz G. Fernandes}, url = {https://sol.sbc.org.br/index.php/eradrs/article/view/4716/4633}, year = {2018}, date = {2018-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {2}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Porto Alegre, BR}, abstract = {O objetivo deste trabalho é propor uma solução para o suporte de elasticidade vertical em aplicações de processamento de stream desenvolvidas com a SPar. Trata-se de uma linguagem específica de domínio para expressar paralelismo de stream em alto nível. Nossa solução fornece suporte à elasticidade automática para ambientes em Linux Contêineres e rotinas que abstraem detalhes da infraestrutura de nuvem através da VEL (Vertical Elasticity Library).}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } O objetivo deste trabalho é propor uma solução para o suporte de elasticidade vertical em aplicações de processamento de stream desenvolvidas com a SPar. Trata-se de uma linguagem específica de domínio para expressar paralelismo de stream em alto nível. Nossa solução fornece suporte à elasticidade automática para ambientes em Linux Contêineres e rotinas que abstraem detalhes da infraestrutura de nuvem através da VEL (Vertical Elasticity Library). |
Loff, Junior; Griebler, Dalvan; Sandes, Edans; Melo, Alba; Fernandes, Luiz G Suporte ao Paralelismo Multi-Core com FastFlow e TBB em uma Aplicação de Alinhamento de Sequências de DNA Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. @inproceedings{LOFF:ERAD:18, title = {Suporte ao Paralelismo Multi-Core com FastFlow e TBB em uma Aplicação de Alinhamento de Sequências de DNA}, author = {Junior Loff and Dalvan Griebler and Edans Sandes and Alba Melo and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2018/CR_ERAD_IC_Loff_2018.pdf}, year = {2018}, date = {2018-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {2}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Porto Alegre, BR}, abstract = {Quando uma sequência biológica é obtida, é comum alinhá-la com outra já estudada para determinar suas características. O desafio é processar este alinhamento em tempo útil. Neste trabalho exploramos o paralelismo em uma aplicação de alinhamento de sequências de DNA utilizando as bibliotecas FastFlow e Intel TBB. Os experimentos mostram que a versão TBB obteve até 4% melhor tempo de execução em comparação à versão original em OpenMP.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Quando uma sequência biológica é obtida, é comum alinhá-la com outra já estudada para determinar suas características. O desafio é processar este alinhamento em tempo útil. Neste trabalho exploramos o paralelismo em uma aplicação de alinhamento de sequências de DNA utilizando as bibliotecas FastFlow e Intel TBB. Os experimentos mostram que a versão TBB obteve até 4% melhor tempo de execução em comparação à versão original em OpenMP. |
Hoffmann, Renato B; Griebler, Dalvan; Fernandes, Luiz G Paralelização de uma Aplicação de Detecção e Eliminação de Ruídos em Streaming de Vídeo com a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. @inproceedings{HOFFMANN:ERAD:18, title = {Paralelização de uma Aplicação de Detecção e Eliminação de Ruídos em Streaming de Vídeo com a DSL SPar}, author = {Renato B. Hoffmann and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2018/CR_ERAD_IC_Hoffmann_2018.pdf}, year = {2018}, date = {2018-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {2}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Porto Alegre, BR}, abstract = {Restauração de imagem é uma importante etapa de qualquer sistema de computação gráfica. Este trabalho tem como objetivo apresentar e avaliar o paralelismo de Denoiser, uma aplicação para detecção e eliminação de ruído em streaming de vídeo. Foram avaliados o speed-up e programabilidade das interfaces SPar, Thread Building Blocks e FastFlow. Os resultados mostram que a SPar obteve bons resultados de programabilidade e desempenho.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Restauração de imagem é uma importante etapa de qualquer sistema de computação gráfica. Este trabalho tem como objetivo apresentar e avaliar o paralelismo de Denoiser, uma aplicação para detecção e eliminação de ruído em streaming de vídeo. Foram avaliados o speed-up e programabilidade das interfaces SPar, Thread Building Blocks e FastFlow. Os resultados mostram que a SPar obteve bons resultados de programabilidade e desempenho. |
Ewald, Endrius; Vogel, Adriano; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz Gustavo Suporte ao Processamento Paralelo e Distribuído em uma DSL para Visualização de Dados Geoespaciais Inproceedings XIX Simpósio em Sistemas Computacionais de Alto Desempenho, pp. 1-12, SBC, São Paulo, Brazil, 2018. @inproceedings{EWALD:WSCAD:18b, title = {Suporte ao Processamento Paralelo e Distribuído em uma DSL para Visualização de Dados Geoespaciais}, author = {Endrius Ewald and Adriano Vogel and Dalvan Griebler and Isabel Manssour and Luiz Gustavo Fernandes}, url = {https://gmap.pucrs.br/gmap/files/publications/articles/9f9c9dc7d5d4eaf8bb379c4bef8e00cb.pdf}, year = {2018}, date = {2018-10-01}, booktitle = {XIX Simpósio em Sistemas Computacionais de Alto Desempenho}, pages = {1-12}, publisher = {SBC}, address = {São Paulo, Brazil}, abstract = {The amount of data generated worldwide related to geolocalization has exponentially increased. However, the fast processing of this amount of data is a challenge from the programming perspective, and many available solutions require learning a variety of tools and programming languages. This paper introduces the support for parallel and distributed processing in a DSL for Geospatial Data Visualization to speed up the data pre-processing phase. The results have shown the MPI version with dynamic data distribution performing better under medium and large data set files, while MPI-I/O version achieved the best performance with small data set files.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The amount of data generated worldwide related to geolocalization has exponentially increased. However, the fast processing of this amount of data is a challenge from the programming perspective, and many available solutions require learning a variety of tools and programming languages. 
This paper introduces the support for parallel and distributed processing in a DSL for Geospatial Data Visualization to speed up the data pre-processing phase. The results have shown the MPI version with dynamic data distribution performing better under medium and large data set files, while MPI-I/O version achieved the best performance with small data set files. |
Griebler, Dalvan; Vogel, Adriano; Maron, Carlos A F; Maliszewski, Anderson M; Schepke, Claudio; Fernandes, Luiz Gustavo Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions Inproceedings doi 23rd IEEE Symposium on Computers and Communications (ISCC), IEEE, Natal, Brazil, 2018. @inproceedings{parsec_cloudstack_lxc_kvm:ISCC:2018, title = {Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions}, author = {Dalvan Griebler and Adriano Vogel and Carlos A F Maron and Anderson M Maliszewski and Claudio Schepke and Luiz Gustavo Fernandes}, doi = {10.1109/ISCC.2018.8538759}, year = {2018}, date = {2018-06-01}, booktitle = {23rd IEEE Symposium on Computers and Communications (ISCC)}, publisher = {IEEE}, address = {Natal, Brazil}, abstract = {This paper contributes to a performance analysis of real-world workloads under private cloud conditions. We selected six benchmarks from PARSEC related to three mainstream application domains (financial, data mining, and media processing). Our goal was to evaluate these application domains in different cloud instances and deployment environments, concerning container or kernel-based instances and using dedicated or shared machine resources. Experiments have shown that performance varies according to the application characteristics, virtualization technology, and cloud environment. Results highlighted that financial, data mining, and media processing applications running in the LXC instances tend to outperform KVM when there is a dedicated machine resource environment. However, when two instances are sharing the same machine resources, these applications tend to achieve better performance in the KVM instances. 
Finally, financial applications achieved better performance in the cloud than media and data mining.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper contributes to a performance analysis of real-world workloads under private cloud conditions. We selected six benchmarks from PARSEC related to three mainstream application domains (financial, data mining, and media processing). Our goal was to evaluate these application domains in different cloud instances and deployment environments, concerning container or kernel-based instances and using dedicated or shared machine resources. Experiments have shown that performance varies according to the application characteristics, virtualization technology, and cloud environment. Results highlighted that financial, data mining, and media processing applications running in the LXC instances tend to outperform KVM when there is a dedicated machine resource environment. However, when two instances are sharing the same machine resources, these applications tend to achieve better performance in the KVM instances. Finally, financial applications achieved better performance in the cloud than media and data mining. |
Rista, Cassiano; Teixeira, Marcelo; Griebler, Dalvan; Fernandes, Luiz Gustavo Evaluating, Estimating, and Improving Network Performance in Container-based Clouds Inproceedings doi 23rd IEEE Symposium on Computers and Communications (ISCC), IEEE, Natal, Brazil, 2018. @inproceedings{network_performance_container:ISCC:2018, title = {Evaluating, Estimating, and Improving Network Performance in Container-based Clouds}, author = {Cassiano Rista and Marcelo Teixeira and Dalvan Griebler and Luiz Gustavo Fernandes}, doi = {10.1109/ISCC.2018.8538558}, year = {2018}, date = {2018-04-16}, booktitle = {23rd IEEE Symposium on Computers and Communications (ISCC)}, publisher = {IEEE}, address = {Natal, Brazil}, abstract = {Cloud computing has recently attracted a great deal of interest from both industry and academia, emerging as an important paradigm to improve resource utilization, efficiency, flexibility, and pay-per-use. However, cloud platforms inherently include a virtualization layer that imposes performance degradation on network-intensive applications. Thus, it is crucial to anticipate possible performance degradation to resolve system bottlenecks. This paper uses the Petri Nets approach to create different models for evaluating, estimating, and improving network performance in container-based cloud environments. Based on model estimations, we assessed the network bandwidth utilization of the system under different setups. Then, by identifying possible bottlenecks, we show how the system could be modified to hopefully improve performance. We tested how the model would behave through real-world experiments. When the model indicates probable bandwidth saturation, we propose a link aggregation approach to increase bandwidth, using lightweight virtualization to reduce virtualization overhead. Results reveal that our model anticipates the structural and behavioral characteristics of the network in the cloud environment. 
Therefore, it systematically improves network efficiency, which saves effort, time, and money.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Cloud computing has recently attracted a great deal of interest from both industry and academia, emerging as an important paradigm to improve resource utilization, efficiency, flexibility, and pay-per-use. However, cloud platforms inherently include a virtualization layer that imposes performance degradation on network-intensive applications. Thus, it is crucial to anticipate possible performance degradation to resolve system bottlenecks. This paper uses the Petri Nets approach to create different models for evaluating, estimating, and improving network performance in container-based cloud environments. Based on model estimations, we assessed the network bandwidth utilization of the system under different setups. Then, by identifying possible bottlenecks, we show how the system could be modified to hopefully improve performance. We tested how the model would behave through real-world experiments. When the model indicates probable bandwidth saturation, we propose a link aggregation approach to increase bandwidth, using lightweight virtualization to reduce virtualization overhead. Results reveal that our model anticipates the structural and behavioral characteristics of the network in the cloud environment. Therefore, it systematically improves network efficiency, which saves effort, time, and money. |
Maliszewski, Anderson M; Griebler, Dalvan; Schepke, Claudio; Ditter, Alexander; Fey, Dietmar; Fernandes, Luiz Gustavo The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM Inproceedings doi International Conference on High Performance Computing & Simulation (HPCS), IEEE, Orléans, France, 2018. @inproceedings{NAS_cloud_LXC_KVM:HPCS:2018, title = {The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM}, author = {Anderson M Maliszewski and Dalvan Griebler and Claudio Schepke and Alexander Ditter and Dietmar Fey and Luiz Gustavo Fernandes}, doi = {10.1109/HPCS.2018.00066}, year = {2018}, date = {2018-07-01}, booktitle = {International Conference on High Performance Computing & Simulation (HPCS)}, publisher = {IEEE}, address = {Orléans, France}, abstract = {Private IaaS clouds are an attractive environment for scientific workloads and applications. They provide advantages such as almost instantaneous availability of high-performance computing in a single node as well as compute clusters, easy access for researchers, and users that do not have access to conventional supercomputers. Furthermore, a cloud infrastructure provides elasticity and scalability to ensure and manage any software dependency on the system with no third-party dependency for researchers. However, one of the biggest challenges is to avoid significant performance degradation when migrating these applications from physical nodes to a cloud environment. Also, we lack more research investigations for multi-tenant cloud instances. In this paper, our goal is to perform a comparative performance evaluation of scientific applications with single and multi-tenancy cloud instances using KVM and LXC virtualization technologies under private cloud conditions. All analyses and evaluations were carried out based on NAS Benchmark kernels to simulate different types of workloads. We applied statistical significance tests to highlight the differences. 
The results have shown that applications running on LXC-based cloud instances outperform KVM-based cloud instances in 93.75% of the experiments w.r.t. single tenant. Regarding multi-tenant, LXC instances outperform KVM instances in 45% of the results, where the performance differences were not as significant as expected.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Private IaaS clouds are an attractive environment for scientific workloads and applications. They provide advantages such as almost instantaneous availability of high-performance computing in a single node as well as compute clusters, easy access for researchers, and users that do not have access to conventional supercomputers. Furthermore, a cloud infrastructure provides elasticity and scalability to ensure and manage any software dependency on the system with no third-party dependency for researchers. However, one of the biggest challenges is to avoid significant performance degradation when migrating these applications from physical nodes to a cloud environment. Also, we lack more research investigations for multi-tenant cloud instances. In this paper, our goal is to perform a comparative performance evaluation of scientific applications with single and multi-tenancy cloud instances using KVM and LXC virtualization technologies under private cloud conditions. All analyses and evaluations were carried out based on NAS Benchmark kernels to simulate different types of workloads. We applied statistical significance tests to highlight the differences. The results have shown that applications running on LXC-based cloud instances outperform KVM-based cloud instances in 93.75% of the experiments w.r.t. single tenant. Regarding multi-tenant, LXC instances outperform KVM instances in 45% of the results, where the performance differences were not as significant as expected. |
Maron, Carlos A F Parametrização do paralelismo de stream em Benchmarks da suíte PARSEC Masters Thesis School of Technology - PPGCC - PUCRS, 2018. @mastersthesis{MARON:DM:18, title = {Parametrização do paralelismo de stream em Benchmarks da suíte PARSEC}, author = {Carlos A. F. Maron}, url = {http://tede2.pucrs.br/tede2/handle/tede/8556}, year = {2018}, date = {2018-08-01}, address = {Porto Alegre, Brazil}, school = {School of Technology - PPGCC - PUCRS}, abstract = {The parallel software designer aims to deliver efficient and scalable applications. This can be done by understanding the performance impacts of the application’s characteristics. Parallel applications of the same domain tend to present similar patterns of behavior and characteristics. One way to understand and evaluate the applications’ characteristics is to use parametrizable benchmarks, which enables users to play with the important characteristics when running the benchmark. However, the parametrization technique must be better exploited in the available benchmarks, especially in the stream processing application domain. Our challenge is to enable the parametrization of the stream processing applications’ characteristics (also known as stream parallelism) through benchmarks. Mainly because this application domain is widely used and the benchmarks available for it usually do not support the evaluation of important characteristics from this domain (e.g., PARSEC). Therefore, the goal is to identify the stream parallelism characteristics present in the PARSEC benchmarks and implement ready-to-use parametrization support. We selected the Dedup and Ferret applications, which represent the stream parallelism domain. In the experimental results, we observed that our implemented parametrization has caused performance impacts in this application domain. In most cases, our parametrization improved the throughput, latency, service time, and execution time. 
Moreover, since we have not evaluated the computer architectures and parallel programming frameworks’ performance, the results have shown new potential research investigations to understand other patterns of behavior caused by the parametrization.}, keywords = {}, pubstate = {published}, tppubtype = {mastersthesis} } The parallel software designer aims to deliver efficient and scalable applications. This can be done by understanding the performance impacts of the application’s characteristics. Parallel applications of the same domain tend to present similar patterns of behavior and characteristics. One way to understand and evaluate the applications’ characteristics is to use parametrizable benchmarks, which enables users to play with the important characteristics when running the benchmark. However, the parametrization technique must be better exploited in the available benchmarks, especially in the stream processing application domain. Our challenge is to enable the parametrization of the stream processing applications’ characteristics (also known as stream parallelism) through benchmarks. Mainly because this application domain is widely used and the benchmarks available for it usually do not support the evaluation of important characteristics from this domain (e.g., PARSEC). Therefore, the goal is to identify the stream parallelism characteristics present in the PARSEC benchmarks and implement ready-to-use parametrization support. We selected the Dedup and Ferret applications, which represent the stream parallelism domain. In the experimental results, we observed that our implemented parametrization has caused performance impacts in this application domain. In most cases, our parametrization improved the throughput, latency, service time, and execution time. 
Moreover, since we have not evaluated the computer architectures and parallel programming frameworks’ performance, the results have shown new potential research investigations to understand other patterns of behavior caused by the parametrization. |
Vogel, Adriano Adaptive Degree of Parallelism for the SPar Runtime Masters Thesis School of Technology - PPGCC - PUCRS, 2018. @mastersthesis{VOGEL:DM:18, title = {Adaptive Degree of Parallelism for the SPar Runtime}, author = {Adriano Vogel}, url = {http://tede2.pucrs.br/tede2/handle/tede/8255}, year = {2018}, date = {2018-03-01}, address = {Porto Alegre, Brazil}, school = {School of Technology - PPGCC - PUCRS}, abstract = {In recent years, stream processing applications have become a traditional workload in computing systems. They are traditionally found in video, audio, graphic and image processing. Many of these applications demand parallelism to increase performance. However, programmers must often face the trade-off between coding productivity and performance that introducing parallelism creates. The SPar Domain-Specific Language (DSL) was created to achieve the optimal balance for programmers, with the C++11 attribute annotation mechanism to ensure that essential properties of stream parallelism could be represented (stage, input, output, and replicate). The compiler recognizes the SPar attributes and generates parallel code automatically. The need to manually define parallelism is the crucial challenge for increasing SPar's abstraction level, because it is time-consuming and error-prone. Also, executing several applications can fail to be efficient when running an unsuitable number of replicas. This occurs when the defined number of replicas in a parallel region is not optimal or when a static number is used, which ignores the dynamic nature of stream processing applications. In order to solve this problem, we introduced the concept of the abstracted and adaptive number of replicas for SPar. Moreover, we described our implemented mechanism as well as transformation rules that enable SPar to generate parallel code with the adaptive degree of parallelism support. We experimentally evaluated the implemented adaptive mechanisms regarding their effectiveness. 
Thus, we used real-world applications to demonstrate that our adaptive mechanism implementations can provide higher abstraction levels without significant performance degradation.}, keywords = {}, pubstate = {published}, tppubtype = {mastersthesis} } In recent years, stream processing applications have become a traditional workload in computing systems. They are traditionally found in video, audio, graphic and image processing. Many of these applications demand parallelism to increase performance. However, programmers must often face the trade-off between coding productivity and performance that introducing parallelism creates. The SPar Domain-Specific Language (DSL) was created to achieve the optimal balance for programmers, with the C++11 attribute annotation mechanism to ensure that essential properties of stream parallelism could be represented (stage, input, output, and replicate). The compiler recognizes the SPar attributes and generates parallel code automatically. The need to manually define parallelism is the crucial challenge for increasing SPar's abstraction level, because it is time-consuming and error-prone. Also, executing several applications can fail to be efficient when running an unsuitable number of replicas. This occurs when the defined number of replicas in a parallel region is not optimal or when a static number is used, which ignores the dynamic nature of stream processing applications. In order to solve this problem, we introduced the concept of the abstracted and adaptive number of replicas for SPar. Moreover, we described our implemented mechanism as well as transformation rules that enable SPar to generate parallel code with the adaptive degree of parallelism support. We experimentally evaluated the implemented adaptive mechanisms regarding their effectiveness. Thus, we used real-world applications to demonstrate that our adaptive mechanism implementations can provide higher abstraction levels without significant performance degradation. |
2017 |
Griebler, Dalvan; Danelutto, Marco; Torquati, Massimo; Fernandes, Luiz Gustavo SPar: A DSL for High-Level and Productive Stream Parallelism Journal Article doi Parallel Processing Letters, 27 (01), pp. 1740005, 2017. @article{GRIEBLER:PPL:17, title = {SPar: A DSL for High-Level and Productive Stream Parallelism}, author = {Dalvan Griebler and Marco Danelutto and Massimo Torquati and Luiz Gustavo Fernandes}, url = {http://dx.doi.org/10.1142/S0129626417400059}, doi = {10.1142/S0129626417400059}, year = {2017}, date = {2017-03-01}, journal = {Parallel Processing Letters}, volume = {27}, number = {01}, pages = {1740005}, publisher = {World Scientific}, abstract = {This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools processes SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar’s performance and expressiveness.}, keywords = {}, pubstate = {published}, tppubtype = {article} } This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages.
A set of tools processes SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. Also, we show the results of experiments assessing the feasibility of the entire approach as well as SPar’s performance and expressiveness. |
Griebler, Dalvan; Hoffmann, Renato B; Loff, Junior; Danelutto, Marco; Fernandes, Luiz G High-Level and Efficient Stream Parallelism on Multi-core Systems with SPar for Data Compression Applications Inproceedings XVIII Simpósio em Sistemas Computacionais de Alto Desempenho, pp. 16-27, SBC, Campinas, SP, Brasil, 2017. @inproceedings{GRIEBLER:WSCAD:17, title = {High-Level and Efficient Stream Parallelism on Multi-core Systems with SPar for Data Compression Applications}, author = {Dalvan Griebler and Renato B. Hoffmann and Junior Loff and Marco Danelutto and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_WSCAD_2017.pdf}, year = {2017}, date = {2017-10-01}, booktitle = {XVIII Simpósio em Sistemas Computacionais de Alto Desempenho}, pages = {16-27}, publisher = {SBC}, address = {Campinas, SP, Brasil}, abstract = {The stream processing domain is present in several real-world applications that are running on multi-core systems. In this paper, we focus on data compression applications that are an important subset of this domain. Our main goal is to assess the programmability and efficiency of the domain-specific language called SPar. It was specially designed for expressing stream parallelism and it promises higher-level parallelism abstractions without significant performance losses. Therefore, we parallelized the Lzip and Bzip2 compressors with SPar and compared them with state-of-the-art frameworks. The results revealed that SPar is able to efficiently exploit stream parallelism as well as provide suitable abstractions with less code intrusion and code refactoring.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The stream processing domain is present in several real-world applications that are running on multi-core systems. In this paper, we focus on data compression applications that are an important subset of this domain. Our main goal is to assess the programmability and efficiency of the domain-specific language called SPar.
It was specially designed for expressing stream parallelism and it promises higher-level parallelism abstractions without significant performance losses. Therefore, we parallelized the Lzip and Bzip2 compressors with SPar and compared them with state-of-the-art frameworks. The results revealed that SPar is able to efficiently exploit stream parallelism as well as provide suitable abstractions with less code intrusion and code refactoring. |
Griebler, Dalvan; Hoffmann, Renato B; Danelutto, Marco; Fernandes, Luiz Gustavo Higher-Level Parallelism Abstractions for Video Applications with SPar Inproceedings doi Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 698-707, IOS Press, Bologna, Italy, 2017. @inproceedings{GRIEBLER:REPARA:17, title = {Higher-Level Parallelism Abstractions for Video Applications with SPar}, author = {Dalvan Griebler and Renato B. Hoffmann and Marco Danelutto and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/978-1-61499-843-3-698}, doi = {10.3233/978-1-61499-843-3-698}, year = {2017}, date = {2017-09-01}, booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing}, pages = {698-707}, publisher = {IOS Press}, address = {Bologna, Italy}, series = {ParCo'17}, abstract = {SPar is a Domain-Specific Language (DSL) designed to provide high-level parallel programming abstractions for streaming applications. The video processing application domain requires parallel processing to extract and analyze information quickly. When using state-of-the-art frameworks such as FastFlow and TBB, the application programmer has to manage source code refactoring and performance optimization to implement parallelism efficiently. Our goal is to make this process easier for programmers through SPar. Thus we assess SPar's programming language and its performance in traditional video applications. We also discuss different implementations compared to those of SPar. Results demonstrate that SPar maintains the sequential code structure, is less code intrusive, and provides higher-level programming abstractions without introducing notable performance losses.
Therefore, it represents a good choice for application programmers from the video processing domain.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } SPar is a Domain-Specific Language (DSL) designed to provide high-level parallel programming abstractions for streaming applications. The video processing application domain requires parallel processing to extract and analyze information quickly. When using state-of-the-art frameworks such as FastFlow and TBB, the application programmer has to manage source code refactoring and performance optimization to implement parallelism efficiently. Our goal is to make this process easier for programmers through SPar. Thus we assess SPar's programming language and its performance in traditional video applications. We also discuss different implementations compared to those of SPar. Results demonstrate that SPar maintains the sequential code structure, is less code intrusive, and provides higher-level programming abstractions without introducing notable performance losses. Therefore, it represents a good choice for application programmers from the video processing domain. |
Ledur, Cleverson; Griebler, Dalvan; Manssour, Isabel; Fernandes, Luiz Gustavo A High-Level DSL for Geospatial Visualizations with Multi-core Parallelism Support Inproceedings doi 41st IEEE Computer Society Signature Conference on Computers, Software and Applications, pp. 298-304, IEEE, Torino, Italy, 2017. @inproceedings{LEDUR:COMPSAC:17, title = {A High-Level DSL for Geospatial Visualizations with Multi-core Parallelism Support}, author = {Cleverson Ledur and Dalvan Griebler and Isabel Manssour and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1109/COMPSAC.2017.18}, doi = {10.1109/COMPSAC.2017.18}, year = {2017}, date = {2017-07-01}, booktitle = {41st IEEE Computer Society Signature Conference on Computers, Software and Applications}, pages = {298-304}, publisher = {IEEE}, address = {Torino, Italy}, series = {COMPSAC'17}, abstract = {The amount of data generated worldwide associated with geolocalization has exponentially increased over the last decade due to social networks, population demographics, and the popularization of Global Positioning Systems. Several methods for geovisualization have already been developed, but many of them are focused on a specific application or require learning a variety of tools and programming languages. It becomes even more difficult when users have to manage a large amount of data because state-of-the-art alternatives require the use of third-party pre-processing tools. We present a novel Domain-Specific Language (DSL), which focuses on large data geovisualizations. Through a compiler, we support automatic visualization generation and data pre-processing. The system takes advantage of multi-core parallelism to speed up data pre-processing abstractly. Our experiments were designed to highlight the programming effort and performance of our DSL.
The results have shown a considerable programming effort reduction and efficient parallelism support with respect to the sequential version.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The amount of data generated worldwide associated with geolocalization has exponentially increased over the last decade due to social networks, population demographics, and the popularization of Global Positioning Systems. Several methods for geovisualization have already been developed, but many of them are focused on a specific application or require learning a variety of tools and programming languages. It becomes even more difficult when users have to manage a large amount of data because state-of-the-art alternatives require the use of third-party pre-processing tools. We present a novel Domain-Specific Language (DSL), which focuses on large data geovisualizations. Through a compiler, we support automatic visualization generation and data pre-processing. The system takes advantage of multi-core parallelism to speed up data pre-processing abstractly. Our experiments were designed to highlight the programming effort and performance of our DSL. The results have shown a considerable programming effort reduction and efficient parallelism support with respect to the sequential version. |
Griebler, Dalvan; Fernandes, Luiz Gustavo Towards Distributed Parallel Programming Support for the SPar DSL Inproceedings doi Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 563-572, IOS Press, Bologna, Italy, 2017. @inproceedings{GRIEBLER:PARCO:17, title = {Towards Distributed Parallel Programming Support for the SPar DSL}, author = {Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://doi.org/10.3233/978-1-61499-843-3-563}, doi = {10.3233/978-1-61499-843-3-563}, year = {2017}, date = {2017-09-01}, booktitle = {Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing}, pages = {563-572}, publisher = {IOS Press}, address = {Bologna, Italy}, series = {ParCo'17}, abstract = {SPar was originally designed to provide high-level abstractions for stream parallelism in C++ programs targeting multi-core systems. This work proposes distributed parallel programming support for SPar targeting cluster environments. The goal is to preserve the original semantics while source-to-source code transformations generate MPI (Message Passing Interface) parallel code. The results of the experiments presented in the paper demonstrate improved programmability without significant performance losses.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } SPar was originally designed to provide high-level abstractions for stream parallelism in C++ programs targeting multi-core systems. This work proposes distributed parallel programming support for SPar targeting cluster environments. The goal is to preserve the original semantics while source-to-source code transformations generate MPI (Message Passing Interface) parallel code. The results of the experiments presented in the paper demonstrate improved programmability without significant performance losses. |
Araujo, Gabriell; Ledur, Cleverson; Griebler, Dalvan; Fernandes, Luiz G Exploração do Paralelismo em Algoritmos de Mineração de Dados com Pthreads, OpenMP, FastFlow, TBB e Phoenix++ Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. @inproceedings{ARAUJO:ERAD:17, title = {Exploração do Paralelismo em Algoritmos de Mineração de Dados com Pthreads, OpenMP, FastFlow, TBB e Phoenix++}, author = {Gabriell Araujo and Cleverson Ledur and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_ERAD_IC_Araujo_2017.pdf}, year = {2017}, date = {2017-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Ijuí, RS, BR}, abstract = {Aiming to introduce parallel data mining algorithms into the GMaVis DSL, four applications were parallelized with five parallel programming interfaces. This work presents a comparison of these interfaces in order to assess which one offers the highest performance and coding productivity. The results show that fewer lines of code and good performance can be achieved with OpenMP and FastFlow.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Aiming to introduce parallel data mining algorithms into the GMaVis DSL, four applications were parallelized with five parallel programming interfaces. This work presents a comparison of these interfaces in order to assess which one offers the highest performance and coding productivity. The results show that fewer lines of code and good performance can be achieved with OpenMP and FastFlow. |
de Mesquita, Cassiano E; Ledur, Cleverson; Griebler, Dalvan; Fernandes, Luiz G Proposta de uma Plataforma para Experimentos de Software em Programação Paralela Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. @inproceedings{MESQUITA:ERAD:17, title = {Proposta de uma Plataforma para Experimentos de Software em Programação Paralela}, author = {Cassiano E. de Mesquita and Cleverson Ledur and Dalvan Griebler and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_ERAD_PG_Mesquita_2017.pdf}, year = {2017}, date = {2017-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Ijuí, RS, BR}, abstract = {This paper proposes a web platform to simplify the evaluation of parallel programming interfaces. The central idea is to identify the difficulties faced by potential developers in order to propose improvements that will reduce the effort of parallelizing applications. The planned platform consists of a web interface implemented with the PHP and JavaScript languages.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } This paper proposes a web platform to simplify the evaluation of parallel programming interfaces. The central idea is to identify the difficulties faced by potential developers in order to propose improvements that will reduce the effort of parallelizing applications. The planned platform consists of a web interface implemented with the PHP and JavaScript languages. |
Vogel, Adriano; Griebler, Dalvan; Fernandes, Luiz Gustavo Proposta de Implementação de Grau de Paralelismo Adaptativo em uma DSL para Paralelismo de Stream Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. @inproceedings{VOGEL:ERAD:17, title = {Proposta de Implementação de Grau de Paralelismo Adaptativo em uma DSL para Paralelismo de Stream}, author = {Adriano Vogel and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_ERAD_PG_Vogel_2017.pdf}, year = {2017}, date = {2017-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Ijuí, RS, BR}, abstract = {The class of stream applications has unique characteristics, such as variation in inputs/outputs and executions for indefinite periods. This paradigm is used to reduce execution times and increase application throughput. This study proposes adaptive support for the degree of stream parallelism in the SPar DSL (Domain-Specific Language).}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The class of stream applications has unique characteristics, such as variation in inputs/outputs and executions for indefinite periods. This paradigm is used to reduce execution times and increase application throughput. This study proposes adaptive support for the degree of stream parallelism in the SPar DSL (Domain-Specific Language). |
Löff, Júnior; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G Explorando a Flexibilidade e o Desempenho da Biblioteca FastFlow com o Padrão Paralelo Farm Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. @inproceedings{LOFF:ERAD:17, title = {Explorando a Flexibilidade e o Desempenho da Biblioteca FastFlow com o Padrão Paralelo Farm}, author = {Júnior Löff and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_ERAD_IC_Loff_2017.pdf}, year = {2017}, date = {2017-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Ijuí, RS, BR}, abstract = {Parallelism is a task for specialists, where the challenge is to use abstractions that offer the flexibility and expressiveness needed to achieve the best performance. This paper explores variations in the implementation of the Farm pattern using the FastFlow library on the K-means (data mining domain) and Mandelbrot Set (mathematics domain) algorithms. We conclude that the Farm pattern offers good flexibility and good performance.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Parallelism is a task for specialists, where the challenge is to use abstractions that offer the flexibility and expressiveness needed to achieve the best performance. This paper explores variations in the implementation of the Farm pattern using the FastFlow library on the K-means (data mining domain) and Mandelbrot Set (mathematics domain) algorithms. We conclude that the Farm pattern offers good flexibility and good performance. |
Hoffmann, Renato B; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G Avaliando a Produtividade e o Desempenho da DSL SPar em uma Aplicação de Detecção de Pistas Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. @inproceedings{FILHO:ERAD:17, title = {Avaliando a Produtividade e o Desempenho da DSL SPar em uma Aplicação de Detecção de Pistas}, author = {Renato B. Hoffmann and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2017/CR_ERAD_IC_Hoffmann.pdf}, year = {2017}, date = {2017-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {4}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {Ijuí, RS, BR}, abstract = {The SPar domain-specific language, embedded in C++, provides through annotations an alternative for exploiting stream parallelism on multi-core architectures. In this paper, the goal is to demonstrate performance and productivity indicators in a lane detection application. The results showed that SPar achieved higher productivity and good performance.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } The SPar domain-specific language, embedded in C++, provides through annotations an alternative for exploiting stream parallelism on multi-core architectures. In this paper, the goal is to demonstrate performance and productivity indicators in a lane detection application. The results showed that SPar achieved higher productivity and good performance. |
Vogel, Adriano; Griebler, Dalvan; Schepke, Claudio; Fernandes, Luiz Gustavo An Intra-Cloud Networking Performance Evaluation on CloudStack Environment Inproceedings doi 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 5, IEEE, St. Petersburg, Russia, 2017. @inproceedings{larcc:intra-cloud_networking_cloudstack:PDP:17, title = {An Intra-Cloud Networking Performance Evaluation on CloudStack Environment}, author = {Adriano Vogel and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes}, doi = {10.1109/PDP.2017.40}, year = {2017}, date = {2017-03-01}, booktitle = {25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {5}, publisher = {IEEE}, address = {St. Petersburg, Russia}, abstract = {Infrastructure-as-a-Service (IaaS) is a cloud on-demand commodity built on top of virtualization technologies and managed by IaaS tools. In this scenario, performance is a relevant matter because a set of aspects may impact and increase the system overhead. Specifically on the network, the use of virtualized capabilities may cause performance degradation (e.g., latency, throughput). The goal of this paper is to contribute to networking performance evaluation, providing new insights for private IaaS clouds. To achieve our goal, we deploy CloudStack environments and conduct experiments with different configurations and techniques. The research findings demonstrate that KVM-based cloud instances have small network performance degradation regarding throughput (about 0.2% for coarse-grained and 6.8% for fine-grained messages) while container-based instances have even better results. On the other hand, the KVM instances present worse latency (about 12.4% on coarse-grained and two times more on fine-grained messages w.r.t. the native environment), while latency is better in container-based instances, where the performance results are close to the native environment.
Furthermore, we demonstrate a performance optimization of applications running on KVM.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Infrastructure-as-a-Service (IaaS) is a cloud on-demand commodity built on top of virtualization technologies and managed by IaaS tools. In this scenario, performance is a relevant matter because a set of aspects may impact and increase the system overhead. Specifically on the network, the use of virtualized capabilities may cause performance degradation (e.g., latency, throughput). The goal of this paper is to contribute to networking performance evaluation, providing new insights for private IaaS clouds. To achieve our goal, we deploy CloudStack environments and conduct experiments with different configurations and techniques. The research findings demonstrate that KVM-based cloud instances have small network performance degradation regarding throughput (about 0.2% for coarse-grained and 6.8% for fine-grained messages) while container-based instances have even better results. On the other hand, the KVM instances present worse latency (about 12.4% on coarse-grained and two times more on fine-grained messages w.r.t. the native environment), while latency is better in container-based instances, where the performance results are close to the native environment. Furthermore, we demonstrate a performance optimization of applications running on KVM. |
Rista, Cassiano; Griebler, Dalvan; Maron, Carlos A F; Fernandes, Luiz Gustavo Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems Inproceedings doi International Conference on High Performance Computing & Simulation (HPCS), pp. 619-626, IEEE, Genoa, Italy, 2017. @inproceedings{larcc:link_aggregation:HPCS:2017, title = {Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems}, author = {Cassiano Rista and Dalvan Griebler and Carlos A F Maron and Luiz Gustavo Fernandes}, url = {http://ieeexplore.ieee.org/document/8035136/}, doi = {10.1109/HPCS.2017.97}, year = {2017}, date = {2017-07-01}, booktitle = {International Conference on High Performance Computing & Simulation (HPCS)}, pages = {619-626}, publisher = {IEEE}, address = {Genoa, Italy}, series = {HPCS'17}, abstract = {Cloud computing has emerged as an important paradigm to improve resource utilization, efficiency, flexibility, and the pay-per-use billing structure. However, cloud platforms cause performance degradations due to their virtualization layer and may not be appropriate for the requirements of high-performance applications, such as big data. This paper tackles the problem of improving network performance in container-based cloud instances to create a viable alternative to run network intensive Hadoop applications. Our approach consists of deploying link aggregation via the IEEE 802.3ad standard to increase the available bandwidth and using LXC (Linux Container) cloud instances to create a Hadoop cluster. In order to evaluate the efficiency of our approach and the overhead added by the container-based cloud environment, we ran a set of experiments to measure throughput, latency, bandwidth utilization, and completion times. The results prove that our approach adds minimal overhead in the cloud environment as well as increases throughput and reduces latency.
Moreover, our approach demonstrates a suitable alternative for running Hadoop applications, reducing completion times by up to 33.73%.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Cloud computing has emerged as an important paradigm to improve resource utilization, efficiency, flexibility, and the pay-per-use billing structure. However, cloud platforms cause performance degradations due to their virtualization layer and may not be appropriate for the requirements of high-performance applications, such as big data. This paper tackles the problem of improving network performance in container-based cloud instances to create a viable alternative to run network intensive Hadoop applications. Our approach consists of deploying link aggregation via the IEEE 802.3ad standard to increase the available bandwidth and using LXC (Linux Container) cloud instances to create a Hadoop cluster. In order to evaluate the efficiency of our approach and the overhead added by the container-based cloud environment, we ran a set of experiments to measure throughput, latency, bandwidth utilization, and completion times. The results prove that our approach adds minimal overhead in the cloud environment as well as increases throughput and reduces latency. Moreover, our approach demonstrates a suitable alternative for running Hadoop applications, reducing completion times by up to 33.73%. |
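The IEEE 802.3ad link aggregation this entry deploys is available in Linux through the bonding driver. The following is a minimal, hypothetical configuration sketch (Debian-style /etc/network/interfaces with the ifenslave package; the interface names eth0/eth1 and the address are placeholders, not the authors' actual setup):

```
auto bond0
iface bond0 inet static
    address 192.168.0.10
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad              # IEEE 802.3ad dynamic link aggregation (LACP)
    bond-miimon 100                # link-state check interval in ms
    bond-lacp-rate fast
    bond-xmit-hash-policy layer3+4 # spread flows across links by IP/port
```

For the aggregate bandwidth to be usable, the corresponding switch ports must also be grouped into an LACP channel; a single TCP flow still traverses one physical link, which is why the hash policy matters for multi-flow workloads such as Hadoop.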
2016 |
Teodoro, Silvana; do Carmo, Andriele Busatto; Adornes, Daniel Couto; Fernandes, Luiz Gustavo A comparative study of energy-aware scheduling algorithms for computational grids Journal Article doi Journal of Systems and Software, 117 , pp. 153-165, 2016. @article{gmap:TEODORO:JSS:16, title = {A comparative study of energy-aware scheduling algorithms for computational grids}, author = {Silvana Teodoro and Andriele Busatto do Carmo and Daniel Couto Adornes and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1016/j.jss.2016.02.017}, doi = {10.1016/j.jss.2016.02.017}, year = {2016}, date = {2016-07-01}, journal = {Journal of Systems and Software}, volume = {117}, pages = {153-165}, publisher = {Elsevier}, abstract = {Recent advances in High Performance Computing (HPC) have required the attention of the scientific community regarding aspects that do not concern only performance. In order to enhance computational capacity, modern parallel and distributed architectures are designed with more processing units, causing an increase in energy consumption. Currently, computational grids are among the most representative HPC platforms, being used in many scientific and academic projects. In this work, we propose four energy-aware scheduling algorithms to efficiently manage the energy consumption in computational grids, trying to mitigate performance loss. Our algorithms propose an efficient management of idle resources and a clever use of active ones. We have evaluated our algorithms using the SimGrid framework and an energy consumption estimation method we proposed for Bag-of-Tasks-type (BoT) applications. We compared our algorithms against five others developed to work with computational grids.
In a set of experimental scenarios, our results show that by using our algorithms it is possible to achieve up to a 75.90% reduction in energy consumption combined with a 5.28% performance loss compared with the best-performing algorithm.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Recent advances in High Performance Computing (HPC) have required the attention of the scientific community regarding aspects that do not concern only performance. In order to enhance computational capacity, modern parallel and distributed architectures are designed with more processing units, causing an increase in energy consumption. Currently, computational grids are among the most representative HPC platforms, being used in many scientific and academic projects. In this work, we propose four energy-aware scheduling algorithms to efficiently manage the energy consumption in computational grids, trying to mitigate performance loss. Our algorithms propose an efficient management of idle resources and a clever use of active ones. We have evaluated our algorithms using the SimGrid framework and an energy consumption estimation method we proposed for Bag-of-Tasks-type (BoT) applications. We compared our algorithms against five others developed to work with computational grids. In a set of experimental scenarios, our results show that by using our algorithms it is possible to achieve up to a 75.90% reduction in energy consumption combined with a 5.28% performance loss compared with the best-performing algorithm. |
Vogel, Adriano; Griebler, Dalvan; Maron, Carlos A F; Schepke, Claudio; Fernandes, Luiz Gustavo Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack Inproceedings doi 24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 672-679, IEEE, Heraklion, Crete, Greece, 2016. @inproceedings{larcc:IaaS_private:PDP:16, title = {Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack}, author = {Adriano Vogel and Dalvan Griebler and Carlos A F Maron and Claudio Schepke and Luiz Gustavo Fernandes}, url = {http://ieeexplore.ieee.org/document/7445407/}, doi = {10.1109/PDP.2016.75}, year = {2016}, date = {2016-02-01}, booktitle = {24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)}, pages = {672-679}, publisher = {IEEE}, address = {Heraklion, Crete, Greece}, series = {PDP'16}, abstract = {Despite the evolution of cloud computing in recent years, the performance and comprehensive understanding of the available private cloud tools are still under research. This paper contributes to an analysis of the Infrastructure as a Service (IaaS) domain by mapping new insights and discussing the challenges for improving cloud services. The goal is to make a comparative analysis of OpenNebula, OpenStack and CloudStack tools, evaluating their differences on support for flexibility and resiliency. Also, we aim at evaluating these three cloud tools when they are deployed using a mutual hypervisor (KVM) for discovering new empirical insights. Our research results demonstrated that OpenStack is the most resilient and CloudStack is the most flexible for deploying an IaaS private cloud.
Moreover, the performance experiments indicated some contrasts among the private IaaS cloud instances when running intensive workloads and scientific applications.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Bairros, Gildomiro; Griebler, Dalvan; Fernandes, Luiz Gustavo Proposta de Suporte a Elasticidade Automática em Nuvem para uma Linguagem Específica de Domínio Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 197-198, Sociedade Brasileira de Computação (SBC), São Leopoldo, RS, BR, 2016. @inproceedings{BAIRROS:ERAD:16, title = {Proposta de Suporte a Elasticidade Automática em Nuvem para uma Linguagem Específica de Domínio}, author = {Gildomiro Bairros and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2016/CR_ERAD_PG__2016.pdf}, year = {2016}, date = {2016-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {197-198}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {São Leopoldo, RS, BR}, abstract = {This paper presents a proposal for developing a middleware that provides elasticity to applications written in a domain-specific language targeting stream parallelism. The middleware will operate at the PaaS level and will add elasticity instructions transparently to the developer, parsing the code and automatically injecting the elasticity instructions.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Maron, Carlos A F; Griebler, Dalvan; Fernandes, Luiz Gustavo Em Direção à um Benchmark de Workload Sintético para Paralelismo de Stream em Arquiteturas Multicore Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 171-172, Sociedade Brasileira de Computação (SBC), São Leopoldo, RS, BR, 2016. @inproceedings{MARON:ERAD:16, title = {Em Direção à um Benchmark de Workload Sintético para Paralelismo de Stream em Arquiteturas Multicore}, author = {Carlos A. F. Maron and Dalvan Griebler and Luiz Gustavo Fernandes}, url = {https://gmap.pucrs.br/dalvan/papers/2016/CR_ERAD_PG_2016.pdf}, year = {2016}, date = {2016-04-01}, booktitle = {Escola Regional de Alto Desempenho (ERAD/RS)}, pages = {171-172}, publisher = {Sociedade Brasileira de Computação (SBC)}, address = {São Leopoldo, RS, BR}, abstract = {The processing of continuous data streams is raising new challenges in the exploitation of parallelism. Classical benchmark suites do not fully explore stream aspects, focusing instead on problems of a scientific nature with finite execution. To address this problem in shared-memory environments, this work proposes a synthetic-workload benchmark targeting stream parallelism.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Ledur, Cleverson GMaVis: A Domain-Specific Language for Large-Scale Geospatial Data Visualization Supporting Multi-core Parallelism Masters Thesis Faculdade de Informática - PPGCC - PUCRS, 2016. @mastersthesis{LEDUR:DM:16, title = {GMaVis: A Domain-Specific Language for Large-Scale Geospatial Data Visualization Supporting Multi-core Parallelism}, author = {Cleverson Ledur}, url = {http://tede2.pucrs.br/tede2/handle/tede/6837}, year = {2016}, date = {2016-03-01}, address = {Porto Alegre, Brazil}, school = {Faculdade de Informática - PPGCC - PUCRS}, abstract = {Data generation has increased exponentially in recent years due to the popularization of technology. At the same time, information visualization enables the extraction of knowledge and useful information through data representation with graphic elements. Moreover, a set of visualization techniques may help in information perception, enabling the discovery of patterns and anomalies in data. Even though it provides many benefits, creating information visualizations is a hard task for users with little knowledge of computer programming. It becomes even more difficult when these users have to deal with big data files, since most tools do not provide features to abstract data preprocessing. In order to bridge this gap, we proposed GMaVis, a Domain-Specific Language (DSL) that offers a high-level description language for creating geospatial data visualizations, supported by a parallel data preprocessor. GMaVis was evaluated using two approaches. First, we performed a programming effort analysis, using analytical software to estimate development effort based on the code. This evaluation demonstrates a high gain in productivity when compared with the programming effort required by other tools and libraries with similar purposes.
Also, a performance evaluation was conducted in the parallel module that performs data preprocessing, which demonstrated a performance gain when compared with the sequential version.}, keywords = {}, pubstate = {published}, tppubtype = {mastersthesis} } |
Griebler, Dalvan Domain-Specific Language & Support Tool for High-Level Stream Parallelism PhD Thesis Faculdade de Informática - PPGCC - PUCRS, 2016. @phdthesis{GRIEBLER:PHD:16, title = {Domain-Specific Language & Support Tool for High-Level Stream Parallelism}, author = {Dalvan Griebler}, url = {http://tede2.pucrs.br/tede2/handle/tede/6776}, year = {2016}, date = {2016-06-01}, address = {Porto Alegre, Brazil}, school = {Faculdade de Informática - PPGCC - PUCRS}, abstract = {Stream-based systems are representative of several application domains including video, audio, networking, graphic processing, etc. Stream programs may run on different kinds of parallel architectures (desktop, servers, cell phones, and supercomputers) and represent significant workloads on our current computing systems. Nevertheless, most of them are still not parallelized. Moreover, when new software has to be developed, programmers often face a trade-off between coding productivity, code portability, and performance. To solve this problem, we provide a new Domain-Specific Language (DSL) that naturally/on-the-fly captures and represents parallelism for stream-based applications. The aim is to offer a set of attributes (through annotations) that preserves the program's source code and is not architecture-dependent for annotating parallelism. We used the C++ attribute mechanism to design a ``de-facto'' standard C++ embedded DSL named SPar. However, the implementation of DSLs using compiler-based tools is difficult, complicated, and usually requires a significant learning curve. This is even harder for those who are not familiar with compiler technology. Therefore, our motivation is to simplify this path for other researchers (experts in their domain) with support tools (our tool is CINCLE) to create high-level and productive DSLs through powerful and aggressive source-to-source transformations.
In fact, parallel programmers can use their expertise without having to design and implement low-level code. The main goal of this thesis was to create a DSL and support tools for high-level stream parallelism in the context of a programming framework that is compiler-based and domain-oriented. Thus, we implemented SPar using CINCLE. SPar supports the software developer with productivity, performance, and code portability while CINCLE provides sufficient support to generate new DSLs. Also, SPar targets source-to-source transformation producing parallel pattern code built on top of FastFlow and MPI. Finally, we provide a full set of experiments showing that SPar provides better coding productivity without significant performance degradation in multi-core systems as well as transformation rules that are able to achieve code portability (for cluster architectures) through its generalized attributes.}, keywords = {}, pubstate = {published}, tppubtype = {phdthesis} } |
Griebler, Dalvan Domain-Specific Language & Support Tool for High-Level Stream Parallelism PhD Thesis Computer Science Department - University of Pisa, 2016. @phdthesis{GRIEBLER:PHD_PISA:16, title = {Domain-Specific Language & Support Tool for High-Level Stream Parallelism}, author = {Dalvan Griebler}, url = {https://gmap.pucrs.br/dalvan/papers/2016/thesis_dalvan_UNIPI_2016.pdf}, year = {2016}, date = {2016-04-01}, address = {Pisa, Italy}, school = {Computer Science Department - University of Pisa}, abstract = {Stream-based systems are representative of several application domains including video, audio, networking, graphic processing, etc. Stream programs may run on different kinds of parallel architectures (desktop, servers, cell phones, and supercomputers) and represent significant workloads on our current computing systems. Nevertheless, most of them are still not parallelized. Moreover, when new software has to be developed, programmers often face a trade-off between coding productivity, code portability, and performance. To solve this problem, we provide a new Domain-Specific Language (DSL) that naturally/on-the-fly captures and represents parallelism for stream-based applications. The aim is to offer a set of attributes (through annotations) that preserves the program's source code and is not architecture-dependent for annotating parallelism. We used the C++ attribute mechanism to design a ``de-facto'' standard C++ embedded DSL named SPar. However, the implementation of DSLs using compiler-based tools is difficult, complicated, and usually requires a significant learning curve. This is even harder for those who are not familiar with compiler technology. Therefore, our motivation is to simplify this path for other researchers (experts in their domain) with support tools (our tool is CINCLE) to create high-level and productive DSLs through powerful and aggressive source-to-source transformations.
In fact, parallel programmers can use their expertise without having to design and implement low-level code. The main goal of this thesis was to create a DSL and support tools for high-level stream parallelism in the context of a programming framework that is compiler-based and domain-oriented. Thus, we implemented SPar using CINCLE. SPar supports the software developer with productivity, performance, and code portability while CINCLE provides sufficient support to generate new DSLs. Also, SPar targets source-to-source transformation producing parallel pattern code built on top of FastFlow and MPI. Finally, we provide a full set of experiments showing that SPar provides better coding productivity without significant performance degradation in multi-core systems as well as transformation rules that are able to achieve code portability (for cluster architectures) through its generalized attributes.}, keywords = {}, pubstate = {published}, tppubtype = {phdthesis} } |
2015 |
Adornes, Daniel; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures Journal Article doi International Journal of Software Engineering and Knowledge Engineering, 25 (10), pp. 1739-1741, 2015. @article{ADORNES:IJSEKE:15, title = {Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures}, author = {Daniel Adornes and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes}, url = {http://dx.doi.org/10.1142/S0218194015710096}, doi = {10.1142/S0218194015710096}, year = {2015}, date = {2015-12-01}, journal = {International Journal of Software Engineering and Knowledge Engineering}, volume = {25}, number = {10}, pages = {1739-1741}, publisher = {World Scientific}, abstract = {MapReduce was originally proposed as a suitable and efficient approach for analyzing and processing large amounts of data. Since then, many research works have contributed MapReduce implementations for distributed and shared memory architectures. Nevertheless, different architectural levels require different optimization strategies in order to achieve high-performance computing. Such strategies, in turn, have led to very different MapReduce programming interfaces among these works. This paper presents some research notes on coding productivity when developing MapReduce applications for distributed and shared memory architectures. As a case study, we introduce our current research on a unified MapReduce domain-specific language with code generation for Hadoop and Phoenix++, which has achieved a coding productivity increase ranging from 41.84% up to 94.71% without significant performance losses (below 3%) compared to those frameworks.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
do Carmo, Andriele Busatto; Raeder, Mateus; Nunes, Thiago; Kolberg, Mariana; Fernandes, Luiz Gustavo A job profile oriented scheduling architecture for improving the throughput of industrial printing environments Journal Article doi Computers & Industrial Engineering, 88 , pp. 191-205, 2015. @article{gmap:CARMO:CIE:15, title = {A job profile oriented scheduling architecture for improving the throughput of industrial printing environments}, author = {Andriele Busatto do Carmo and Mateus Raeder and Thiago Nunes and Mariana Kolberg and Luiz Gustavo Fernandes}, url = {https://doi.org/10.1016/j.cie.2015.07.001}, doi = {10.1016/j.cie.2015.07.001}, year = {2015}, date = {2015-10-01}, journal = {Computers & Industrial Engineering}, volume = {88}, pages = {191-205}, publisher = {Elsevier}, abstract = {The Digital Printing industry has become extremely specialized in the past few years. The use of personalized documents has emerged as a consolidated trend in this field. In order to meet this demand, languages to describe templates for personalized documents were proposed along with procedures which allow the correct printing of such documents. One of these procedures, which demands a high computational effort, is the ripping phase performed over a queue of documents in order to convert them into a printable format. An alternative to decrease the ripping phase computational time is to use high performance computing techniques to allow parallel ripping of different documents. However, such strategies present several unsolved issues. One of the most severe issues is the impossibility to assure a fair load balancing for any job queue. In this scenario, this work proposes a job profile oriented scheduling architecture for improving the throughput of industrial printing environments through a more efficient use of the available resources. 
Our results show a performance gain of up to 10% on average over the previously existing strategies applied on different job queue scenarios.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
Adornes, Daniel; Griebler, Dalvan; Ledur, Cleverson; Fernandes, Luiz G A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures Inproceedings doi The 27th International Conference on Software Engineering & Knowledge Engineering, pp. 6, Knowledge Systems Institute Graduate School, Pittsburgh, USA, 2015. @inproceedings{ADORNES:SEKE:15, title = {A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures}, author = {Daniel Adornes and Dalvan Griebler and Cleverson Ledur and Luiz G. Fernandes}, url = {http://dx.doi.org/10.18293/SEKE2015-204}, doi = {10.18293/SEKE2015-204}, year = {2015}, date = {2015-07-01}, booktitle = {The 27th International Conference on Software Engineering & Knowledge Engineering}, pages = {6}, publisher = {Knowledge Systems Institute Graduate School}, address = {Pittsburgh, USA}, abstract = {MapReduce is a suitable and efficient parallel programming pattern for processing big data analysis. In recent years, many frameworks/languages have implemented this pattern to achieve high performance in data mining applications, particularly for distributed memory architectures (e.g., clusters). Nevertheless, the industry of processors is now able to offer powerful processing on single machines (e.g., multi-core). Thus, these applications may address the parallelism in another architectural level. The target problems of this paper are code reuse and programming effort reduction since current solutions do not provide a single interface to deal with these two architectural levels. Therefore, we propose a unified domain-specific language in conjunction with transformation rules for code generation for Hadoop and Phoenix++. We selected these frameworks as state-of-the-art MapReduce implementations for distributed and shared memory architectures, respectively. 
Our solution achieves a programming effort reduction from 41.84% and up to 95.43% without significant performance losses (below the threshold of 3%) compared to Hadoop and Phoenix++.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
Griebler, Dalvan; Danelutto, Marco; Torquati, Massimo; Fernandes, Luiz G An Embedded C++ Domain-Specific Language for Stream Parallelism Inproceedings doi Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing, pp. 317-326, IOS Press, Edinburgh, Scotland, UK, 2015. @inproceedings{GRIEBLER:PARCO:15, title = {An Embedded C++ Domain-Specific Language for Stream Parallelism}, author = {Dalvan Griebler and Marco Danelutto and Massimo Torquati and Luiz G. Fernandes}, url = {http://dx.doi.org/10.3233/978-1-61499-621-7-317}, doi = {10.3233/978-1-61499-621-7-317}, year = {2015}, date = {2015-09-01}, booktitle = {Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing}, pages = {317-326}, publisher = {IOS Press}, address = {Edinburgh, Scotland, UK}, series = {ParCo'15}, abstract = {This paper proposes a new C++ embedded Domain-Specific Language (DSL) for expressing stream parallelism by using standard C++11 attribute annotations. The main goal is to introduce high-level parallel abstractions for developing stream-based parallel programs as well as reducing sequential source code rewriting. We demonstrated that by using a small set of attributes it is possible to produce different parallel versions depending on the way the source code is annotated. The performance of the parallel code produced is comparable with that obtained by manual parallelization.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } |
2019 |
Benchmark Paramétrico para o Domínio do Paralelismo de Stream: Um Estudo de Caso com o Ferret da Suíte PARSEC Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. |
Proposta de Grau de Paralelismo Autoadaptativo com MPI-2 para a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. |
Avaliando o Paralelismo de Stream com Pthreads, OpenMP e SPar em Aplicações de Vídeo Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. |
Revisando a Programação Paralela com CUDA nos Benchmarks EP e FT Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. |
Acelerando o Reconhecimento de Pessoas em Vídeos com MPI Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Três de Maio, BR, 2019. |
High-Level Stream Parallelism Abstractions with SPar Targeting GPUs Inproceedings doi Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 543-552, IOS Press, Prague, Czech Republic, 2019. |
Seamless Parallelism Management for Multi-core Stream Processing Inproceedings doi Advances in Parallel Computing, Proceedings of the International Conference on Parallel Computing (ParCo), pp. 533-542, IOS Press, Prague, Czech Republic, 2019. |
Structured Stream Parallelism for Rust Inproceedings doi XXIII Brazilian Symposium on Programming Languages (SBLP), pp. 54-61, ACM, Salvador, Brazil, 2019. |
Minimizing Self-Adaptation Overhead in Parallel Stream Processing for Multi-Cores Inproceedings doi Euro-Par 2019: Parallel Processing Workshops, pp. 12, Springer, Göttingen, Germany, 2019. |
2018 |
High-Level and Productive Stream Parallelism for Dedup, Ferret, and Bzip2 Journal Article doi International Journal of Parallel Programming, 47 (1), pp. 253-271, 2018, ISSN: 1573-7640. |
Stream Parallelism with Ordered Data Constraints on Multi-Core Systems Journal Article doi Journal of Supercomputing, 75 (8), pp. 4042-4061, 2018, ISSN: 0920-8542. |
Efficient NAS Benchmark Kernels with C++ Parallel Programming Inproceedings doi 26th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 733-740, IEEE, Cambridge, UK, 2018. |
Service Level Objectives via C++11 Attributes Inproceedings doi Euro-Par 2018: Parallel Processing Workshops, pp. 745-756, Springer, Turin, Italy, 2018. |
Autonomic and Latency-Aware Degree of Parallelism Management in SPar Inproceedings doi Euro-Par 2018: Parallel Processing Workshops, pp. 28-39, Springer, Turin, Italy, 2018. |
Parallel and Distributed Processing Support for a Geospatial Data Visualization DSL Inproceedings doi Symposium on High Performance Computing Systems (WSCAD), pp. 221-228, IEEE, São Paulo, Brazil, 2018. |
Grau de Paralelismo Adaptativo na DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. |
Uma Suíte de Benchmarks Parametrizáveis para o Domínio de Processamento de Stream em Sistemas Multi-Core Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. |
Proposta de Provisionamento Elástico de Recursos com MPI-2 para a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. |
Suporte para Computação Autonômica com Elasticidade Vertical para a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. |
Suporte ao Paralelismo Multi-Core com FastFlow e TBB em uma Aplicação de Alinhamento de Sequências de DNA Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. |
Paralelização de uma Aplicação de Detecção e Eliminação de Ruídos em Streaming de Vídeo com a DSL SPar Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 2, Sociedade Brasileira de Computação (SBC), Porto Alegre, BR, 2018. |
Suporte ao Processamento Paralelo e Distribuído em uma DSL para Visualização de Dados Geoespaciais Inproceedings XIX Simpósio em Sistemas Computacionais de Alto Desempenho, pp. 1-12, SBC, São Paulo, Brazil, 2018. |
Performance of Data Mining, Media, and Financial Applications under Private Cloud Conditions Inproceedings doi 23rd IEEE Symposium on Computers and Communications (ISCC), IEEE, Natal, Brazil, 2018. |
Evaluating, Estimating, and Improving Network Performance in Container-based Clouds Inproceedings doi 23rd IEEE Symposium on Computers and Communications (ISCC), IEEE, Natal, Brazil, 2018. |
The NAS Benchmark Kernels for Single and Multi-Tenant Cloud Instances with LXC/KVM Inproceedings doi International Conference on High Performance Computing & Simulation (HPCS), IEEE, Orléans, France, 2018. |
Parametrização do paralelismo de stream em Benchmarks da suíte PARSEC Masters Thesis School of Technology - PPGCC - PUCRS, 2018. |
Adaptive Degree of Parallelism for the SPar Runtime Masters Thesis School of Technology - PPGCC - PUCRS, 2018. |
2017 |
SPar: A DSL for High-Level and Productive Stream Parallelism Journal Article doi Parallel Processing Letters, 27 (01), pp. 1740005, 2017. |
High-Level and Efficient Stream Parallelism on Multi-core Systems with SPar for Data Compression Applications Inproceedings XVIII Simpósio em Sistemas Computacionais de Alto Desempenho, pp. 16-27, SBC, Campinas, SP, Brasil, 2017. |
Higher-Level Parallelism Abstractions for Video Applications with SPar Inproceedings doi Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 698-707, IOS Press, Bologna, Italy, 2017. |
A High-Level DSL for Geospatial Visualizations with Multi-core Parallelism Support Inproceedings doi 41st IEEE Computer Society Signature Conference on Computers, Software and Applications, pp. 298-304, IEEE, Torino, Italy, 2017. |
Towards Distributed Parallel Programming Support for the SPar DSL Inproceedings doi Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, pp. 563-572, IOS Press, Bologna, Italy, 2017. |
Exploração do Paralelismo em Algoritmos de Mineração de Dados com Pthreads, OpenMP, FastFlow, TBB e Phoenix++ Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. |
Proposta de uma Plataforma para Experimentos de Software em Programação Paralela Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. |
Proposta de Implementação de Grau de Paralelismo Adaptativo em uma DSL para Paralelismo de Stream Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. |
Explorando a Flexibilidade e o Desempenho da Biblioteca FastFlow com o Padrão Paralelo Farm Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. |
Avaliando a Produtividade e o Desempenho da DSL SPar em uma Aplicação de Detecção de Pistas Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 4, Sociedade Brasileira de Computação (SBC), Ijuí, RS, BR, 2017. |
An Intra-Cloud Networking Performance Evaluation on CloudStack Environment Inproceedings doi 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 5, IEEE, St. Petersburg, Russia, 2017. |
Improving the Network Performance of a Container-Based Cloud Environment for Hadoop Systems Inproceedings doi International Conference on High Performance Computing & Simulation (HPCS), pp. 619-626, IEEE, Genoa, Italy, 2017. |
2016 |
A comparative study of energy-aware scheduling algorithms for computational grids Journal Article doi Journal of Systems and Software, 117 , pp. 153-165, 2016. |
Private IaaS Clouds: A Comparative Analysis of OpenNebula, CloudStack and OpenStack Inproceedings doi 24th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 672-679, IEEE, Heraklion, Crete, Greece, 2016. |
Proposta de Suporte a Elasticidade Automática em Nuvem para uma Linguagem Específica de Domínio Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 197-198, Sociedade Brasileira de Computação (SBC), São Leopoldo, RS, BR, 2016. |
Em Direção à um Benchmark de Workload Sintético para Paralelismo de Stream em Arquiteturas Multicore Inproceedings Escola Regional de Alto Desempenho (ERAD/RS), pp. 171-172, Sociedade Brasileira de Computação (SBC), São Leopoldo, RS, BR, 2016. |
GMaVis: A Domain-Specific Language for Large-Scale Geospatial Data Visualization Supporting Multi-core Parallelism Masters Thesis Faculdade de Informática - PPGCC - PUCRS, 2016. |
Domain-Specific Language & Support Tool for High-Level Stream Parallelism PhD Thesis Faculdade de Informática - PPGCC - PUCRS, 2016. |
Domain-Specific Language & Support Tool for High-Level Stream Parallelism PhD Thesis Computer Science Department - University of Pisa, 2016. |
2015 |
Coding Productivity in MapReduce Applications for Distributed and Shared Memory Architectures Journal Article doi International Journal of Software Engineering and Knowledge Engineering, 25 (10), pp. 1739-1741, 2015. |
A job profile oriented scheduling architecture for improving the throughput of industrial printing environments Journal Article doi Computers & Industrial Engineering, 88 , pp. 191-205, 2015. |
A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures Inproceedings doi The 27th International Conference on Software Engineering & Knowledge Engineering, pp. 6, Knowledge Systems Institute Graduate School, Pittsburgh, USA, 2015. |
An Embedded C++ Domain-Specific Language for Stream Parallelism Inproceedings doi Parallel Computing: On the Road to Exascale, Proceedings of the International Conference on Parallel Computing, pp. 317-326, IOS Press, Edinburgh, Scotland, UK, 2015. |