This shows you the differences between two versions of the page.
ffnamespace:performance [2014/08/31 02:46] aldinuc |
ffnamespace:performance [2014/08/31 02:51] aldinuc |
||
---|---|---|---|
Line 1: | Line 1: | ||
===== Applications and Performances ===== | ===== Applications and Performances ===== | ||
- | ==== NGS tools (Bowtie2, BWA) ==== | + | ==== NGS tools (Bowtie2, BWA) - 2014 ==== |
Bowtie2.0.6, Bowtie-2.2.1, and BWA compared in performance against their ports to the FastFlow library. Tested on | Bowtie2.0.6, Bowtie-2.2.1, and BWA compared in performance against their ports to the FastFlow library. Tested on | ||
* Intel 4-socket 8-core Nehalem (64 HT) @2.0GHz, 72MB L3, 64 GB mem, Linux x86_64 | * Intel 4-socket 8-core Nehalem (64 HT) @2.0GHz, 72MB L3, 64 GB mem, Linux x86_64 | ||
Line 12: | Line 12: | ||
|{{:ffnamespace:bowtie2-speedup.png?300|}}|{{:ffnamespace:bowtie-bwa-maxspeedup.png?300|}}| | |{{:ffnamespace:bowtie2-speedup.png?300|}}|{{:ffnamespace:bowtie-bwa-maxspeedup.png?300|}}| | ||
+ | ==== Yadt-ff (parallel C4.5) - 2012 ==== | ||
+ | The well-known C4.5 statistical classifier is a doubly hard algorithm to parallelize. First, data miners are reluctant to spend time on yet another brand-new parallel version :-) Past experience has repeatedly shown that small improvements to the sequential algorithm can yield far more performance than a substantial investment in parallelization. This does not mean that parallelization is useless but, at least in our understanding, that a low-effort, conservative parallelization is the only kind the data-mining community readily welcomes. Second, that kind of parallelization, i.e. loop and recursion parallelization, is technically complex: the independent tasks it generates may exhibit several undesirable properties, including a huge variability in task size, which in turn may induce both severe synchronization overheads and non-trivial load-balancing problems that limit the speedup. | ||
+ | The YaDT-FastFlow application addresses both problems. [[http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F9460%2F30023%2F01374196.pdf|YaDT]] is a third-party, main-memory implementation of a C4.5-like decision tree algorithm by Salvatore Ruggieri. YaDT-FastFlow is a //low-effort// parallelization of the sequential algorithm that required less than 10 hours of development (including tuning and testing) while producing a significant speedup over the sequential version. | ||
+ | |||
+ | This application aims to demonstrate the ability of FastFlow and the FastFlow accelerator to support rapid and efficient development via semi-automatic parallelization of loops and Divide&Conquer in third-party and legacy code. | ||
+ | |||
+ | Stay tuned for a new Technical Report on this work. The code will be made publicly available with the Technical Report. The C4.5-FastFlow application has been developed in cooperation with Salvatore Ruggieri, University of Pisa, Italy. | ||
+ | |||
+ | === Performances === | ||
+ | Tests on andromeda (2 x quad-core HT - 16 contexts, Linux) and ottavinareale (2 x quad-core, Linux). | ||
+ | |||
+ | |{{:ffnamespace:model_cr2_speedup.png?320|Speedup on andromeda}}|{{:ffnamespace:ottavina_cr2_speedup.png?320|Speedup on ottavinareale}}| | ||
+ | | On Andromeda (HT, 8 cores, 16 contexts) | On Ottavinareale (8 cores) | | ||
<note important> | <note important> | ||
Line 124: | Line 137: | ||
- | ==== Yadt-ff (parallel C4.5) ==== | ||
- | The well-known C4.5 statistical classifier is a doubly hard algorithm to parallelize. First, data miners are reluctant to spend time on yet another brand-new parallel version :-) Past experience has repeatedly shown that small improvements to the sequential algorithm can yield far more performance than a substantial investment in parallelization. This does not mean that parallelization is useless but, at least in our understanding, that a low-effort, conservative parallelization is the only kind the data-mining community readily welcomes. Second, that kind of parallelization, i.e. loop and recursion parallelization, is technically complex: the independent tasks it generates may exhibit several undesirable properties, including a huge variability in task size, which in turn may induce both severe synchronization overheads and non-trivial load-balancing problems that limit the speedup. | ||
- | The YaDT-FastFlow application addresses both problems. [[http://ieeexplore.ieee.org/Xplore/login.jsp?url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F9460%2F30023%2F01374196.pdf|YaDT]] is a third-party, main-memory implementation of a C4.5-like decision tree algorithm by Salvatore Ruggieri. YaDT-FastFlow is a //low-effort// parallelization of the sequential algorithm that required less than 10 hours of development (including tuning and testing) while producing a significant speedup over the sequential version. | ||
- | |||
- | This application aims to demonstrate the ability of FastFlow and the FastFlow accelerator to support rapid and efficient development via semi-automatic parallelization of loops and Divide&Conquer in third-party and legacy code. | ||
- | |||
- | Stay tuned for a new Technical Report on this work. The code will be made publicly available with the Technical Report. The C4.5-FastFlow application has been developed in cooperation with Salvatore Ruggieri, University of Pisa, Italy. | ||
- | |||
- | === Performances === | ||
- | Tests on andromeda (2 x quad-core HT - 16 contexts, Linux) and ottavinareale (2 x quad-core, Linux). | ||
- | |||
- | |{{:ffnamespace:model_cr2_speedup.png?320|Speedup on andromeda}}|{{:ffnamespace:ottavina_cr2_speedup.png?320|Speedup on ottavinareale}}| | ||
- | | On Andromeda (HT, 8 cores, 16 contexts) | On Ottavinareale (8 cores) | | ||
==== Smith-Waterman ==== | ==== Smith-Waterman ==== | ||
In bioinformatics, sequence database searches are used to find the | In bioinformatics, sequence database searches are used to find the |