ffnamespace:performance (last modified 2014/08/31 by aldinuc)
===== Applications and Performances =====
==== [2014] NGS tools (Bowtie2, BWA) ====
Bowtie-2.0.6, Bowtie-2.2.1, and BWA were compared for performance against their ports onto the FastFlow library. Tested on:
  * Intel 4-socket 8-core Nehalem (64 HT) @2.0GHz, 72MB L3, 64 GB mem, Linux x86_64
|{{:ffnamespace:bowtie2-speedup.png?300|}}|{{:ffnamespace:bowtie-bwa-maxspeedup.png?300|}}|
==== [2012] Yadt-ff (parallel C4.5) ====
The well-known C4.5 statistical classifier is a doubly hard algorithm to parallelize. First of all, because data miners are reluctant to spend time on yet another parallel version :-) Many past experiences have shown that small improvements to the sequential algorithm can yield more performance than a substantial investment in parallelization. This certainly does not mean that parallelization is useless, but rather, at least in our understanding, that only a low-effort, conservative parallelization is likely to be welcomed by the data-mining community. Unfortunately, that kind of parallelization, i.e. loop and recursion parallelization, is technically complex: the independent tasks it generates may exhibit several unpleasant properties, including a huge variability in task size, which in turn may induce both severe synchronization overheads and non-trivial load-balancing problems that limit the speedup.