Differences

This shows you the differences between two versions of the page.

ffnamespace:performance [2013/12/29 18:43]
aldinuc
ffnamespace:performance [2014/08/31 02:46]
aldinuc
Line 1: Line 1:
 +===== Applications and Performances ===== 
 +==== NGS tools (Bowtie2, BWA) ====
 +Bowtie 2.0.6, Bowtie 2.2.1, and BWA are compared in performance against their ports to the FastFlow library. Tested on:
 +     * Intel 4-socket 8-core Nehalem (64 HT) @2.0GHz, 72MB L3, 64 GB mem, Linux x86_64
 +     * Intel 2-socket 8-core Sandy Bridge (32 HT) @2.2GHz, 40MB L3, 64 GB mem, Linux x86_64 ​
 +
 +More details in:
 +
 +  * C. Misale, “Accelerating Bowtie2 with a lock-less concurrency approach and memory affinity,” in Proc. of Intl. Euromicro PDP 2014: Parallel Distributed and network-based Processing, Torino, Italy, 2014. doi:10.1109/PDP.2014.50 [[http://calvados.di.unipi.it/storage/paper_files/2014_pdp_bowtieff.pdf|PDF]]
 +  * C. Misale, G. Ferrero, M. Torquati, and M. Aldinucci, “Sequence alignment tools: one parallel pattern to rule them all?,” BioMed Research International, 2014. doi:10.1155/2014/539410 [[http://downloads.hindawi.com/journals/bmri/2014/539410.pdf|PDF]]
 +
 +|{{:ffnamespace:bowtie2-speedup.png?300|}}|{{:ffnamespace:bowtie-bwa-maxspeedup.png?300|}}|
 +
 +
 +
 <note important>
-Work in progress
+The rest is outdated
 </note>
  
-===== Applications and Performances ===== +
 We have been developing several applications using FastFlow and the FastFlow accelerator. Their complexity ranges from simple micro-benchmarks to quite complex scientific and business applications. Clearly, our main business is developing FastFlow itself rather than any big or complex application. However, we believe that developing and running applications is the only effective way to demonstrate that FastFlow is a viable and convenient approach to high-level parallel programming for multi-core. For this reason, each application is carefully chosen to demonstrate a particular aspect or feature of FastFlow, and we try to make them available to third parties in a timely manner, with all the information needed to understand the code and reproduce our experiments. We also try to publish a Technical Report for each significant advance.
 That said, we are also very interested in supporting independent programmers, scientists, and industries that would like to try FastFlow in their own application domains. If you are interested, just write to us.
  
Line 15: Line 30:
  
    
-{{:ffnamespace:50.png?320|}}
-{{:ffnamespace:5.png?320|}}
-{{:ffnamespace:05.png?320|}}
-{{:ffnamespace:microbench.png?320|}}
+|{{:ffnamespace:50.png?320|}}|{{:ffnamespace:5.png?320|}}|
+|{{:ffnamespace:05.png?320|}}|{{:ffnamespace:microbench.png?320|}}|
  
 === Performances: FastFlow vs Intel TBB vs OpenMP vs Cilk ===
 Tests on Ottavinareale (8 cores, Linux)
  
-{{:ffnamespace:sw_ff_tbb_omp_cilk_50.png?320|}}
-{{:ffnamespace:sw_ff_tbb_omp_cilk_5.png?320|}}
-{{:ffnamespace:sw_ff_tbb_omp_cilk_05.png?320|}}
+|{{:ffnamespace:sw_ff_tbb_omp_cilk_50.png?320|}}|{{:ffnamespace:sw_ff_tbb_omp_cilk_5.png?320|}}|
+|{{:ffnamespace:sw_ff_tbb_omp_cilk_05.png?320|}}| |
  
 ==== N-Queens ====
Line 302: Line 314:
  
 {{:ffnamespace:iphone-2012.06.27-14.18.40.png?240|}}
- 
- 
-====== Platforms ====== 
- 
-==== Andromeda ==== 
-Andromeda is an Intel workstation with 2 quad-core Xeon E5520 Nehalem processors (16 HyperThreads) @2.26GHz with 8MB L3 cache and 24 GBytes of main memory. The platform implements the QuickPath processor interconnect with an extended version of the MESI cache coherence protocol: a new read-only forward state has been introduced to enable cache-to-cache forwarding of clean lines. This eliminates invalidations in the case of read-only sharing, which significantly simplifies the performance tuning of FastFlow code. Courtesy of University of Pisa.
- 
-==== Ottavinareale ====
- 
-[[http://cotognata.di.unipi.it/~marcodanelutto/wiki/doku.php?id=regoleottavina|Ottavinareale]] is a shared-memory Intel platform with two quad-core Xeon E5420 Harpertown processors @2.5GHz, 6MB L2 cache, and 8 GBytes of main memory. It runs Linux CentOS release 5.2 (kernel 2.6.18-92.1.22.el5) and gcc version 4.1.2 with the POSIX thread model. Courtesy of University of Pisa.
- 
-==== Biocluster ==== 
-Courtesy of University of Torino. 
- 
-==== Magnana ====
-Magnana is a Macbook 13''​ unibody with Core 2 Duo P8600 2.4GHz 3MB L2 cache and 4GBytes of main memory. It currently runs Mac OS X 10.6.2 Snow Leopard, Macports QT 4.6.1 (well, it is just one of our laptops). Courtesy of University of Torino. 
- 
-==== Calvados ​ ==== 
- 
-Calvados is a Power Macintosh G4 (Mirrored Drive Doors) with 2x 1.25 GHz PowerPC G4 (7455 v3.2), 256 KBytes L2, 1 MByte L3, and 1.5 GBytes of memory. It currently runs Mac OS X 10.5.8 Leopard. Calvados is particularly important as a testing platform because its processors have a weaker memory consistency model than Intel Core platforms. Courtesy of University of Pisa.
ffnamespace/performance.txt · Last modified: 2014/08/31 02:52 by aldinuc