FastFlow (v2.0)

FastFlow (斋戒流) is a parallel programming framework for multi-core platforms based on non-blocking, lock-free/fence-free synchronization mechanisms. The framework is composed of a stack of layers that progressively abstract away the programming of shared-memory parallel applications. The goal of the stack is twofold: to ease the development of applications and to make them very fast and scalable. FastFlow is particularly targeted at the development of streaming applications.
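To give a flavour of the kind of synchronization involved, the sketch below shows a bounded single-producer/single-consumer ring buffer in standard C++11. It is an illustration only and not FastFlow's own queue: the names `SpscRing` and `stream_sum` are hypothetical, and the code relies on `std::atomic` acquire/release operations rather than on FastFlow's mechanisms.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical illustration, NOT FastFlow's queue implementation.
// Bounded ring buffer that is safe for exactly one producer thread
// and exactly one consumer thread, without any lock.
template <typename T>
class SpscRing {
public:
    // One slot is kept empty to tell "full" apart from "empty".
    explicit SpscRing(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {
        assert(capacity > 0);
    }

    // Producer side only. Returns false when the buffer is full.
    bool push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[t] = v;
        // Publish the element: the consumer sees it after this store.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side only. Returns false when the buffer is empty.
    bool pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[h];
        // Free the slot: the producer may reuse it after this store.
        head_.store((h + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;
    std::atomic<std::size_t> head_;  // written only by the consumer
    std::atomic<std::size_t> tail_;  // written only by the producer
};

// Streams the values 1..n from a producer thread to the calling
// thread through the ring buffer and returns their sum.
std::uint64_t stream_sum(std::uint64_t n, std::size_t capacity) {
    SpscRing<std::uint64_t> q(capacity);
    std::thread producer([&] {
        for (std::uint64_t i = 1; i <= n; ++i)
            while (!q.push(i)) std::this_thread::yield();
    });
    std::uint64_t sum = 0, v = 0;
    for (std::uint64_t received = 0; received < n;) {
        if (q.pop(v)) { sum += v; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

Because each index is written by exactly one thread, neither `push` nor `pop` needs a lock or a compare-and-swap; a full or empty buffer is simply reported to the caller, who decides whether to retry. On x86, acquire loads and release stores of this kind typically compile to ordinary move instructions.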

Vanilla or other flavours. In FastFlow, different layers are targeted to support different kinds of programmer. FastFlow can be directly used to set up an arbitrary network of parallel activities (low-level programming layer); at this level, similar to what happens when programming with POSIX threads, any orchestration of parallel activities can be expressed. However, as for POSIX threads, writing a correct and efficient program is a non-trivial activity. As shown in [ADM09], FastFlow synchronizations are usually faster than POSIX ones. At the next layer up (high-level programming layer), FastFlow provides programmers with a number of pre-defined parametric programming patterns (i.e. skeletons); at this level, similar to what happens when programming with Intel TBB, some orchestration of parallel activities can be expressed: programs are composed by configuring and combining patterns (skeletons), which have an optimised implementation. Writing a correct and efficient program is relatively easy (also see tutorial page). As shown in [AMT09], FastFlow-based applications have a speed edge on TBB-based ones.
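The difference between the two layers can be illustrated with a generic sketch. At the low level the programmer wires threads and their synchronization by hand; at the high level the programmer supplies only the sequential code and the parameters of a pattern. The function below is a hypothetical `farm` pattern written in plain C++11 with `std::thread`. It is not the FastFlow API: the name, the signature, and the scheduling policy are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical "farm" pattern, NOT the FastFlow API.
// Applies `worker` to every element of `inputs` using `nworkers`
// threads and returns the results in input order.
// The result type must be default-constructible and should not be
// bool (std::vector<bool> packs bits, so concurrent writes would race).
template <typename In, typename F>
auto farm(const std::vector<In>& inputs, F worker, unsigned nworkers)
    -> std::vector<decltype(worker(inputs[0]))> {
    assert(nworkers > 0);
    std::vector<decltype(worker(inputs[0]))> outputs(inputs.size());
    std::atomic<std::size_t> next(0);  // index of the next unclaimed task

    auto body = [&] {
        for (;;) {
            // Each worker claims one task index at a time.
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= inputs.size()) break;
            outputs[i] = worker(inputs[i]);  // distinct slot per task
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < nworkers; ++w) pool.emplace_back(body);
    for (auto& t : pool) t.join();  // join makes all results visible
    return outputs;
}
```

For example, calling `farm(in, [](int x) { return x * x; }, 3)` on a `std::vector<int>` returns the squares of its elements in input order. The caller never touches threads or atomics, which is the property the high-level layer aims for.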

The FastFlow high-level skeletal layer can be further abstracted (using skeletons as object factories) to define Problem Solving Environments (PSEs), which are programming frameworks designed to ease the development of efficient parallel applications in a specific domain. As an example, we are currently working on the following PSEs:

  • FastFlow accelerator and offloading (completed);
  • Parallel Monte Carlo and Gillespie simulations (ongoing);
  • Parallel macro data-flow interpretation with automatic parallelization features supporting skeletal programming (ongoing);
  • An extension of Intel TBB supporting general streaming networks (ongoing);
  • A (blazing fast) parallel memory allocator that is faster than hoard and TBB allocators (completed).

The three described layers are intended for three kinds of user, respectively: FastFlow designers, skilled programmers (with some knowledge of parallel programming), and casual programmers (e.g. application domain experts). See architecture page for further information. Completed PSEs can be found in the project SVN (not the release tarball), whereas documentation is still incomplete and will appear soon on this site.

FastFlow is fast. We experimentally demonstrate that FastFlow is as fast as, or in some cases faster than, state-of-the-art multi-core programming frameworks such as Cilk, TBB, and OpenMP.

License. FastFlow is implemented as a C++ template library, which is open source and released under LGPLv3.

Downloads and contacts

The latest stable version of FastFlow can be downloaded from SourceForge; beta releases are available via the SourceForge SVN server. FastFlow 2.0 is now available.


News and releases are announced on the SourceForge developer forum; important news is recapped on this page.

Questions and comments can be posted to the FastFlow mailing list fastflow[AT]di.unito.it. Interested users can subscribe to it (very low traffic, moderated).

News

  • 13 Apr 2013 FastFlow 2.0.1 has been committed to the SourceForge SVN. A new tarball will be published on SourceForge soon.
  • 10 Jul 2012 FastFlow 2.0 is now on the SourceForge SVN. It includes the distributed version and several new applications (including the denoiser). The release is almost stable for Linux. Several features are still beta for non-Linux platforms.
  • 26 June 2012 FastFlow is now working on iOS 5.x (iPhone/iPad). We are looking for people willing to test it with a real-world application (e.g. a game). Here is a screenshot.
  • 18 May 2012 FastFlow is going to support clusters of SMPs. Distributed version is now under test. Still working on GPUs.
  • 16 May 2012 The uSPSC component of FastFlow, i.e. the unbounded wait-free SPSC queue, will be presented at Euro-Par 2012 in Rhodes. It is fast.
  • 25 Mar 2012 Prof. Marco Danelutto included FastFlow in the teaching material of the Distributed systems: paradigms and models course held within the Master in Computer Science and Networking at the University of Pisa, Italy. He also wrote a programming primer. Please send us any comments or questions to help improve it.
  • 05 Jan 2012 Working on GPUs. We are working on the integration of FastFlow self-offloading with GPUs. We have some encouraging preliminary results. Stay tuned.
  • 05 Jan 2011 FastFlow is now part of the IMPACT project (Innovative Methods for Particle Colliders at the Terascale, funded by Compagnia di San Paolo, with CERN, through a competition with an 11% acceptance rate). We are working on a parallel, high-performance simulation tool for QCD background modeling (e.g. for Higgs boson detection).
  • 30 May 2011 FastFlow has won the HPC Advisory Council best project Award (Spring 2011, assigned twice a year). To be announced at International Supercomputing 2011.
  • 24 May 2011 FastFlow is now part of the ParaPhrase EU-STREP/FWP7 project: Parallel Patterns for Adaptive Heterogeneous Multicore Systems (starting Oct. 2011).
  • 23 Apr. 2011 FastFlow 1.1 is now available, with many new features (see the changelog file within the tarball). It now works on almost any recent 32/64-bit Linux, 32/64-bit MacOS 10.4, 10.5, and 10.6, and 32/64-bit Windows XP and 7 (with Visual Studio). CMake now generates Makefiles as well as Xcode and Visual Studio projects.
  • 25 Mar. 2011 The first native Windows port is now ready (FastFlow 1.0.9, available on the SourceForge SVN), currently tested on Windows 7 x64 with Visual Studio 10. We are looking for beta testers; write to us if you are interested in trying it.
  • 17 Mar. 2011 TR-11-16: Porting Decision Tree Building and Pruning Algorithms to Multicore using FastFlow. To the best of our knowledge, this is the first implementation of parallel pruning of decision trees in the data mining literature.
  • 7 Jan. 2011 We are preparing to release many new features in the new version of FastFlow; some of them are already in the SourceForge SVN. We also have interesting new applications: take a look at the edge-preserving image denoiser and its impressive performance.
  • 15 Oct. 2010 We started a FAQ page. Any further questions or comments are welcome.
  • 12 Oct. 2010 A new paper and a new presentation are available on the website.
  • 1 Oct. 2010 Due to the many requests, we have begun to work on a FastFlow programming tutorial. A complete tutorial is not yet available; however, if you would like to try FastFlow, we suggest that you: 1) understand the architecture by looking at the website, papers, and talk slides available on this site; 2) look at the tutorial page on this website; 3) start playing with the examples in the tests directory, which are designed to be progressive in complexity; 4) write to fastflow[AT]di.unito.it in case of any problem.
  • 1 Oct. 2010 It seems we will have a Windows version soon. Stay tuned.
  • 7 Sep. 2010 A port to the Windows/Visual Studio platform is ongoing (designed, development in progress). We are looking for developers and beta testers.
  • 1 Sep. 2010 FastFlow 1.0 is now ready. It works on any Linux (i386, x86_64) and any MacOS > 10.2 (PPC, i386, x86_64).

Changelog and project news history

Who is Using FastFlow

  • Fix8, an extremely fast C++ open source FIX framework.
  • CWCsimulator (Calculus of Wrapped Compartments), a rewriting-based calculus for the representation and simulation of biological systems.
  • Peafowl, a flexible and extensible Deep Packet Inspection (DPI) framework.
  • The ParaPhrase project, which aims to develop a new structured design and implementation process for heterogeneous parallel architectures.
  • YaDT, a very efficient C++ implementation of the entropy-based tree construction algorithm.

Working with us

Three months internship available with HiPEAC EU NoE support for HiPEAC affiliated members (5000 Euro). Deadline May 1st, 2013. Write to Marco Aldinucci (aldinuc@di.unito.it)

People

FastFlow has been initially designed by:

Several other people contributed to the design, development, and application development. Among them:

Would you like to join us? Just write to us and tell us about your ideas and skills (aldinuc AT di.unito.it, torquati AT di.unipi.it). Master's theses on FastFlow are available at the Universities of Pisa and Torino, and wherever you are :-)

Manuals and Tutorials

[Tor12] Massimo Torquati. FastFlow: targeting distributed systems, Talk at Paraphrase meeting, Pisa, July 2012.
[ADT12] Marco Aldinucci, Marco Danelutto, and Massimo Torquati. FastFlow tutorial, Technical Report TR-12-04, Computer Science Department, University of Pisa, March 2012.
[ADK11b] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, and Massimo Torquati. FastFlow: high-level and efficient streaming on multi-core. (A FastFlow short tutorial), in: Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, chapter 13. Wiley, 2013.
API documentation. Coming soon.

Papers

Overview (general reference)

[ADK11b] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, and Massimo Torquati. FastFlow: high-level and efficient streaming on multi-core, in: Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, chapter 13. Wiley, 2013.

Methodology and theory

[ACD13] Marco Aldinucci, Sonia Campa, Marco Danelutto, Peter Kilpatrick, and Massimo Torquati. Targeting distributed systems in fastflow. In Euro-Par 2012 Workshops, Proc. of the CoreGrid Workshop on Grids, Clouds and P2P Computing, LNCS. Springer, 2013. To appear.
[ADK12b] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, Massimiliano Meneghin, and Massimo Torquati. An Efficient Unbounded Lock-Free Queue for Multi-Core Systems, in: Proc. of Euro-Par 2012, Rhodes Island, Greece. LNCS. Aug 2012. To appear.
[AAD12] Marco Aldinucci, Lorenzo Anardu, Marco Danelutto, and Massimo Torquati. Parallel patterns + Macro Data Flow for multi-core programming, in: Proc. of Intl. Euromicro PDP 2012: Parallel Distributed and network-based Processing. IEEE. February 2012.
[ADK12a] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, and Massimo Torquati. Targeting heterogeneous architectures via macro data flow, Parallel Processing Letters, 22(2), June 2012.
[ADK11a] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, Massimiliano Meneghin, and Massimo Torquati. Accelerating code on multi-cores with FastFlow, in: Proc. of Euro-Par 2011, Bordeaux, France. September 2011. (30% acceptance rate)

Applications, comparison against other frameworks

[ASD12] Marco Aldinucci, Concetto Spampinato, Maurizio Drocco, Massimo Torquati, and Simone Palazzo. A Parallel Edge Preserving Algorithm for Salt and Pepper Image Denoising, in: Proc. of Intl. Conference on Image Processing Theory, Tools and Applications (IPTA). IEEE. Oct 2012. To appear.
[DDD11] Marco Danelutto, Luca Deri, Daniele De Sensi. Network Monitoring on Multicores with Algorithmic Skeletons, in: Proc. of Intl. Parallel Computing (PARCO), September 2011.
[ACD11b] Marco Aldinucci, Mario Coppo, Ferruccio Damiani, Maurizio Drocco, Eva Sciacca, Salvatore Spinella, Massimo Torquati, and Angelo Troina. On Parallelizing On-Line Statistics for Stochastic Biological Simulations, in: Proc. of the 2nd Workshop on High Performance Bioinformatics and Biomedicine (HiBB, in conjunction with Euro-Par 2011), LNCS, Bordeaux, France, September 2011. Springer.
[ART11] Marco Aldinucci, Salvatore Ruggieri, and Massimo Torquati. Porting decision tree building and pruning algorithms to multicore using FastFlow, Università di Pisa, Dipartimento di Informatica, Italy, number TR-11-06, March 2011.
[ACD11a] Marco Aldinucci, Mario Coppo, Ferruccio Damiani, Maurizio Drocco, Massimo Torquati, and Angelo Troina. On Designing Multicore-aware Simulators for Biological Systems, in: Proc. of Intl. Euromicro PDP 2011: Parallel Distributed and network-based Processing. IEEE. February 2011.
[ART10] Marco Aldinucci, Salvatore Ruggieri, and Massimo Torquati. Porting Decision Tree Algorithms to Multicore using FastFlow, in: Proc. of European Conference in Machine Learning and Knowledge Discovery in Databases (ECML PKDD), volume 6321 of LNCS, pages 7–23, Barcelona, Spain, September 2010. Springer. (18% acceptance rate)
[ABL10] Marco Aldinucci, Andrea Bracciali, Pietro Lio'. Formal Synthetic Immunology, Ercim News 82:40–41, July 2010.
[ABL10] Marco Aldinucci, Andrea Bracciali, Pietro Lio', Anil Sorathiya, and Massimo Torquati. StochKit-FF: Efficient Systems Biology on Multicore Architectures, in: Proc. of the 1st Workshop on High Performance Bioinformatics and Biomedicine (HiBB, in conjunction with Euro-Par 2010), volume 6586 of LNCS, pages 167–175, Ischia, Italy, September 2010. Springer.
[AMT09] Marco Aldinucci, Massimiliano Meneghin, and Massimo Torquati. Efficient Smith-Waterman on multi-core with FastFlow, in: Proc. of Intl. Euromicro PDP 2010: Parallel Distributed and network-based Processing. IEEE. February 2010.
[ADM09] Marco Aldinucci, Marco Danelutto, Massimiliano Meneghin, Peter Kilpatrick, and Massimo Torquati. Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed, in: Proc. of Intl. Parallel Computing (PARCO), September 2009.

In the ParaPhrase project

[ACT12] Marco Aldinucci, Sonia Campa, Fabio Tordini, Massimo Torquati, and Peter Kilpatrick. An abstract annotation model for skeletons, in Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3–5, 2011, Revised Invited Lectures, LNCS. Springer, 2012. To appear.
[ADK12c] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, Carlo Montangero, and Laura Semini. Managing adaptivity in parallel systems, in Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3–5, 2011, Revised Invited Lectures, LNCS. Springer, 2012. To appear.
[HAB12] Kevin Hammond, Marco Aldinucci, Chris Brown, Francesco Cesarini, Marco Danelutto, Horacio González-Vélez, Peter Kilpatrick, Rainer Keller, M. Rossbory, and Gilad Shainer. The paraphrase project: Parallel patterns for adaptive heterogeneous multicore systems, in Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3–5, 2011, Revised Invited Lectures, LNCS. Springer, 2012. To appear.

Speculations, future directions, ideas

[TAT12] Fabio Tordini, Marco Aldinucci and Massimo Torquati. High-level lock-less programming for multicore, Poster at Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES HiPEAC summer school). July 2012.

Talks

Would you like to know more about FastFlow? Invite us for a talk at your university or company!

FastFlow: high-level programming patterns with non-blocking lock-free run-time support. Politecnico di Milano, Dipartimento di Elettronica ed informazione, Milano, Italy, December 2012.
A Parallel Edge Preserving Algorithm for Salt and Pepper Image Denoising. Intl. Conference on Image Processing Theory Tools and Applications (IPTA), Istanbul, Turkey, October 2012.
Turning Big data into knowledge: Techniques and Tools for Parallel Computing on Online Data Streams in Systems Biology and Epidemiology. Invited talk given at BioIt World conference, Vienna, Austria. October 2012.
FastFlow: high-level programming patterns with non-blocking lock-free run-time support. Invited talk given at UPMARC Workshop on Task-Based Parallel Programming, Uppsala, Sweden. September 2012.
An efficient Unbounded Lock-Free Queue for Multi-Core Systems. Talk given at Euro-Par 2012, Rhodes Island, Greece. August 2012.
Targeting distributed systems in FastFlow. Talk given at CoreGrid Symposium (co-located with Euro-Par 2012), August 2012.
Pattern-based Parallel Edge Preserving Algorithm for Salt-and-Pepper Image Denoising. Talk given at HPC Advisory Council Swiss Workshop, March 2012. Video of the talk from InsideHPC (including several comments on theory of synchronization in the shared memory)
Accelerating code on multi-cores with FastFlow. Talk given at Euro-Par 2011, Bordeaux, France. September 2011.
On Parallelizing On-Line Statistics for Stochastic Biological Simulations. Talk given at HiBB 2011: High Performance Bioinformatics and Biomedicine, Bordeaux, France. September 2011.
On Designing Multicore-Aware Simulators for Biological Systems. Talk given at IEEE PDP 2011: Parallel Distributed and network-based Processing, Ayia Napa, Cyprus. February 2011.
Skeletons from grids to multicores. Invited talk at Vrije University, Amsterdam, The Netherlands. October 2010.
Porting Decision Tree Algorithms to Multicore Using FastFlow. Extended version of the slides presented at ECML-PKDD 2010, Barcelona, Spain. September 2010.
FastFlow: a pattern-based programming framework for multicores. Dagstuhl seminar 10191, Schloss Dagstuhl, Germany. May 2010. Invited. Slides available on-demand.
FastFlow: why we need yet another programming framework. Guest lecture, Computer science Dept. Queen’s University Belfast, UK. March 2010. Invited within Master Course CSC4005. Slides available on-demand.
Efficient Smith-Waterman on multi-core with FastFlow. Talk given at IEEE PDP 2010: Parallel Distributed and network-based Processing, Pisa, Italy. February 2010.
Efficient streaming applications on multi-core with FastFlow: the biosequence alignment test-bed. Talk given at ParCo 2009, Lyon, France. September 2009.

Other Papers

Papers from people not directly involved in FastFlow design. The list is not meant to be complete or always up to date.

Mehdi Goli, Michael T. Garba, Horacio González-Vélez. Streaming Dynamic Coarse-Grained CPU/GPU Workloads with Heterogeneous Pipelines in FastFlow, in: Proc. of the Intl. Conference on High Performance and Communications (HPCC), 2012.
Alexander Collins, Christian Fensch, Hugh Leather. Optimization Space Exploration of the FastFlow Parallel Skeleton Framework, in: HiPEAC'12/HLPGPU: High-level programming for heterogeneous and hierarchical parallel systems, Paris, France. January 2012.
Zalán Szűgyi, Norbert Pataki. Generative Version of the FastFlow Multicore Library, Electronic Notes in Theoretical Computer Science, Vol. 279, Issue 3, 27 Dec. 2011, pages 73–84 (Proc. of the 3rd Workshop on Generative Technologies), 2011.
Alexander James Collins. Automatically Optimising Parallel Skeletons, M.Sc. Thesis in Computer Science, School of Informatics University of Edinburgh, UK, 2011.

Related Sites and Projects

Related sites and Reviews

  • 1024cores: Lock-free algorithms: discussions and software reviews.
  • Hack the market: algorithmic threading experiences.

Related Projects

  • ParaPhrase: Parallel Patterns for Adaptive Heterogeneous Multicore Systems (EC-STREP FP7, started Oct 2011).
  • TERAFLUX: Exploiting Dataflow Parallelism in Teradevice Computing (EC-FET FP7, started Jan 2010).
  • HiPEAC-2: European Network of Excellence on High Performance and Embedded Architecture and Compilation (EC-NoE FP7, started 2010).

Acknowledgments to the open source community

As of April 2011, after a year and a half of life, the FastFlow website has been visited by about 20000 visitors; the project counts about 2400 downloads of the 1.0 stable build, and about 2000 downloads of the beta version from the SourceForge SVN. Many of these people encouraged us to go on. We thank all of them.



Last modified: 2013/04/13 11:56 by torquati