~~NOTOC~~
{{description>FastFlow is a C++ pattern-based parallel programming framework based upon lock-free shared-memory multithreading}}

===== FastFlow (FF) =====

FastFlow (快速流) is a C++ parallel programming framework advocating high-level, pattern-based parallel programming. It chiefly supports streaming and data parallelism, targeting heterogeneous platforms composed of clusters of shared-memory platforms, possibly equipped with computing accelerators such as NVidia GPGPUs, Xeon Phi, and Tilera TILE64.

The main design philosophy of FastFlow is to provide application designers with key features for parallel programming (e.g. time-to-market, portability, efficiency and performance portability) via suitable parallel programming abstractions and a carefully designed run-time support.

==== Application scenarios ====

FastFlow is a general-purpose C++ programming framework for heterogeneous parallel platforms. Like other high-level programming frameworks, such as Intel TBB and OpenMP, it simplifies the design and engineering of portable parallel applications. However, it has a clear edge in terms of expressiveness and performance with respect to other parallel programming frameworks in specific application scenarios, including, inter alia:

  * fine-grain parallelism on cache-coherent shared-memory platforms;
  * streaming applications;
  * coupled usage of multi-core and accelerators.

In other cases FastFlow is typically comparable to (and in some cases slightly faster than) state-of-the-art parallel programming frameworks such as Intel TBB, OpenMP, Cilk, etc. More details may be found on the [[ffnamespace:performance|performance]] page.
A number of micro-benchmarks and real-world applications have been developed with FastFlow (or ported from other parallel libraries) in order to assess its usability and performance, among others:

  * **Bowtie2**: fast and sensitive read alignment //(multicore; porting of the original version developed with pthreads and spinlocks)//
  * **Two-phase video/image restoration** for impulsive/Gaussian noise //(multicore, GPGPUs, heterogeneous; original algorithm)//
  * **Block-based Cholesky & LU decomposition** for dense matrices //(multicore; original algorithms)//
  * **Yadt C4.5 classifier** //(multicore; original algorithm)//
  * **CWC Gillespie simulator** for systems biology //(multicore, distributed, GPGPUs; original algorithm)//
  * SWPS3: vectorized **Smith-Waterman** local alignment software //(multicore; porting of the original version developed with pthreads)//
  * **pbzip2**: Parallel BZIP2 //(multicore; porting of the original version developed with POSIX processes)//
  * Fast networks **Deep Packet Inspection** //(multicore; original algorithm)//
  * Several standard algorithms, such as nqueens, fibonacci, QT-Mandelbrot, matrix block multiplication, ... //(multicore, GPGPUs, distributed, heterogeneous)//
  * over 100 micro-benchmarks testing individual features and patterns //(multicore, GPGPUs, distributed, heterogeneous)//

The source code of (almost) all applications can be found in the [[https://sourceforge.net/p/mc-fastflow/code/|FastFlow SVN repository]] under either the LGPLv3 or the GNU GPL license. The design of applications and their performance are described in [[http://alpha.di.unito.it/fastflow-papers|research papers]].

==== Design ====

FastFlow comes as a C++ template library designed as a [[ffnamespace:architecture|stack of layers]] that progressively abstracts out the programming of parallel applications. The goal of the stack is threefold: portability, extensibility, and performance.
To this end, all three layers are realised as thin strata of C++ templates that are 1) seamlessly portable; 2) easily extended via subclassing; and 3) statically compiled and cross-optimised with the application. The terse design ensures easy portability to almost all OSes and CPUs with a C++ compiler. The main development platform is Linux/x86_64/gcc, but it has also been tested on various combinations of x86, x86_64, PPC, ARM, Tilera, and NVidia with gcc, icc, and Visual Studio on Linux, Mac OS, and Windows XP/7. The FastFlow core has been ported to ARM/iOS.

The FastFlow [[ffnamespace:architecture|run-time support]] uses several techniques to efficiently support fine-grain parallelism (and very high frequency streaming). Among these are:

  * non-blocking multi-threading with lock-less synchronisations;
  * zero-copy network messaging (via 0MQ/TCP and RDMA/Infiniband);
  * asynchronous data feeding for accelerator offloading.

FastFlow has been adopted by a number of research projects and third-party development initiatives, and has thus been tested in a variety of application scenarios: from systems biology to high-frequency trading.
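As a rough illustration of the lock-less synchronisation listed among the run-time techniques above, the following is a minimal sketch of a bounded single-producer/single-consumer queue connecting two stages of a stream. It uses only the C++11 standard library and is not taken from the FastFlow sources; the names ''SpscQueue'', ''try_push'', ''try_pop'' and ''stream_sum'' are hypothetical and were chosen for this example only.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical illustration, NOT FastFlow code: a bounded single-producer/
// single-consumer (SPSC) ring buffer synchronised only by two atomic indices.
template <typename T>
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity)
        : buf_(capacity), mask_(capacity - 1) {
        // A power-of-two capacity lets indices wrap with a bit mask.
        assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    }

    // Producer thread only. Returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t h = head_.load(std::memory_order_acquire);
        if (t - h == buf_.size()) return false;          // full
        buf_[t & mask_] = value;
        tail_.store(t + 1, std::memory_order_release);   // publish the slot
        return true;
    }

    // Consumer thread only. Returns false when the queue is empty.
    bool try_pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        const std::size_t t = tail_.load(std::memory_order_acquire);
        if (h == t) return false;                        // empty
        out = buf_[h & mask_];
        head_.store(h + 1, std::memory_order_release);   // free the slot
        return true;
    }

private:
    std::vector<T> buf_;
    const std::size_t mask_;
    // Separate cache lines reduce false sharing between the two threads.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
};

// Two-stage stream: the calling thread emits 1..n, a second thread sums them.
std::uint64_t stream_sum(std::uint64_t n) {
    SpscQueue<std::uint64_t> q(1024);
    std::uint64_t sum = 0;
    std::thread consumer([&] {
        std::uint64_t v = 0;
        for (std::uint64_t received = 0; received < n;) {
            if (q.try_pop(v)) { sum += v; ++received; }
            else std::this_thread::yield();              // spin politely
        }
    });
    for (std::uint64_t i = 1; i <= n; ++i)
        while (!q.try_push(i)) std::this_thread::yield();
    consumer.join();  // join() makes the consumer's writes to sum visible
    return sum;
}
```

Calling ''stream_sum(n)'' streams the integers 1 to n from one thread to another and returns their sum. Because each index is written by exactly one thread, neither a mutex nor an atomic read-modify-write operation is needed in this sketch; the acquire/release pairs alone order the accesses to the buffer slots.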
==== Big Pictures ====

^FastFlow/C++11 in REPARA^FastFlow big picture (2014)^App: faster Bowtie2 (2013)^
|[[http://calvados.di.unipi.it/storage/paper_files/2015_Artemis_REPARA_FF_poster.pdf|{{:ffnamespace:2015_artemis_repara_ff.png?220|}}]]|[[http://calvados.di.unipi.it/storage/paper_files/2014_ff_poster_hipeac.pdf|{{:ffnamespace:2014_ff_poster_hipeac.png?220}}]]|[[http://calvados.di.unipi.it/storage/paper_files/2013_ff_botie2_mem_affinity_acaces.pdf|{{:ffnamespace:2013_ff_botie2_mem_affinity_acaces.png?220}}]]|
|Artemis Co-Summit 2015| |HiPEAC-ACACES 2013|

^FastFlow & its applications (2014)^Lock-less programming with FastFlow (2012)^
|[[http://calvados.di.unipi.it/storage/paper_files/2014_ff_poster_openday_unito.pdf|{{:ffnamespace:2014_ff_poster_openday_unito.png?330|}}]]|[[http://calvados.di.unipi.it/storage/paper_files/2012_ACACES_poster.pdf|{{:ffnamespace:2012_acaces_poster.png?330}}]]|
|UniTO Industrial day 2014|HiPEAC-ACACES 2012|

----