Differences

This shows you the differences between two versions of the page.

ffnamespace:architecture [2014/08/14 13:36] aldinuc [Core Patterns]
ffnamespace:architecture [2014/08/14 15:32] aldinuc
Line 56: Line 56:
 === Nonblocking and Blocking behaviour ===
 
-FastFlow run-time is designed to exhibit a nonblocking behaviour (by way of lock-free and wait-free algorithms, at least in the synchronisation critical paths). This design choice mainly targets efficiency for very fine grain parallelism.
+Blocking synchronisations suit coarse-grain parallelism (tasks of milliseconds or more), whereas nonblocking synchronisations suit fine-grain parallelism. Blocking synchronisations make it possible to exploit over-provisioning (e.g. for load balancing) and to reduce energy consumption. However, they exhibit large overheads (also due to OS involvement). Mixing blocking and nonblocking synchronisations is not trivial.
+
+The FastFlow run-time is designed to exhibit nonblocking behaviour, with the possibility of switching to blocking behaviour. Overall, a FastFlow run is a sequence of nonblocking running phases. Between two phases the run-time can switch to a blocking phase by way of an (original, data-flow) distributed protocol. In FastFlow terminology, a pattern (or a composition of patterns) can //freeze// (i.e. suspend), to be resumed later in the next nonblocking phase. This model makes it possible to address fine-grain workloads, bursts of fine-grain workloads, and coarse-grain workloads. During a nonblocking phase, the FastFlow run-time employs only lock-free and wait-free algorithms in all synchronisation critical paths (whereas it uses pthreads locks in the blocking phase).
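The trade-off between the two kinds of synchronisation can be illustrated with a minimal C++ sketch. It is not FastFlow code and does not reproduce the distributed protocol mentioned above: ''HybridSignal'', ''spin_limit'' and ''hybrid_signal_demo'' are hypothetical names, and the fallback to a condition variable is only an assumption about how a waiter might move from a nonblocking (spinning) phase to a blocking one.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot signal (illustration only, not the FastFlow API).
class HybridSignal {
public:
    // Signalling side: publish the flag, then wake a waiter that may be asleep.
    void notify() {
        ready_.store(true, std::memory_order_release);
        // Taking the mutex orders the wake-up after the waiter's predicate check.
        std::lock_guard<std::mutex> lock(mutex_);
        cv_.notify_one();
    }

    // Waiting side: poll at most spin_limit times (nonblocking phase), then
    // sleep on the condition variable (blocking phase, involves the OS).
    // Returns true if the signal was observed while polling.
    bool wait(unsigned spin_limit) {
        for (unsigned i = 0; i < spin_limit; ++i) {
            if (ready_.load(std::memory_order_acquire)) return true;
            std::this_thread::yield();
        }
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return ready_.load(std::memory_order_acquire); });
        return false;
    }

private:
    std::atomic<bool> ready_{false};
    std::mutex mutex_;
    std::condition_variable cv_;
};

// Exercises both paths: first a blocking wait, then a wait satisfied by polling.
bool hybrid_signal_demo() {
    HybridSignal signal;
    std::thread producer([&signal] {
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
        signal.notify();
    });
    bool spun = signal.wait(0);  // no polling budget: always takes the blocking path
    assert(!spun);
    producer.join();
    return signal.wait(1000);    // flag already set: observed on the first poll
}
```

A small ''spin_limit'' favours coarse-grain tasks (the waiter releases its core quickly), while a large one favours fine-grain tasks (the waiter avoids the cost of going through the OS).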
  
 === Deadlock avoidance ===
+
+The implementation of a pattern in terms of a streaming network (i.e. a network of threads or processes) can be cyclic (e.g. master-worker, D&C, etc.). For this reason FastFlow uses its own unbounded SPSC buffer to avoid deadlocks due to dependency cycles [ADK12].
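The following C++ sketch suggests why an unbounded channel removes this class of deadlock. It is a plain linked-list single-producer/single-consumer queue, chosen for brevity; it is not the algorithm of [ADK12] and not FastFlow's buffer, and ''UnboundedSpsc'' and ''feedback_cycle_demo'' are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Unbounded SPSC queue as a linked list with a dummy head node (illustration only).
template <typename T>
class UnboundedSpsc {
    struct Node {
        T value{};
        std::atomic<Node*> next{nullptr};
    };

public:
    UnboundedSpsc() : head_(new Node), tail_(head_) {}
    ~UnboundedSpsc() {
        while (head_ != nullptr) {
            Node* next = head_->next.load(std::memory_order_relaxed);
            delete head_;
            head_ = next;
        }
    }
    UnboundedSpsc(const UnboundedSpsc&) = delete;
    UnboundedSpsc& operator=(const UnboundedSpsc&) = delete;

    // Producer only. Never waits for free space, so a producer inside a cycle
    // cannot get stuck on a full channel (memory exhaustion aside).
    void push(const T& value) {
        Node* node = new Node;
        node->value = value;
        tail_->next.store(node, std::memory_order_release);  // publish the node
        tail_ = node;
    }

    // Consumer only. Returns false when the queue is empty.
    bool pop(T& out) {
        Node* next = head_->next.load(std::memory_order_acquire);
        if (next == nullptr) return false;
        out = next->value;
        delete head_;  // the old dummy is no longer reachable by the producer
        head_ = next;  // the node just read becomes the new dummy
        return true;
    }

private:
    Node* head_;  // touched by the consumer only
    Node* tail_;  // touched by the producer only
};

// A stage in a cycle emits `burst` items on its own feedback channel before
// consuming any. With a bounded channel smaller than `burst`, a blocking push
// would wait forever; here every push completes. Returns the items drained.
std::size_t feedback_cycle_demo(std::size_t burst) {
    UnboundedSpsc<std::size_t> feedback;
    for (std::size_t i = 0; i < burst; ++i) feedback.push(i);
    std::size_t drained = 0;
    std::size_t item = 0;
    while (feedback.pop(item)) {
        assert(item == drained);  // FIFO order is preserved
        ++drained;
    }
    return drained;
}
```

The cost of this simple design is one heap allocation per item, which a production-quality queue would normally avoid.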
  
 === Accelerator mode ===
Line 138: Line 142:
 the x86 model). On other models (e.g., Itanium and Power4, 5, and 6),
 a store fence before an enqueue is needed [GMV08].
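In portable C++11 the same requirement can be expressed with an explicit release fence before the slot is published. The sketch below is an assumption-laden illustration, not FastFlow's buffer: ''FencedRing'' and ''fenced_ring_demo'' are hypothetical names, and marking an empty slot with a null pointer is a convention chosen for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Bounded SPSC ring of pointers; a null slot means "empty" (illustration only).
template <std::size_t Capacity>
class FencedRing {
public:
    FencedRing() {
        for (auto& slot : slots_) slot.store(nullptr, std::memory_order_relaxed);
    }

    // Producer only. Returns false when the next slot is still occupied.
    bool push(void* data) {
        assert(data != nullptr);
        if (slots_[write_].load(std::memory_order_relaxed) != nullptr) return false;
        // Store fence: writes to *data must become visible before the pointer does.
        std::atomic_thread_fence(std::memory_order_release);
        slots_[write_].store(data, std::memory_order_relaxed);
        write_ = (write_ + 1) % Capacity;
        return true;
    }

    // Consumer only. Returns false when the next slot is empty.
    bool pop(void*& data) {
        void* p = slots_[read_].load(std::memory_order_relaxed);
        if (p == nullptr) return false;
        // Pairs with the producer's release fence.
        std::atomic_thread_fence(std::memory_order_acquire);
        data = p;
        slots_[read_].store(nullptr, std::memory_order_relaxed);  // free the slot
        read_ = (read_ + 1) % Capacity;
        return true;
    }

private:
    std::atomic<void*> slots_[Capacity];
    std::size_t write_ = 0;  // producer-local index
    std::size_t read_ = 0;   // consumer-local index
};

// Streams the integers 1..n through the ring and returns their sum.
long fenced_ring_demo(std::size_t n) {
    std::vector<int> values(n);
    for (std::size_t i = 0; i < n; ++i) values[i] = static_cast<int>(i + 1);
    FencedRing<8> ring;
    long sum = 0;
    std::thread consumer([&] {
        for (std::size_t received = 0; received < n;) {
            void* p = nullptr;
            if (ring.pop(p)) {
                sum += *static_cast<int*>(p);
                ++received;
            } else {
                std::this_thread::yield();
            }
        }
    });
    for (std::size_t i = 0; i < n; ++i)
        while (!ring.push(&values[i])) std::this_thread::yield();
    consumer.join();
    return sum;
}
```

On x86 a release fence typically compiles to no instruction, while on weaker memory models the compiler emits the required barrier.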
 +
 == GPGPUs ==
 +
 GPGPUs are supported by way of OpenCL and/or CUDA. At the current stage of development, kernel business code should be written in either OpenCL or CUDA. FastFlow takes care of H2D/D2H (asynchronous) data transfers and synchronisations. The ''stencil-reduce'' pattern makes it possible to write most typical GPGPU kernels as if they were C/C++ code, since intra-block and inter-block synchronisations (including reduce code) are transparently provided by the pattern. Still, the programmer can use OpenCL/CUDA directives in the kernel.
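As a rough reference for what a stencil-reduce computation does, the sequential C++ sketch below alternates a stencil step with a reduction that decides whether to iterate again. It is not the FastFlow pattern and uses no FastFlow, OpenCL or CUDA API; ''stencil_reduce_demo'', the 3-point averaging stencil and the max-change reduction are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Applies a 3-point averaging stencil to the interior cells of `grid` until the
// largest per-cell change falls below `eps` or `max_iters` iterations have run.
// Boundary cells are kept fixed. Returns the number of iterations performed.
std::size_t stencil_reduce_demo(std::vector<double>& grid, double eps,
                                std::size_t max_iters) {
    assert(eps > 0.0);
    std::vector<double> next(grid);
    for (std::size_t iter = 1; iter <= max_iters; ++iter) {
        // Stencil phase: on a GPGPU each cell would be handled by one work-item.
        for (std::size_t i = 1; i + 1 < grid.size(); ++i)
            next[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0;
        // Reduce phase: combine the per-cell changes into a single value.
        double delta = 0.0;
        for (std::size_t i = 0; i < grid.size(); ++i)
            delta = std::max(delta, std::fabs(next[i] - grid[i]));
        grid.swap(next);
        if (delta < eps) return iter;
    }
    return max_iters;
}
```

In a parallel version the two inner loops are the parts that need intra-block and inter-block synchronisation, which is the work the text says the pattern hides from the programmer.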
 +
 == Distributed ==
 +
 Distributed platforms built on top of TCP/IP and Infiniband/OFED protocols are also supported.
 FPGA support is planned but not yet fully developed.
Line 171: Line 179:
  
 [AB+09] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick. A view of the parallel computing landscape. Commun. ACM 52, 10 (Oct. 2009), 56-67. [[http://doi.acm.org/10.1145/1562764.1562783|DOI]]
-
-[AMT09] M. Aldinucci, M. Meneghin, and M. Torquati. Efficient Smith-Waterman on multi-core with fastflow. In Proc. of Intl. Euromicro PDP 2010: Parallel Distributed and network-based Processing, Pisa, Italy, Feb. 2010. IEEE. To appear. [[ffnamespace:about|(Paper Draft)]]
 
 [ADK11] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati. Accelerating code on multi-cores with fastflow. In Proc. of 17th Intl. Euro-Par 2011 Parallel Processing, volume 6853 of LNCS, pages 170–181, Bordeaux, France, Aug. 2011. Springer.
 
 [ADK12] M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati. An efficient unbounded lock-free queue for multi-core systems. In Proc. of 18th Intl. Euro-Par 2012 Parallel Processing, volume 7484 of LNCS, pages 662–673, Rhodes Island, Greece, Aug. 2012. Springer.
ffnamespace/architecture.txt · Last modified: 2014/09/12 19:06 by aldinuc