ffnamespace:faq [2013/05/07 18:34]
peter [How big are the SPSC queues?]
ffnamespace:faq [2014/09/15 01:00] (current)
aldinuc
====== Frequently Asked Questions ======

===== Which platforms/OSes/compilers are supported? =====

  * Linux (i386, x86_64, ARM, PPC) with gcc supporting C++11 (> 4.6). Other C++11-enabled compilers (e.g. Intel ICC) typically work.
  * MacOS X (> 10.4, i386-x86_64, PPC) with a C++ compiler supporting C++11 (e.g. clang 5.1, gcc).
  * Using GPUs (NVidia, AMD) requires either CUDA or OpenCL.
  * Microsoft Windows (Windows 7 64 bit, x86_64) with Visual Studio Express 2013. Other Windows versions and Visual Studio compilers might work (minor fixes might be required). The Windows code is not fully optimised for performance.
  * Other platforms/OSes/compilers might work but are not extensively tested (e.g. iOS). The FastFlow core is a header-only library and is likely to work on any platform with a good C++ compiler. C++11 is required to use all FastFlow features; the core patterns do not require C++11. The main development platform is Linux/x86_64/gcc.
  * Dependencies on third-party libraries: shared memory: pthreads, or native threads on Windows; distributed: ZeroMQ/TCP and/or IB/OFED; GPU: CUDA or OpenCL.

<note important>Work in progress</note>

===== Programming effort =====
==== FastFlow vs OpenMP and Intel TBB (and CnC) ====
=== FastFlow vs Intel CnC ===
To appear, we are working on it.
  
===== Accelerators and offloading =====
==== How big are the SPSC queues? ====
An empty SPSC queue on a 64-bit platform has a size of 144 bytes. Queues store memory pointers, so a queue of size ''k'' requires ''144+8*k'' bytes (one 64-bit pointer per slot). Typically, an SPSC queue takes just a few KBytes.
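As a rough check of this arithmetic, the sketch below reads each slot as one 64-bit pointer (8 bytes) and reuses the 144-byte fixed part quoted above; ''spsc_queue_bytes'' is a hypothetical helper for illustration, not part of the FastFlow API.

<code cpp>
#include <cstddef>

// Fixed size of an empty SPSC queue on a 64-bit platform (figure from this FAQ).
constexpr std::size_t kEmptySpscBytes = 144;

// Hypothetical helper: footprint of a bounded SPSC queue with k slots,
// assuming each slot stores one pointer (sizeof(void*) == 8 on 64-bit).
constexpr std::size_t spsc_queue_bytes(std::size_t k) {
    return kEmptySpscBytes + k * sizeof(void*);
}

// With 512 slots on a 64-bit platform: 144 + 512 * 8 = 4240 bytes (about 4 KB).
static_assert(sizeof(void*) != 8 || spsc_queue_bytes(512) == 4240,
              "unexpected footprint on a 64-bit platform");
</code>

With 512 slots, for instance, this gives 4240 bytes, in line with the "few KBytes" estimate.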
==== How much memory may be consumed on a many-core system? ====
Not very much, since threads are typically not connected by a //complete// graph (which would require ''n*(n-1)'' SPSC queues for n threads) but according to the skeleton's synchronization schema, which is typically not complete. As an example, the ''pipeline'' skeleton with n stages requires ''n-1'' queues, the ''farm'' skeleton requires ''2+2*n_workers'' queues, and Divide&Conquer (i.e. a farm with feedback channels) requires ''2*n_workers'' queues. Typically, the memory consumed grows linearly with the number of threads.
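The counts above can be turned into a quick estimate. The helpers below are hypothetical and only restate the formulas of this answer; each queue is assumed to hold ''k'' pointer slots plus a 144-byte fixed part, and a complete graph is counted as one SPSC queue per ordered pair of threads, since SPSC queues are one-directional.

<code cpp>
#include <cstddef>

// Queue counts per skeleton, as listed in this FAQ (hypothetical helpers).
constexpr std::size_t pipeline_queues(std::size_t n_stages) {
    return n_stages == 0 ? 0 : n_stages - 1;
}
constexpr std::size_t farm_queues(std::size_t n_workers) {
    return 2 + 2 * n_workers;
}
constexpr std::size_t dc_farm_queues(std::size_t n_workers) {  // farm with feedback
    return 2 * n_workers;
}

// For comparison: a complete graph among n threads needs one SPSC queue per
// ordered pair of threads, i.e. n * (n - 1) queues (quadratic growth).
constexpr std::size_t complete_graph_queues(std::size_t n_threads) {
    return n_threads == 0 ? 0 : n_threads * (n_threads - 1);
}

// Total bytes, assuming every queue has k pointer slots plus 144 fixed bytes.
constexpr std::size_t total_queue_bytes(std::size_t n_queues, std::size_t k) {
    return n_queues * (144 + k * sizeof(void*));
}
</code>

For instance, a farm with 16 workers needs ''farm_queues(16)'' = 34 queues, while connecting its 18 threads (assuming an emitter and a collector besides the workers) as a complete graph would need ''complete_graph_queues(18)'' = 306. With 512-slot queues on a 64-bit platform the farm uses 34 * 4240 = 144160 bytes, i.e. about 141 KBytes.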
==== How are MPMC queues realized? ====
Multiple-Producer-Multiple-Consumer (MPMC) queues are realized using one SPSC queue per producer and one SPSC queue per consumer. These queues are combined through an arbiter thread in a fully lock-free and fence-free fashion (no CAS at all). SPSC queues are enriched with additional methods aimed at improving cache locality and throughput, such as multi-push. In addition, FastFlow provides several variants of classic lock-free queues (using CAS operations), such as the Michael&Scott queue, which leverage the deferred reclamation and memory alignment provided by FastFlow allocators.
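A minimal sketch of the idea, not FastFlow's actual implementation: each producer owns an SPSC queue towards the arbiter, the arbiter owns one SPSC queue towards each consumer, and since every queue keeps exactly one writer and one reader, no CAS is needed. For brevity the SPSC queue here is a plain Lamport-style ring buffer using C++11 acquire/release atomics (on x86 these compile to ordinary loads and stores, with no fence instructions); the class names, the round-robin policy and the ''run_mpmc_demo'' driver are assumptions made for illustration.

<code cpp>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <thread>
#include <vector>

// Lamport-style bounded SPSC ring buffer: one pushing thread, one popping thread.
class SpscQueue {
public:
    explicit SpscQueue(std::size_t capacity) : buf_(capacity + 1) {}
    bool push(void* p) {                      // producer only
        std::size_t h = head_.load(std::memory_order_relaxed);
        std::size_t next = (h + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[h] = p;
        head_.store(next, std::memory_order_release);
        return true;
    }
    bool pop(void*& p) {                      // consumer only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) return false;     // empty
        p = buf_[t];
        tail_.store((t + 1) % buf_.size(), std::memory_order_release);
        return true;
    }
private:
    std::vector<void*> buf_;
    std::atomic<std::size_t> head_{0};        // next slot to write
    std::atomic<std::size_t> tail_{0};        // next slot to read
};

using Queues = std::vector<std::unique_ptr<SpscQueue>>;

// Arbiter: polls the per-producer queues round-robin and forwards each item
// to the per-consumer queues, also round-robin, until `total` items are moved.
void arbiter(Queues& in, Queues& out, std::size_t total) {
    std::size_t forwarded = 0, next = 0;
    while (forwarded < total) {
        for (auto& q : in) {
            void* item;
            if (!q->pop(item)) continue;
            while (!out[next]->push(item)) std::this_thread::yield();
            next = (next + 1) % out.size();
            ++forwarded;
        }
    }
}

// Hypothetical driver: P producers send the values 1..n (encoded as pointers),
// C consumers receive total/C items each (C must divide P*n); returns the sum.
std::uint64_t run_mpmc_demo(std::size_t P, std::size_t C, std::size_t n) {
    Queues in, out;
    for (std::size_t i = 0; i < P; ++i) in.emplace_back(new SpscQueue(64));
    for (std::size_t i = 0; i < C; ++i) out.emplace_back(new SpscQueue(64));
    const std::size_t total = P * n;
    std::atomic<std::uint64_t> sum{0};
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < P; ++i)
        threads.emplace_back([&in, i, n] {
            for (std::uintptr_t v = 1; v <= n; ++v)
                while (!in[i]->push(reinterpret_cast<void*>(v))) std::this_thread::yield();
        });
    for (std::size_t c = 0; c < C; ++c)
        threads.emplace_back([&out, &sum, c, total, C] {
            std::uint64_t local = 0;
            for (std::size_t got = 0; got < total / C;) {
                void* item;
                if (out[c]->pop(item)) { local += reinterpret_cast<std::uintptr_t>(item); ++got; }
                else std::this_thread::yield();
            }
            sum += local;
        });
    std::thread arb(arbiter, std::ref(in), std::ref(out), total);
    for (auto& t : threads) t.join();
    arb.join();
    return sum.load();
}
</code>

''run_mpmc_demo(2, 2, 1000)'' should return 2 * (1000 * 1001 / 2) = 1001000 whatever the interleaving, since every value is delivered exactly once. The arbiter is the only thread that touches more than one queue, which is what keeps every queue single-producer/single-consumer.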
==== Is this approach scalable? ====
In general, the scalability of the approach depends on the quality of the mapping from the skeleton implementation onto the underlying memory connectivity. Skeletons requiring higher connectivity (i.e. more synchronizations) may require a higher connectivity degree at the hardware memory data-path level. Note, however, that this is true for any concurrent programming model. The big advantage of the skeleton approach lies in the possibility of exploiting different implementation templates for the same skeleton in order to match the peculiarities of different memory sub-systems. This enhances portability and performance portability, since the code does not have to be re-designed for different multi-core platforms.
==== Does FastFlow support unbound/dynamic queues? ====
There exists an unbound version of the FastFlow SPSC queue. This kind of queue can dynamically and automatically grow and shrink to match the actual size requirements. Like the other queues, the unbound queue (so-called //uSPSC//) is lock-free and fence-free, and it exhibits almost the same performance as the other queues. The uBuffer implementation is available within the FastFlow tarball; the correctness proof is described {{http://calvados.di.unipi.it/dokuwiki/lib/tpl/torquati/paper_files/TR-10-20.pdf|here}}.
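To give an idea of how an SPSC queue can grow and shrink without locks, here is a simplified C++11 sketch. It is not FastFlow's uBuffer code: in this hypothetical variant the producer links a fresh fixed-size segment when its current one is full, and the consumer frees a segment once it has drained it. As in pointer-based SPSC queues, ''nullptr'' marks an empty slot and therefore cannot be pushed; the class name and the segment size are illustrative assumptions.

<code cpp>
#include <atomic>
#include <cstddef>

// Simplified unbounded SPSC queue: a linked list of fixed-size segments.
class UnboundedSpsc {
    static constexpr std::size_t kSeg = 64;   // slots per segment (arbitrary choice)
    struct Segment {
        std::atomic<void*> slot[kSeg];        // nullptr = not written yet
        std::atomic<Segment*> next{nullptr};  // set by the producer when it moves on
        Segment() { for (auto& s : slot) s.store(nullptr, std::memory_order_relaxed); }
    };
    Segment* head_;               // consumer-private: segment being drained
    Segment* tail_;               // producer-private: segment being filled
    std::size_t r_ = 0;           // consumer position in head_
    std::size_t w_ = 0;           // producer position in tail_
public:
    UnboundedSpsc() : head_(new Segment), tail_(head_) {}
    UnboundedSpsc(const UnboundedSpsc&) = delete;
    UnboundedSpsc& operator=(const UnboundedSpsc&) = delete;
    ~UnboundedSpsc() {            // call only once both threads are done
        while (head_) {
            Segment* n = head_->next.load(std::memory_order_relaxed);
            delete head_;
            head_ = n;
        }
    }
    // Producer only; never fails: grows by one segment when the current is full.
    void push(void* p) {
        if (w_ == kSeg) {
            Segment* s = new Segment;
            tail_->next.store(s, std::memory_order_release);  // publish new segment
            tail_ = s;
            w_ = 0;
        }
        tail_->slot[w_++].store(p, std::memory_order_release);
    }
    // Consumer only; returns false if the queue is currently empty.
    bool pop(void*& p) {
        if (r_ == kSeg) {
            Segment* n = head_->next.load(std::memory_order_acquire);
            if (!n) return false;  // producer has not moved to a new segment yet
            delete head_;          // fully drained segment: shrink
            head_ = n;
            r_ = 0;
        }
        void* v = head_->slot[r_].load(std::memory_order_acquire);
        if (!v) return false;
        p = v;
        ++r_;
        return true;
    }
};
</code>

Pushing 1000 items into a fresh queue allocates 16 segments of 64 slots; popping them all frees the first 15 again, so the memory footprint follows the actual backlog.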
==== Why use both bound and unbound queues? ====
Bound and unbound queues target different problems. Bound queues can be used to exploit a limited degree of asynchrony among threads, and so are useful for enforcing temporal synchronizations. Unbound queues enforce data dependencies only (the degree of asynchrony is unbounded): they are very useful in deadlock-avoidance strategies for cyclic streaming networks, but do not induce temporal synchronicity among threads. A good system should find a fair trade-off between the two kinds of queue, as well as properly size the bound queues. As an example, a queue of length 1 can be used to model a temporal synchronization device, since the producer can check when the consumer has received the data.
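The length-1 case can be sketched as follows, under our own assumptions (the names are hypothetical, and ''nullptr'' is reserved as the empty marker): with a single slot, seeing the slot empty again tells the producer that the consumer has taken the previous item, so ''send_and_wait'' behaves as a rendezvous.

<code cpp>
#include <atomic>
#include <thread>

// One-slot SPSC channel (a bounded queue of length 1). Only the producer
// writes non-null values and only the consumer writes nullptr, so plain
// acquire/release loads and stores suffice (no CAS).
class OneSlotChannel {
    std::atomic<void*> slot_{nullptr};   // nullptr = empty, so nullptr cannot be sent
public:
    bool try_push(void* p) {             // producer only
        if (slot_.load(std::memory_order_acquire) != nullptr) return false;  // still full
        slot_.store(p, std::memory_order_release);
        return true;
    }
    bool try_pop(void*& p) {             // consumer only
        void* v = slot_.load(std::memory_order_acquire);
        if (v == nullptr) return false;
        p = v;
        slot_.store(nullptr, std::memory_order_release);
        return true;
    }
    // Producer side: true once the consumer has received the last item.
    bool delivered() const { return slot_.load(std::memory_order_acquire) == nullptr; }
};

// Producer helper: send p and block until the consumer has taken it.
inline void send_and_wait(OneSlotChannel& ch, void* p) {
    while (!ch.try_push(p)) std::this_thread::yield();
    while (!ch.delivered()) std::this_thread::yield();
}
</code>

''send_and_wait'' only returns once the consumer has popped the item, which is exactly the temporal synchronization that pushing into an unbound queue cannot provide.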
==== Do FastFlow queues represent a novel research result? ====
Bound SPSC queues are inspired by the //P1C1// queues of Higham and Kawash (1997), although the implementation differs in many important details. FastFlow MPMC queues are, to the best of our knowledge, an original use of SPSC queues. The idea and design of the FastFlow unbound SPSC queue are, to the best of our knowledge, fully novel. Unbound queues can be combined exactly like other SPSC queues to compose MPSC unbound queues (again a novel result).
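For illustration only, here is a C++11 sketch of a bounded SPSC queue in which the slot contents carry the synchronization, with ''nullptr'' meaning empty. It is our own simplification, not FastFlow's source, and we do not claim it reproduces P1C1; the class and member names are hypothetical. The producer reads and writes only its own index ''w_'', and the consumer only its own index ''r_''.

<code cpp>
#include <atomic>
#include <cstddef>
#include <vector>

// Bounded SPSC queue of non-null pointers; nullptr in a slot means "empty".
// A production version would typically also pad w_ and r_ onto separate
// cache lines to avoid false sharing.
class NullSlotSpsc {
public:
    explicit NullSlotSpsc(std::size_t size) : buf_(size) {
        for (auto& s : buf_) s.store(nullptr, std::memory_order_relaxed);
    }
    bool push(void* p) {                       // producer only; p != nullptr
        if (buf_[w_].load(std::memory_order_acquire) != nullptr) return false;  // full
        buf_[w_].store(p, std::memory_order_release);
        w_ = (w_ + 1 == buf_.size()) ? 0 : w_ + 1;
        return true;
    }
    bool pop(void*& p) {                       // consumer only
        void* v = buf_[r_].load(std::memory_order_acquire);
        if (v == nullptr) return false;        // empty
        p = v;
        buf_[r_].store(nullptr, std::memory_order_release);  // hand the slot back
        r_ = (r_ + 1 == buf_.size()) ? 0 : r_ + 1;
        return true;
    }
private:
    std::vector<std::atomic<void*>> buf_;
    std::size_t w_ = 0;                        // producer-private write index
    std::size_t r_ = 0;                        // consumer-private read index
};
</code>

Unlike a Lamport-style ring buffer, where each side reads the other side's index to detect full/empty, here neither thread reads the other's index, so no shared index cache line bounces between cores.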
ffnamespace/faq.1367944442.txt.gz · Last modified: 2013/05/07 18:34 by peter