Patterns for "in-place" (easy) parallelisation of sequential code. More...

Collaboration diagram for High-level patterns:

Files
file	farm.hpp
	Farm pattern.

file	map.hpp
	map pattern

file	mdf.hpp
	This file implements the macro dataflow pattern.

file	parallel_for.hpp
	This file describes the parallel_for/parallel_reduce skeletons.

file	pipeline.hpp
	This file implements the pipeline skeleton, both in the high-level pattern syntax (ff::ff_pipe) and low-level syntax (ff::ff_pipeline)

file	poolEvolution.hpp
	The PoolEvolution pattern models the evolution of a given population.

Classes
class	ff::ff_farm< lb_t, gt_t >
	The Farm skeleton, with Emitter (`lb_t`) and Collector (`gt_t`). More...

class	Map
	Map pattern. More...

class	ff::ff_mdf
	Macro Data Flow executor. More...

class	ff::ParallelFor
	Parallel for loop. Run automatically. More...

class	ff::ParallelForReduce< T >
	Parallel for and reduce. Run automatically. More...

class	ff::ff_pipe< TaskType >
	Pipeline pattern (high-level pattern syntax) More...

class	ff::poolEvolution< T, env_t >
	The pool evolution parallel pattern. More...

Functions
template<typename T >
	ff::ff_farm< lb_t, gt_t >::ff_farm (const std::function< T (T , ff_node *const)> &F, int nw, bool input_ch=false)
	High-level pattern constructor.

Detailed Description

Patterns for "in-place" (easy) parallelisation of sequential code.

They are clearly characterised in a specific usage context and are targeted to the parallelisation of sequential (legacy) code. Examples are exploitation of loop parallelism, stream parallelism, data-parallel algorithms, execution of general workflows of tasks, etc. The are typically equipped with self-optimisation capabilities (e.g. load-balancing, grain auto-tuning, parallelism-degree auto-tuning) and exhibit no or limited nesting capability. Examples are: parallel-for, pipeline, stencil-reduce, mdf (macro-data-flow). Some of them targets specific devices (e.g. GPGPUs). They are implemented on top of core patterns.

Class Documentation

class ff::ff_farm

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>
class ff::ff_farm< lb_t, gt_t >

The Farm skeleton, with Emitter (lb_t) and Collector (gt_t).

The Farm skeleton can be seen as a 3-stages pipeline. The first stage is the Emitter (lb_t) that act as a load-balancer; the last (optional) stage would be the Collector (gt_t) that gathers the results computed by the Workers, which are ff_nodes.

This class is defined in farm.hpp

Examples:: perf_test_alloc1.cpp, perf_test_alloc2.cpp, and pipe_basic.cpp.

Inheritance diagram for ff::ff_farm< lb_t, gt_t >:

Collaboration diagram for ff::ff_farm< lb_t, gt_t >:

Public Member Functions
template<typename T >
	ff_farm (const std::function< T (T , ff_node *const)> &F, int nw, bool input_ch=false)
	High-level pattern constructor.

	ff_farm (std::vector< ff_node * > &W, ff_node const Emitter=NULL, ff_node const Collector=NULL, bool input_ch=false)
	Core patterns constructor 2. More...

	ff_farm (bool input_ch=false, int in_buffer_entries=DEF_IN_BUFF_ENTRIES, int out_buffer_entries=DEF_OUT_BUFF_ENTRIES, bool worker_cleanup=false, int max_num_workers=DEF_MAX_NUM_WORKERS, bool fixedsize=false)
	Core patterns constructor 1. More...

	~ff_farm ()
	Destructor. More...

int	add_emitter (ff_node *e)
	Adds the emitter. More...

void	set_scheduling_ondemand (const int inbufferentries=1)
	Set scheduling with on demand polity. More...

int	add_workers (std::vector< ff_node * > &w)
	Adds workers to the form. More...

int	add_collector (ff_node *c, bool outpresent=false)
	Adds the collector. More...

int	wrap_around (bool multi_input=false)
	Sets the feedback channel from the collector to the emitter. More...

int	remove_collector ()
	Removes the collector. More...

int	run (bool skip_init=false)
	Execute the Farm. More...

virtual int	run_and_wait_end ()
	Executs the farm and wait for workers to complete. More...

virtual int	run_then_freeze (ssize_t nw=-1)
	Executes the farm and then freeze. More...

int	wait ()
	Puts the thread in waiting state. More...

int	wait_freezing ()
	Waits for freezing. More...

bool	offload (void *task, unsigned int retry=((unsigned int)-1), unsigned int ticks=ff_loadbalancer::TICKS2WAIT)

bool	load_result (void **task, unsigned int retry=((unsigned int)-1), unsigned int ticks=ff_gatherer::TICKS2WAIT)
	Loads results into gatherer. More...

bool	load_result_nb (void **task)
	Loads result with non-blocking. More...

double	ffTime ()
	Misure ff::ff_node execution time. More...

Public Member Functions inherited from ff::ff_node
virtual bool	put (void *ptr)
	Nonblocking put onto output channel. More...

virtual bool	get (void **ptr)
	Noblocking pop from input channel. More...

virtual FFBUFFER *	get_in_buffer () const
	Gets input channel. More...

virtual FFBUFFER *	get_out_buffer () const
	Gets pointer to the output channel. More...

virtual bool	ff_send_out (void *task, unsigned int retry=((unsigned int)-1), unsigned int ticks=(TICKS2WAIT))
	Sends out the task. More...

Protected Member Functions
void	skipfirstpop (bool sk)
	Set the ff_node to start with no input task. More...

void *	svc (void *task)
	svc method

int	svc_init ()
	The svc_init method.

void	svc_end ()
	The svc_end method.

int	create_input_buffer (int nentries, bool fixedsize)
	Creates the input buffer for the emitter node. More...

int	create_output_buffer (int nentries, bool fixedsize=false)
	Creates the output channel. More...

int	set_output_buffer (FFBUFFER *const o)
	Sets the output buffer of the collector. More...

ff_node *	getEmitter ()
	Gets Emitter. More...

ff_node *	getCollector ()
	Gets Collector. More...

Protected Member Functions inherited from ff::ff_node
bool	skipfirstpop () const
	Gets the status of spontaneous start. More...

virtual int	set_input_buffer (FFBUFFER *const i)
	Assign the input channelname to a channel. More...

virtual double	wffTime ()
	Misure ff_node::svc execution time. More...

virtual	~ff_node ()
	Destructor.

Constructor & Destructor Documentation

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

ff::ff_farm< lb_t, gt_t >::~ff_farm ( )

inline

Destructor.

Destruct the load balancer, the gatherer, all the workers

Member Function Documentation

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::add_collector	(	ff_node *	c,
		bool	outpresent = `false`
	)

inline

Adds the collector.

It adds the Collector filter to the farm skeleton. If no object is passed as a colelctor, than a default collector will be added (i.e. ff_gatherer). Note that it is not possible to add more than one collector.

Parameters

c	Collector object
outpresent	outstream?

Returns: The status of set_filter(x) if successful, otherwise -1 is returned.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::add_emitter ( ff_node * e )

inline

Adds the emitter.

It adds an Emitter to the Farm. The Emitter is of type ff_node and there can be only one Emitter in a Farm skeleton.

Parameters

e	the `ff_node` acting as an Emitter

Returns: Returns 0 if successful -1 otherwise

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::add_workers ( std::vector< ff_node * > & w )

inline

Adds workers to the form.

Add workers to the Farm. There is a limit to the number of workers that can be added to a Farm. This limit is set by default to 64. This limit can be augmented by passing the desired limit as the fifth parameter of the ff_farm constructor.

Parameters

w	a vector of `ff_nodes` which are Workers to be attached to the Farm.

Returns: 0 if successsful, otherwise -1 is returned.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::create_input_buffer	(	int	nentries,
		bool	fixedsize
	)

inlineprotectedvirtual

Creates the input buffer for the emitter node.

This function redefines the ff_node's virtual method of the same name. It creates an input buffer for the Emitter node.

Parameters

nentries	the size of the buffer
fixedsize	flag to decide whether the buffer is resizable.

Returns: If successful 0, otherwsie a negative value.

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::create_output_buffer	(	int	nentries,
		bool	fixedsize = `false`
	)

inlineprotectedvirtual

Creates the output channel.

Parameters

nentries	the number of elements of the buffer
fixedsize	flag to decide whether the buffer is bound or unbound. Default is `true`.

Returns: 0 if successful, -1 otherwise

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

double ff::ff_farm< lb_t, gt_t >::ffTime ( )

inlinevirtual

Misure ff::ff_node execution time.

Returns: time (ms)

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

ff_node* ff::ff_farm< lb_t, gt_t >::getCollector ( )

inlineprotected

Gets Collector.

It returns a pointer to the collector.

Returns: A pointer to collector node if exists, otherwise a NULL

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

ff_node* ff::ff_farm< lb_t, gt_t >::getEmitter ( )

inlineprotected

Gets Emitter.

It returns a pointer to the emitter.

Returns: A pointer of the FastFlow node which is actually the emitter.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

bool ff::ff_farm< lb_t, gt_t >::load_result	(	void **	task,
		unsigned int	retry = `((unsigned int)-1)`,
		unsigned int	ticks = `ff_gatherer::TICKS2WAIT`
	)

inline

Loads results into gatherer.

It loads the results from the gatherer (if any).

Parameters

task	is a void pointer
retry	is the number of tries to load the results
ticks	is the number of ticks to wait

Returns: false if EOS arrived or too many retries, true if there is a new value

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

bool ff::ff_farm< lb_t, gt_t >::load_result_nb ( void ** task )

inline

Loads result with non-blocking.

It loads the result with non-blocking situation.

Parameters

task	is a void pointer

Returns: if no task is present, otherwise if there is a new value. It should be checked if the task has a FF_EOS

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

bool ff::ff_farm< lb_t, gt_t >::offload	(	void *	task,
		unsigned int	retry = `((unsigned int)-1)`,
		unsigned int	ticks = `ff_loadbalancer::TICKS2WAIT`
	)

inline

Offloads teh task to farm

It offloads the given task to the farm.

Parameters

task	is a void pointer
retry	showing the number of tries to offload
ticks	is the number of ticks to wait

Returns: true if successful, otherwise false

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::remove_collector ( )

inline

Removes the collector.

It allows not to start the collector thread, whereas all worker's output buffer will be created as if it were present.

Returns: 0 is always returned.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::run ( bool skip_init = false )

inlinevirtual

Execute the Farm.

It executes the form.

Parameters

skip_init A booleon value showing if the initialization should be skipped

Returns: If successful 0, otherwise a negative is returned.

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

virtual int ff::ff_farm< lb_t, gt_t >::run_and_wait_end ( )

inlinevirtual

Executs the farm and wait for workers to complete.

It executes the farm and waits for all workers to complete their tasks.

Returns: If successful 0, otherwise a negative value is returned.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

virtual int ff::ff_farm< lb_t, gt_t >::run_then_freeze ( ssize_t nw = -1 )

inlinevirtual

Executes the farm and then freeze.

It executs the form and then freezes the form. If workers are frozen, it is possible to wake up just a subset of them.

Returns: If successful 0, otherwise a negative value

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::set_output_buffer ( FFBUFFER *const o )

inlineprotectedvirtual

Sets the output buffer of the collector.

This function redefines the ff_node's virtual method of the same name. Set the output buffer for the Collector.

Parameters

o	a buffer object, which can be of type `SWSR_Ptr_Buffer` or `uSWSR_Ptr_Buffer`

Returns: 0 if successful, otherwise -1 is returned.

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

void ff::ff_farm< lb_t, gt_t >::set_scheduling_ondemand ( const int inbufferentries = 1 )

inline

Set scheduling with on demand polity.

The default scheduling policy is round-robin, When there is a great computational difference among tasks the round-robin scheduling policy could lead to load imbalance in worker's workload (expecially with short stream length). The on-demand scheduling policy can guarantee a near optimal load balancing in lots of cases. Alternatively it is always possible to define a complete application-level scheduling by redefining the ff_loadbalancer class.

Parameters

inbufferentries sets the number of queue slot for one worker threads.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

void ff::ff_farm< lb_t, gt_t >::skipfirstpop ( bool sk )

inlineprotectedvirtual

Set the ff_node to start with no input task.

Setting it to true let the ff_node execute the svc method spontaneusly before receiving a task on the input channel. skipfirstpop makes it possible to define a "producer" node that starts the network.

Parameters

sk	`true` start spontaneously (*task will be NULL)

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::wait ( )

inlinevirtual

Puts the thread in waiting state.

It puts the thread in waiting state.

Returns: 0 if successful, otherwise -1 is returned.

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::wait_freezing ( )

inlinevirtual

Waits for freezing.

It waits for thread to freeze.

Returns: 0 if successful otherwise -1 is returned.

Reimplemented from ff::ff_node.

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>

int ff::ff_farm< lb_t, gt_t >::wrap_around ( bool multi_input = false )

inline

Sets the feedback channel from the collector to the emitter.

This method allows to estabilish a feedback channel from the Collector to the Emitter. If the collector is present, than the collector output queue will be connected to the emitter input queue (feedback channel)

Returns: 0 if successful, otherwise -1 is returned.

class Map

Map pattern.

Apply to all

Todo:: Map to be documented and exemplified

class ff::ff_mdf

Macro Data Flow executor.

Inheritance diagram for ff::ff_mdf:

Collaboration diagram for ff::ff_mdf:

Public Member Functions
template<typename T1 >
	ff_mdf (void(F)(T1 const), T1 const args, size_t outstandingTasks=DEFAULT_OUTSTANDING_TASKS, int maxnw=ff_realNumCores(), void(schedRelaxF)(unsigned long)=NULL)
	Constructor. More...

void *	svc (void *)
	The service callback (should be filled by user with parallel activity business code) More...

double	ffTime ()
	Misure ff::ff_node execution time. More...

Public Member Functions inherited from ff::ff_node
virtual int	svc_init ()
	Service initialisation. More...

virtual void	svc_end ()
	Service finalisation. More...

virtual bool	put (void *ptr)
	Nonblocking put onto output channel. More...

virtual bool	get (void **ptr)
	Noblocking pop from input channel. More...

virtual FFBUFFER *	get_in_buffer () const
	Gets input channel. More...

virtual FFBUFFER *	get_out_buffer () const
	Gets pointer to the output channel. More...

virtual bool	ff_send_out (void *task, unsigned int retry=((unsigned int)-1), unsigned int ticks=(TICKS2WAIT))
	Sends out the task. More...

Additional Inherited Members
Protected Member Functions inherited from ff::ff_node
virtual void	skipfirstpop (bool sk)
	Set the ff_node to start with no input task. More...

bool	skipfirstpop () const
	Gets the status of spontaneous start. More...

virtual int	create_input_buffer (int nentries, bool fixedsize=true)
	Creates the input channel. More...

virtual int	create_output_buffer (int nentries, bool fixedsize=false)
	Creates the output channel. More...

virtual int	set_output_buffer (FFBUFFER *const o)
	Assign the output channelname to a channel. More...

virtual int	set_input_buffer (FFBUFFER *const i)
	Assign the input channelname to a channel. More...

virtual int	run (bool=false)
	Run the ff_node. More...

virtual int	wait ()
	Wait ff_node termination. More...

virtual int	wait_freezing ()
	Wait the freezing state. More...

virtual double	wffTime ()
	Misure ff_node::svc execution time. More...

virtual	~ff_node ()
	Destructor.

Constructor & Destructor Documentation

template<typename T1 >

ff::ff_mdf::ff_mdf	(	void()(T1 const)	F,
		T1 *const	args,
		size_t	outstandingTasks = `DEFAULT_OUTSTANDING_TASKS`,
		int	maxnw = `ff_realNumCores()`,
		void(*)(unsigned long)	schedRelaxF = `NULL`
	)

inline

Constructor.

Parameters

F	= is the user's function
args	= is the argument of the function F
maxnw	= is the maximum number of farm's workers that can be used
schedRelaxF	= is a function for managing busy-waiting in the farm scheduler

References ff::ff_farm< lb_t, gt_t >::add_emitter(), ff::ff_pipeline::add_stage(), ff::ff_farm< lb_t, gt_t >::add_workers(), ff::ff_pipeline::run_then_freeze(), ff::ff_pipeline::wait_freezing(), and ff::ff_farm< lb_t, gt_t >::wrap_around().

Member Function Documentation

double ff::ff_mdf::ffTime ( )

inlinevirtual

Misure ff::ff_node execution time.

Returns: time (ms)

Reimplemented from ff::ff_node.

References ff::ff_pipeline::ffTime().

void* ff::ff_mdf::svc ( void * task )

inlinevirtual

The service callback (should be filled by user with parallel activity business code)

Parameters

task	is a the input data stream item pointer (task)

Returns: output data stream item pointer

Implements ff::ff_node.

class ff::ParallelFor

Parallel for loop. Run automatically.

Identifies an iterative work-sharing construct that specifies a region (i.e. a Lambda function) in which the iterations of the associated loop should be executed in parallel.

Examples:: parfor_basic.cpp.

Public Member Functions
	ParallelFor (const long maxnw=FF_AUTO, bool spinwait=false)
	Constructor. More...

	~ParallelFor ()
	Destructor. More...

void	disableScheduler (bool onoff=true)
	Disable active scheduler (i.e. Emitter thread) More...

template<typename Function >
void	parallel_for (long first, long last, const Function &f, const long nw=FF_AUTO)
	Parallel for region (basic) - static. More...

template<typename Function >
void	parallel_for (long first, long last, long step, const Function &f, const long nw=FF_AUTO)
	Parallel for region (step) - static. More...

template<typename Function >
void	parallel_for (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region (step, grain) - dynamic. More...

template<typename Function >
void	parallel_for_thid (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region with threadID (step, grain, thid) - dynamic. More...

template<typename Function >
void	parallel_for_idx (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region with indexes ranges (step, grain, thid, idx) - dynamic - advanced usage. More...

template<typename Function >
void	parallel_for_static (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region (step, grain) - static. More...

Constructor & Destructor Documentation

ff::ParallelFor::ParallelFor	(	const long	maxnw = `FF_AUTO`,
		bool	spinwait = `false`
	)

inline

Constructor.

Set up a parallel for ParallelFor pattern run-time support (i.e. spawn workers threads) A single object can be used as many times as needed to run different parallel for pattern instances (different loop bodies). They cannot be nested nor recursive. Nonblocking policy is to be preferred in case of repeated call of the some of the parallel_for methods (e.g. within a strict outer loop). On the same ParallelFor object different parallel_for methods (e.g. parallel_for and parallel_for_thid, parallel_for_idx) can be called in sequence.

Parameters

maxnw	Maximum number of worker threads (not including active scheduler, if any). Deafault FF_AUTO = N. of HW contexts.
spinwait	barrier kind. `true` nonblocking, `false` blocking. Nonbloking barrier will leave worker threads active until class destruction is called (the threads will be active and in the nonblocking barrier only after the first call to one of the parallel_for methods). To put threads to sleep between different calls, the threadPause method may be called.

ff::ParallelFor::~ParallelFor ( )

inline

Destructor.

Terminate ParallelFor run-time support and makes resources housekeeping. Both nonlocking and blocking worker threads are terminated.

Member Function Documentation

void ff::ParallelFor::disableScheduler ( bool onoff = true )

inline

Disable active scheduler (i.e. Emitter thread)

Disable active scheduler (i.e. Emitter thread of the master-worker implementation). Active scheduling uses one dedicated nonblocking thread. In passive scheduling, workers cooperatively schedule tasks via synchronisations in memory. None of the above is always faster than the other: it depends on parallelism degree, task grain and platform. As rule of thumb on large multicore and fine-grain tasks active scheduling is faster. On few cores passive scheduler enhances overall performance. Active scheduler is the default option.

Parameters

onoff true disable active schduling, false enable active scheduling

template<typename Function >

void ff::ParallelFor::parallel_for	(	long	first,
		long	last,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (basic) - static.

Static scheduling onto nw worker threads. Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step).

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
f	f(const long idx) Lambda function, body of the parallel loop. idx: iterator
nw	number of worker threads (default FF_AUTO)

template<typename Function >

void ff::ParallelFor::parallel_for	(	long	first,
		long	last,
		long	step,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (step) - static.

Static scheduling onto nw worker threads. Iteration space is walked with stride step. Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step).

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
f	f(const long idx) body of the parallel loop
nw	number of worker threads

template<typename Function >

void ff::ParallelFor::parallel_for	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (step, grain) - dynamic.

Dynamic scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain. Iteration space is walked with stride step.

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
grain	(> 0) minimum computation grain (n. of iterations scheduled together to a single worker)
f	f(const long idx) Lambda function, body of the parallel loop. idx: iteration param nw number of worker threads

template<typename Function >

void ff::ParallelFor::parallel_for_idx	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region with indexes ranges (step, grain, thid, idx) - dynamic - advanced usage.

Dynamic scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain. Iteration space is walked with stride step. A chunk of grain iterations are assigned to each worker but they are not automatically walked. Each chunk can be traversed within the parallel_for body (e.g. with a for loop within f with the same step).

Note: Useful in few cases only - requires some expertise.

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
grain	(> 0) minimum computation grain (n. of iterations scheduled together to a single worker)
f	f(const long start_idx, const long stop_idx, const int thid) Lambda function, body of the parallel loop. start_idx and stop_idx: iteration bounds assigned to worker_id thid.
nw	number of worker threads (default n. of platform HW contexts)

template<typename Function >

void ff::ParallelFor::parallel_for_static	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (step, grain) - static.

Static scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain > 1 or in maximal partitions grain == 0. Iteration space is walked with stride step.

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
grain	(> 0) minimum computation grain (n. of iterations scheduled together to a single worker)
f	f(const long idx) Lambda function, body of the parallel loop. start_idx and stop_idx: iteration bounds assigned to worker_id thid.
nw	number of worker threads (default n. of platform HW contexts)

template<typename Function >

void ff::ParallelFor::parallel_for_thid	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region with threadID (step, grain, thid) - dynamic.

Dynamic scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain. Iteration space is walked with stride step. thid Worker thread ID is made available via a Lambda parameter.

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
grain	minimum computation grain (n. of iterations scheduled together to a single worker)
f	f(const long idx, const int thid) Lambda function, body of the parallel loop. idx: iteration, thid: worker_id
nw	number of worker threads (default n. of platform HW contexts)

class ff::ParallelForReduce

template<typename T>
class ff::ParallelForReduce< T >

Parallel for and reduce. Run automatically.

Set up the run-time for parallel for and parallel reduce.

Parallel for: Identifies an iterative work-sharing construct that specifies a region (i.e. a Lambda function) in which the iterations of the associated loop should be executed in parallel. in parallel.

Parallel reduce: reduce an array of T to a single value by way of an associative operation.

Template Parameters

T	reduction op type: op(T,T) -> T

Examples:: funcmin.cpp.

Inherited by ff::ff_Map< T >, and ff::ff_Map<>.

Public Member Functions
	ParallelForReduce (const long maxnw=FF_AUTO, bool spinwait=false)
	Constructor. More...

template<typename Function >
void	parallel_for (long first, long last, const Function &f, const long nw=FF_AUTO)
	Parallel for region (basic) - static. More...

template<typename Function >
void	parallel_for (long first, long last, long step, const Function &f, const long nw=FF_AUTO)
	Parallel for region (step) - static. More...

template<typename Function >
void	parallel_for (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region (step, grain) - dynamic. More...

template<typename Function >
void	parallel_for_thid (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region with threadID (step, grain, thid) - dynamic. More...

template<typename Function >
void	parallel_for_idx (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region with indexes ranges (step, grain, thid, idx) - dynamic - advanced usage. More...

template<typename Function >
void	parallel_for_static (long first, long last, long step, long grain, const Function &f, const long nw=FF_AUTO)
	Parallel for region (step) - static. More...

template<typename Function , typename FReduction >
void	parallel_reduce (T &var, const T &identity, long first, long last, const Function &partialreduce_body, const FReduction &finalreduce_body, const long nw=FF_AUTO)
	Parallel reduce (basic) More...

template<typename Function , typename FReduction >
void	parallel_reduce (T &var, const T &identity, long first, long last, long step, const Function &body, const FReduction &finalreduce, const long nw=FF_AUTO)
	Parallel reduce (step) More...

template<typename Function , typename FReduction >
void	parallel_reduce (T &var, const T &identity, long first, long last, long step, long grain, const Function &body, const FReduction &finalreduce, const long nw=FF_AUTO)
	Parallel reduce (step, grain) More...

template<typename Function , typename FReduction >
void	parallel_reduce_static (T &var, const T &identity, long first, long last, long step, long grain, const Function &body, const FReduction &finalreduce, const long nw=FF_AUTO)
	Parallel reduce region (step) - static. More...

Protected Member Functions
	ParallelForReduce (const long maxnw, bool spinWait, bool skipWarmup)
	Constructor. More...

Constructor & Destructor Documentation

template<typename T>

ff::ParallelForReduce< T >::ParallelForReduce	(	const long	maxnw,
		bool	spinWait,
		bool	skipWarmup
	)

inlineprotected

Constructor.

Parameters

maxnw	Maximum number of worker threads
spinWait	`true` Noblocking support (run-time thread will never suspend, even between successive calls to `parallel_for` and `parallel_reduce`, useful when they are called in sequence on small kernels), `false` blocking support
skipWarmup	Skip warmup phase (autotuning)

template<typename T>

ff::ParallelForReduce< T >::ParallelForReduce	(	const long	maxnw = `FF_AUTO`,
		bool	spinwait = `false`
	)

inline

Constructor.

Parameters

maxnw	Maximum number of worker threads
spinwait	`true` for noblocking support (run-time thread will never suspend, even between successive calls to `parallel_for` and `parallel_reduce`, useful when they are called in sequence on small kernels), `false` blocking support

Member Function Documentation

template<typename T>

template<typename Function >

void ff::ParallelForReduce< T >::parallel_for	(	long	first,
		long	last,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (basic) - static.

Static scheduling onto nw worker threads. Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step).

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
f	f(const long idx) Lambda function, body of the parallel loop. idx: iterator
nw	number of worker threads (default FF_AUTO)

template<typename T>

template<typename Function >

void ff::ParallelForReduce< T >::parallel_for	(	long	first,
		long	last,
		long	step,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (step) - static.

Static scheduling onto nw worker threads. Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step).

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
f	f(const long idx) body of the parallel loop
nw	number of worker threads

template<typename T>

template<typename Function >

void ff::ParallelForReduce< T >::parallel_for	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (step, grain) - dynamic.

Dynamic scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain. Iteration space is walked with stride step.

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
grain	(> 0) minimum computation grain (n. of iterations scheduled together to a single worker)
f	f(const long idx) Lambda function, body of the parallel loop. idx: iteration param nw number of worker threads

template<typename T>

template<typename Function >

void ff::ParallelForReduce< T >::parallel_for_idx	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region with indexes ranges (step, grain, thid, idx) - dynamic - advanced usage.

Dynamic scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain. Iteration space is walked with stride step. A block of grain iterations are assigned to each worker but they are not automatically walked. Each block can be traversed within the parallel_for body (e.g. with a for loop within f with the same step).

Note: Useful in few cases only - requires some expertise

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
grain	(> 0) minimum computation grain (n. of iterations scheduled together to a single worker)
f	f(const long start_idx, const long stop_idx, const int thid) Lambda function, body of the parallel loop. start_idx and stop_idx: iteration bounds assigned to worker_id thid.
nw	number of worker threads (default n. of platform HW contexts)

template<typename T>

template<typename Function >

void ff::ParallelForReduce< T >::parallel_for_static	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region (step) - static.

Static scheduling onto nw worker threads. Iteration space is walked with stride step. Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step)

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
f	f(const long idx) body of the parallel loop
nw	number of worker threads

template<typename T>

template<typename Function >

void ff::ParallelForReduce< T >::parallel_for_thid	(	long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	f,
		const long	nw = `FF_AUTO`
	)

inline

Parallel for region with threadID (step, grain, thid) - dynamic.

Dynamic scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain. Iteration space is walked with stride step. thid Worker thread ID is made available via a Lambda parameter.

Parameters

first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
grain	minimum computation grain (n. of iterations scheduled together to a single worker)
f	f(const long idx, const int thid) Lambda function, body of the parallel loop. idx: iteration, thid: worker_id
nw	number of worker threads (default n. of platform HW contexts)

template<typename T>

template<typename Function , typename FReduction >

void ff::ParallelForReduce< T >::parallel_reduce	(	T &	var,
		const T &	identity,
		long	first,
		long	last,
		const Function &	partialreduce_body,
		const FReduction &	finalreduce_body,
		const long	nw = `FF_AUTO`
	)

inline

Parallel reduce (basic)

Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step)

Reduce is executed in two phases: the first phase execute in parallel a partial reduce (by way of partialreduce_body function), the second reduces partial results (by way of finalresult_body). Typically the two function are really the same.

Parameters

var	inital value of reduction variable (accumulator)
indentity	indetity value for the reduction function
first	first value of the iteration variable
last	last value of the iteration variable
partialreduce_body	reduce operation (1st phase, executed in parallel)
finalreduce_body	reduce operation (2nd phase, executed sequentially)
nw	number of worker threads

template<typename T>

template<typename Function , typename FReduction >

void ff::ParallelForReduce< T >::parallel_reduce	(	T &	var,
		const T &	identity,
		long	first,
		long	last,
		long	step,
		const Function &	body,
		const FReduction &	finalreduce,
		const long	nw = `FF_AUTO`
	)

inline

Parallel reduce (step)

Iteration space is walked with stride step. Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step)

Reduce is executed in two phases: the first phase execute in parallel a partial reduce (by way of partialreduce_body function), the second reduces partial results (by way of finalresult_body). Typically the two function are really the same.

Parameters

var	inital value of reduction variable (accumulator)
indentity	indetity value for the reduction function
first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
partialreduce_body	reduce operation (1st phase, executed in parallel)
finalreduce_body	reduce operation (2nd phase, executed sequentially)
nw	number of worker threads

template<typename T>

template<typename Function , typename FReduction >

void ff::ParallelForReduce< T >::parallel_reduce	(	T &	var,
		const T &	identity,
		long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	body,
		const FReduction &	finalreduce,
		const long	nw = `FF_AUTO`
	)

inline

Parallel reduce (step, grain)

Dynamic scheduling onto nw worker threads. Iterations are scheduled in blocks of minimal size grain. Iteration space is walked with stride /p step.

Reduce is executed in two phases: the first phase execute in parallel a partial reduce (by way of partialreduce_body function), the second reduces partial results (by way of finalresult_body). Typically the two function are really the same.

Parameters

var	inital value of reduction variable (accumulator)
indentity	indetity value for the reduction function
first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
partialreduce_body	reduce operation (1st phase, executed in parallel)
finalreduce_body	reduce operation (2nd phase, executed sequentially)
nw	number of worker threads

template<typename T>

template<typename Function , typename FReduction >

void ff::ParallelForReduce< T >::parallel_reduce_static	(	T &	var,
		const T &	identity,
		long	first,
		long	last,
		long	step,
		long	grain,
		const Function &	body,
		const FReduction &	finalreduce,
		const long	nw = `FF_AUTO`
	)

inline

Parallel reduce region (step) - static.

Static scheduling onto nw worker threads. Iteration space is walked with stride step. Data is statically partitioned in blocks, i.e. partition size = last-first/(nw*step)

Parameters

var	inital value of reduction variable (accumulator)
indentity	indetity value for the reduction function
first	first value of the iteration variable
last	last value of the iteration variable
step	step increment for the iteration variable
f	f(const long idx) body of the parallel loop
nw	number of worker threads

class ff::ff_pipe

template<typename TaskType>
class ff::ff_pipe< TaskType >

Pipeline pattern (high-level pattern syntax)

Set up a parallel for pipeline pattern run-time support object. Run with run_and_wait_end or run_the_freeze. See related functions.

Note: Don't use to model a workflow of tasks, stages are nonblocking threads and require one core per stage. If you need to model a workflow use ff::ff_mdf

Examples:: pipe_basic.cpp.

Inheritance diagram for ff::ff_pipe< TaskType >:

Collaboration diagram for ff::ff_pipe< TaskType >:

Public Member Functions
template<typename... Arguments>
	ff_pipe (Arguments...args)
	Create a stand-alone pipeline (no input/output streams). Run with `run_and_wait_end` or `run_the_freeze`. More...

template<typename... Arguments>
	ff_pipe (bool input_ch, Arguments...args)
	Create a pipeline (with input stream). Run with `run_and_wait_end` or `run_the_freeze`. More...

Public Member Functions inherited from ff::ff_pipeline
	ff_pipeline (bool input_ch=false, int in_buffer_entries=DEF_IN_BUFF_ENTRIES, int out_buffer_entries=DEF_OUT_BUFF_ENTRIES, bool fixedsize=true)
	Constructor. More...

	~ff_pipeline ()
	Destructor.

int	add_stage (ff_node *s)
	It adds a stage to the pipeline. More...

int	wrap_around (bool multi_input=false)
	Feedback channel (pattern modifier) More...

int	run (bool skip_init=false)
	Run the pipeline skeleton asynchronously. More...

int	wait ()
	wait for pipeline termination (all stages received EOS)

int	wait_freezing ()
	wait for pipeline to complete and suspend (all stages received EOS) More...

bool	offload (void *task, unsigned int retry=((unsigned int)-1), unsigned int ticks=ff_node::TICKS2WAIT)
	offload a task to the pipeline from the offloading thread (accelerator mode) More...

bool	load_result (void **task, unsigned int retry=((unsigned int)-1), unsigned int ticks=ff_node::TICKS2WAIT)
	gets a result from a task to the pipeline from the main thread (accelator mode) More...

bool	load_result_nb (void **task)
	try to get a result from a task to the pipeline from the main thread (accelator mode) More...

double	ffTime ()
	Misure ff::ff_node execution time. More...

Public Member Functions inherited from ff::ff_node
virtual bool	put (void *ptr)
	Nonblocking put onto output channel. More...

virtual bool	get (void **ptr)
	Noblocking pop from input channel. More...

virtual FFBUFFER *	get_in_buffer () const
	Gets input channel. More...

virtual FFBUFFER *	get_out_buffer () const
	Gets pointer to the output channel. More...

virtual bool	ff_send_out (void *task, unsigned int retry=((unsigned int)-1), unsigned int ticks=(TICKS2WAIT))
	Sends out the task. More...

Additional Inherited Members
Protected Member Functions inherited from ff::ff_pipeline
void *	svc (void *task)
	The service callback (should be filled by user with parallel activity business code) More...

int	svc_init ()
	Service initialisation. More...

void	svc_end ()
	Service finalisation. More...

int	create_input_buffer (int nentries, bool fixedsize)
	Creates the input channel. More...

int	create_output_buffer (int nentries, bool fixedsize=false)
	Creates the output channel. More...

int	set_output_buffer (FFBUFFER *const o)
	Assign the output channelname to a channel. More...

Protected Member Functions inherited from ff::ff_node
virtual void	skipfirstpop (bool sk)
	Set the ff_node to start with no input task. More...

bool	skipfirstpop () const
	Gets the status of spontaneous start. More...

virtual int	set_input_buffer (FFBUFFER *const i)
	Assign the input channelname to a channel. More...

virtual double	wffTime ()
	Misure ff_node::svc execution time. More...

virtual	~ff_node ()
	Destructor.

Related Functions inherited from ff::ff_pipeline
int	run_and_wait_end ()
	run the pipeline, waits that all stages received the End-Of-Stream (EOS), and destroy the pipeline run-time More...

virtual int	run_then_freeze (bool skip_init=false)
	run the pipeline, waits that all stages received the End-Of-Stream (EOS), and suspend the pipeline run-time More...

Constructor & Destructor Documentation

template<typename TaskType>

template<typename... Arguments>

ff::ff_pipe< TaskType >::ff_pipe ( Arguments... args )

inline

Create a stand-alone pipeline (no input/output streams). Run with run_and_wait_end or run_the_freeze.

Identifies an stream parallel construct in which stages are executed in parallel. It does require a stream of tasks, either external of created by the first stage.

Parameters

args	pipeline stages, i.e. a list f1,f2,... of functions with the following type `const std::function<T(T,ff_node*const)>`

Example: pipe_basic.cpp

template<typename TaskType>

template<typename... Arguments>

ff::ff_pipe< TaskType >::ff_pipe	(	bool	input_ch,
		Arguments...	args
	)

inline

Create a pipeline (with input stream). Run with run_and_wait_end or run_the_freeze.

Identifies an stream parallel construct in which stages are executed in parallel. It does require a stream of tasks, either external of created by the first stage.

Parameters

input_ch	`true` to enable first stage input stream
args	pipeline stages, i.e. a list f1,f2,... of functions with the following type `const std::function<T(T,ff_node*const)>`

Example: pipe_basic.cpp

class ff::poolEvolution

template<typename T, typename env_t = char>
class ff::poolEvolution< T, env_t >

The pool evolution parallel pattern.

The pool pattern computes the set P as result of the following algorithm:

while not( t(P) ) do N = e ( s(P) ) P += f (N, P) end while

where 's' is a “candidate selection” function, which selects a subset of objects belonging to an unstructured object pool (P), 'e' is the "evolution" function, 'f' a "filter" function and 't' a "termination" function.

Examples:: funcmin.cpp.

Inheritance diagram for ff::poolEvolution< T, env_t >:

Collaboration diagram for ff::poolEvolution< T, env_t >:

Protected Member Functions
void *	svc (void *task)
	The service callback (should be filled by user with parallel activity business code) More...

Protected Member Functions inherited from ff::ff_node
virtual void	skipfirstpop (bool sk)
	Set the ff_node to start with no input task. More...

bool	skipfirstpop () const
	Gets the status of spontaneous start. More...

virtual int	create_input_buffer (int nentries, bool fixedsize=true)
	Creates the input channel. More...

virtual int	create_output_buffer (int nentries, bool fixedsize=false)
	Creates the output channel. More...

virtual int	set_output_buffer (FFBUFFER *const o)
	Assign the output channelname to a channel. More...

virtual int	set_input_buffer (FFBUFFER *const i)
	Assign the input channelname to a channel. More...

virtual int	run (bool=false)
	Run the ff_node. More...

virtual int	wait ()
	Wait ff_node termination. More...

virtual int	wait_freezing ()
	Wait the freezing state. More...

virtual double	ffTime ()
	Misure ff::ff_node execution time. More...

virtual double	wffTime ()
	Misure ff_node::svc execution time. More...

virtual	~ff_node ()
	Destructor.

Additional Inherited Members
Public Member Functions inherited from ff::ff_node
virtual int	svc_init ()
	Service initialisation. More...

virtual void	svc_end ()
	Service finalisation. More...

virtual bool	put (void *ptr)
	Nonblocking put onto output channel. More...

virtual bool	get (void **ptr)
	Noblocking pop from input channel. More...

virtual FFBUFFER *	get_in_buffer () const
	Gets input channel. More...

virtual FFBUFFER *	get_out_buffer () const
	Gets pointer to the output channel. More...

virtual bool	ff_send_out (void *task, unsigned int retry=((unsigned int)-1), unsigned int ticks=(TICKS2WAIT))
	Sends out the task. More...

Member Function Documentation

template<typename T, typename env_t = char>

void* ff::poolEvolution< T, env_t >::svc ( void * task )

inlineprotectedvirtual

The service callback (should be filled by user with parallel activity business code)

Parameters

task	is a the input data stream item pointer (task)

Returns: output data stream item pointer

Implements ff::ff_node.

Files

Classes

Functions

Detailed Description

Class Documentation

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer> class ff::ff_farm< lb_t, gt_t >

Public Member Functions

Protected Member Functions

Constructor & Destructor Documentation

Member Function Documentation

Public Member Functions

Additional Inherited Members

Constructor & Destructor Documentation

Member Function Documentation

Public Member Functions

Constructor & Destructor Documentation

Member Function Documentation

template<typename T> class ff::ParallelForReduce< T >

Public Member Functions

Protected Member Functions

Constructor & Destructor Documentation

Member Function Documentation

template<typename TaskType> class ff::ff_pipe< TaskType >

Public Member Functions

Additional Inherited Members

Constructor & Destructor Documentation

template<typename T, typename env_t = char> class ff::poolEvolution< T, env_t >

Protected Member Functions

Additional Inherited Members

Member Function Documentation

template<typename lb_t = ff_loadbalancer, typename gt_t = ff_gatherer>
class ff::ff_farm< lb_t, gt_t >

template<typename T>
class ff::ParallelForReduce< T >

template<typename TaskType>
class ff::ff_pipe< TaskType >

template<typename T, typename env_t = char>
class ff::poolEvolution< T, env_t >