ParaPhrase Summer School 2014

Slides

(the final version of these slides will eventually be available from the ParaPhrase web site)

FastFlow Hands on session

For the initial exercises, the links in the text suggest the kind of interface to be used to design the example code.

IP address of the machine to use remotely

The machine is reachable at 193.1.202.10.

FastFlow is installed under /home/madalina/mc-fastflow; to compile, you may use:

g++ -I /home/madalina/mc-fastflow -O3 file.cpp -o ... -lpthread

If you want to use your own machine, please download the FastFlow tarball from sourceforge, untar it (tar xzvf tarball.tgz) and pass the top-level directory to the g++ -I compile flag.

Documentation

Remember that there is a short tutorial (covering the non-C++11 part of the exercises) in the last chapter of the book distributed as Summer School material.

Exercise 1

Write a three-stage pipeline whose tasks have the following type:

    struct task_t {
        task_t(size_t length) : A(length), B(length) {}
        std::vector<size_t> A;
        std::vector<size_t> B;
    };

Code here

Exercise 1.1

Write the same code as in Exercise 1 using the C++11 syntax (i.e. ff_pipe instead of ff_pipeline).

Exercise 1.2

Write the second stage of Exercise 1 using the FastFlow farm pattern. Then run some tests varying the scheduling policy (default vs. on-demand with different granularities).

Exercise 2

Implement Exercise 1 using a farm pattern instead of the pipeline. The Emitter implements the first stage, the Workers compute the second stage in parallel, and the Collector implements the last stage. Then run the code varying the number of worker threads.

Exercise 3

Write an accelerator for computing the dot product of two arrays. Given A and B of size N, dotprod(A,B) is defined as: sum(A[i]*B[i]) for 0 ≤ i < N.

Exercise 3.1:

Implement Exercise 3 using a non-blocking wait (i.e. load_result_nb()), such that after having received 10 partial results from the accelerator you write the current sum of the elements computed so far to the standard output.

Exercise 4

This is the standard code for multiplying 2 square matrices (size NxN):

    for(long i=0;i<N;++i) 
        for(long j=0;j<N;++j)
            for(long k=0;k<N;++k)
                C[j*N+k] += A[j*N+i]*B[i*N+k];

Consider the case in which N=1024. Parallelize the above code using a ParallelForReduce pattern applied to the inner loop (for k). Then compare the results obtained with the case in which the ParallelFor is applied to the outermost loop (for i).

Exercise 5

Consider the concurrent code in the file threadatomic.cpp provided. Rewrite the code using FastFlow.