Parallel Algorithms (WISM 459), 2014/2015

Please note that the Mastermath course page is not yet completely up to date. This course page contains the correct information. Class starts on Wednesday September 10, 2014 at 10.15 hour in Utrecht. Buys Ballot building Room 069, Princetonplein 5, Uithof, Utrecht.

Teacher

Rob Bisseling

Time and place

Every Wednesday from 10.15-13.00 hour at Utrecht University, campus De Uithof. Location of lectures: room 069 in the Buys Ballot building. First lecture: September 10, 2014. Last lecture: December 17, 2014. No class on November 12. Each session consists of 2 times 45 min. lectures and one exercise class (or computer laboratory class, or question session) of 45 min. Note that class always starts at 10.15 hour, because it is part of Mastermath. On September 17 and November 5, we will start with a lecture in BBG069 and finish with a computer laboratory in Minnaert 205.

Intended audience

This is a Master's course which is an elective course for students of the Master's programme Mathematical Sciences of Utrecht University, part of the Scientific Computing specialisation. The course is recommended for students in Mathematics or Computer Science who are interested in scientific computing and would like to obtain a first `hands-on experience' in parallel programming. It is also highly suitable for the Honours master programme in Theoretical Physics and Mathematical Sciences and for students interested in Computational Physics. If you consider taking this course as part of a Bachelor's degree, please contact the teacher first about the prerequisites. The course is part of the Dutch National Master Mastermath and hence is open for all interested master students in Mathematics in the Netherlands. Registration for everybody is through Mastermath.

Course aim

Students will learn how to design a parallel algorithm for a problem from the area of scientific computing and how to write a parallel program that solves the problem.

Learning goals

After completion of the course, the student is able to

Credits

You get 8 ECTS credit points.

Contents

Today, parallel computers are appearing on our desktops. The advent of dual-core and quad-core computers and the expected increase in the number of cores in the coming years, inevitably will cause a major change in our approach to software, such as the software we use in scientific computations. Parallel computers drastically increase our computational capabilities and thus enable us to model more realistically in many application areas.

To make efficient use of parallel computers, it is necessary to reorganise the structure of our computational methods. In particular, attention must be paid to the division of work among the different processors solving a problem in parallel and to the communication between them. Suitable parallel algorithms and systems software are needed to realise the capabilities of parallel computers.

We will discuss extensively the most recent developments in the area of parallel computers, ranging from multi-core desktop PCs, to clusters of PCs connected by switching devices, to massively parallel computers with distributed memory such as our national supercomputer Cartesius at SurfSARA in Amsterdam.

The following subjects will be treated:

Language

The course will be given in English. All reading material will be in English.

Prerequisites

Introductory course in linear algebra. Some knowledge of algorithms and programming is helpful. The laboratory classes will use the programming language C. If you don't know C, this may be an opportunity to learn it. You can also use C++, if you already know that language. We will make use of the recently released MulticoreBSP for C software developed by Albert-Jan Yzelman. This sofware runs on shared-memory multicore PCs, and you can also run your program without any change on distributed-memory machines such as Cartesius.

Examination

The examination is in the form of two assignments during the course, each with a weight of 45%, and a number of homework exercises with a total weight of 10%. The two assignments are carried out in the exercise/laboratory class and at home. A written report must be returned before the deadline. Students can work individually or in pairs (but not in larger teams) on the computer programs and can hand in a single report, provided each did a fair share of the work and can account for that. Homework must be made individually. If needed, you will have to explain your answers to the homework exercises.

Students from Utrecht must submit reports for the assignments in hardcopy paper format. All students must use LaTeX. Students from outside Utrecht University may also submit reports electronically in PDF format by email, but hardcopy is still preferred.

Literature

We closely follow the book Parallel Scientific Computation: A Structured Approach using BSP and MPI (PSC), by Rob H. Bisseling, Oxford University Press, March 2004. ISBN 0-19-852939-2. Please note that the book is now available by the printing-on-demand system of Oxford University Press, at the book's OUP website. Furthermore, our student union Aeskwadraat may still have a limited number of copies. I am currently working on a second edition, and some additional material will be given to you (and will be tested on you!).

The first chapter of the book is freely available from the publisher's website, see "Sample material".

In addition, all my transparancies and WWW links to background material are provided through this course page. (Pointers to other relevant links are welcome!)

Transparancies

LaTeX sources (in Beamer) and PDFs of all my transparencies (14 Mbyte in gzipped tar format). The sources may be of help to other teachers who want to teach from my book. Students may find them helpful too.

Last update of the files: September 26, 2012. The transparancies are complete and they cover every section of the book. They have been converted to the Beamer macros in September 2012, and have been made up to date in the process. The old version (from 2007) in Prosper can be found here.

You can also obtain the PDFs separately. Each PDF file represents one lecture of about 45 minutes.

Chapter 1: sections 2 3 4 5-7
Chapter 2: sections 1-2 3 4 5-6
Chapter 3: sections 1-2 3 4 5 6 7
Chapter 4: sections 1 2 3 4 5 6 7 8 9 10
Appendix C: Programming in BSP style using MPI, 1 (MPI-1) 2 (MPI-2).

Further reading

Some weekly summaries and links

Wednesday September 10, 2014. Room 069 Buys Ballot building

The Bulk Synchronous Parallel model

(PSC section 1.2) What is a parallel computer? Why do we compute in parallel? The Bulk Synchronous Parallel (BSP) model by Valiant comprises an abstract machine architecture, a framework for developing algorithms, and a cost function for analysing the run time of algorithms. The BSP architecture is a set of processor-memory pairs connected by a black box communication network. BSP algorithms consist of computation supersteps and communication supersteps. Communication is costed as h-relations. The BSP cost of an algorithm is expressed in machine parameters p, g, l. The computing rate is r flop/s. Motivation of the cost model: bottlenecks at the entrance and exit of the communication network.

Parallel Inner Product Computation

(PSC section 1.3) My First Parallel Algorithm: the computation of the inner product of two vectors. Possible distributions: cyclic and block. Data distribution leads naturally to work distribution. Communication is needed to add local inner products into a global inner product. Single Program, Multiple Data (SPMD) approach: we don't want to write 400 different programs for a machine with 400 processors. One-sided communication is a great invention (in parallel computing, at least;).

Exercise class

In class: Exercise 1.2. Homework: Exercise 1.1

Interesting links

Wednesday September 17, 2014.

BSPlib, the Bulk Synchronous Parallel library. Room BBG069

(PSC section 1.4)

Benchmarking

(PSC sections 1.5-1.7)

Interesting links:

Computer Laboratory in room Minnaert 205 (from 12.00-13.00 hour): starting with BSPlib.
This is your first hands-on session with BSPlib. You may have to set up your accounts. Download the latest version of BSPedupack , my package of educational programs that teaches how to use BSP. Solve Exercise 1.3: try to run the benchmark, exploring your parallel environment: your own laptop and Cartesius. Change puts into gets. If you own a laptop, bring it to class, so we can install MulticoreBSP for C and check your software

Wednesday September 24, 2014.

Sequential LU Decomposition

(PSC sections 2.1-2.2)

Parallel LU Decomposition

(PSC section 2.3)

Exercise class
Answers to Exercise 1.2. Starting Exercise 1.7 from PSC. Design a parallel algorithm for prime number generation. Write your first parallel program. This program should generate all prime numbers below 1,000,000 in parallel. Choosing the right distribution is the name of the game.

Hand in a report on exercise 1.7 before the deadline, Wednesday October 15, 10.15 hour.

Interesting links:

Wednesday October 1, 2014.

Two-Phase Broadcasting

(PSC section 2.4)

Experiments with bsplu

(PSC sections 2.5-2.6)

Exercise class

Homework on Chapter 2, TBA.

Wednesday October 8, 2014.

Sequential Recursive Fast Fourier Transform (FFT)

(PSC sections 3.1-3.2) A wonderful algorithm.

Sequential Nonrecursive FFT

(PSC section 3.3)

Exercise class

TBA.

Wednesday October 15, 2014.

Parallel Fast Fourier Transform

(PSC section 3.4)

Program bspfft

(PSC section 3.6)

Exercise class

Choose a new assignment. Possibilities: solve Exercise 2.5 on Cholesky factorisation; solve Exercise 2.7 on Strassen matrix multiplication; write a parallel program for Karp-Sipser maximum-cardinality matching in a general graph; write a parallel program for counting self-avoiding walks (SAWs) on a 2D or 3D lattice. (The current world record for SAWs on the 3D cubic lattice is 36 steps.) Requests for different topics will be considered. Please try to discuss the chosen topic with the teacher before mid-November.

Wednesday October 22, 2014.

Guest lecture TBA

Wednesday October 29, 2014

Sequential sparse matrix-vector multiplication

(PSC section 4.1)

Sparse matrices and their data structures

(PSC section 4.2)

Exercise class

Homework on Chapter 3, TBA.

Wednesday November 5, 2014.

Parallel sparse matrix-vector multiplication

(PSC section 4.3)

Cartesian matrix distributions

(PSC section 4.4)

Computing laboratory (12.00-13.00 hour) Minnaert 205

Work on your new assignment.

Wednesday November 12, 2014. No class

because of participation in Dagstuhl Seminar High-performance Graph Algorithms and Applications in Computational Science

Wednesday November 19, 2014.

Mondriaan sparse matrix distribution

(PSC section 4.5)

New developments in 2D sparse matrix distribution
Movies of matrix partitioning, improvements in quality and speed by the medium-grain method (Mondriaan 4.0).
Exercise class
TBA

Interesting links:

Wednesday November 26, 2014.

Laplacian matrices

(PSC section 4.8)

Interesting links:

Exercise class

Exercise 4.2, finding the optimal distribution over 8 processors for a 12 by 12 grid.

Wednesday December 3, 2014.

Sequential graph matching

(new material)

Parallel graph matching

(new material)

Exercise class
Exercise on graph matching.

Interesting links:

Wednesday December 10, 2014. Room BBG 017

Program bspmatch

(new material)

Exprimental results

(new material)

Exercise class
TBA

Interesting links:

Wednesday December 17, 2011.

Message Passing Interface (MPI-1)

(PSC sections C.1, C.2.1-C.2.4)

Message Passing Interface (MPI-2)

(PSC Section C2.5, C3, C4)

Exercise class
TBA

Please fill out the Mastermath digital evaluation form.

Deadline second assignment January 15, 2015, 17.00 hour.

Hand in a report before the deadline. Please use the batch system of Cartesius and not the interactive system for your test runs. The interactive system is only for development.

Other courses that teach the BSP model (among others)


Last update of this page: September 1, 2014
Back to homepage of Rob Bisseling