Parallel Algorithms (WISM 459), 2011/2012
Teacher
Rob Bisseling
Time and place
Every Wednesday from 10.15-13.00 hour at
Utrecht University, campus De Uithof.
Location of lectures: room 611 Mathematics building.
First lecture: September 7, 2011.
Last lecture: December 14, 2011.
Each session consists of 2 times 45 min. lectures and one exercise class
(or computer laboratory class, or question session) of 45 min.
Note that class always starts at 10.15 hour,
because it is part of Mastermath.
Intended audience
This is a Master's course which is an elective course for students
of the Master's programme Mathematical Sciences
of Utrecht University specialising in Scientific Computing.
(As of September 1, 2011, Scientific Computing is a specialisation of this 2-year programme.)
The course is recommended for students in Mathematics or Computer Science
who are interested in scientific computing
and would like to obtain a first `hands-on experience'
in parallel programming.
It is also highly suitable for the Honours master
programme in Theoretical Physics and Mathematical Sciences
and for students interested in Computational Physics.
If you consider taking this course as part of a Bachelor's degree,
please contact the teacher first about the prerequisites.
The course is part of the Dutch National Master
Mastermath and
hence is open for all interested master students in Mathematics
in the Netherlands.
Registration for everybody is through Mastermath.
Course aim
Students will learn how to design a parallel algorithm for a
problem from the area of scientific computing
and how to write a parallel program that
solves the problem.
Credits
You get 8 ECTS credit points.
Contents
Today, parallel computers are appearing on our desktops.
The advent of dual-core and quad-core computers and the expected increase
in the number of cores in the coming years, inevitably will cause a major change
in our approach to software, such as the software we use in scientific computations.
Parallel computers drastically increase our computational capabilities
and thus enable us to model more realistically in many application areas.
To make efficient use of parallel computers, it is necessary to reorganise
the structure of our computational methods.
In particular, attention must be paid to the division of work
among the different processors solving a problem in parallel
and to the communication between them. Suitable parallel algorithms and systems
software are needed to realise the capabilities of parallel computers.
We will discuss extensively the most recent developments in the area
of parallel computers, ranging from multi-core desktop PCs,
to clusters of PCs connected by switching devices,
to massively parallel computers with distributed memory
such as our national supercomputer Huygens at SARA in Amsterdam.
The following subjects will be treated:
- Types of existing parallel computers
- Principles of parallel computation: distributing the work evenly
and avoiding communication
- The Bulk Synchronous Parallel (BSP) model as an idealised model
of a parallel computer
- Use of BSPlib software as the basis for architecture-independent
programs
-
Parallel algorithms for the following problems:
prime number sieving, LU decomposition,
Fast Fourier Transform,
sparse matrix-vector multiplication, iterative solution of
linear systems, graph matching.
-
Analysis of the computation, communication, and synchronisation
time requirements of these algorithms by the BSP model.
Language
The course will be given in English.
All reading material will be in English.
Prerequisites
Introductory course in linear algebra.
Some knowledge of algorithms and programming is helpful.
The laboratory classes will use the programming language C.
If you don't know C, this may be an opportunity to learn it.
You can also use C++, if you already know that language.
For those who know Java, and would prefer to use this language,
use can be made of the recently released
MulticoreBSP software developed by Albert-Jan Yzelman
under the supervision of Rob Bisseling.
This sofware runs on shared-memory multicore PCs, but not on distributed-memory machines
such as Huygens.
Examination
The examination is in the form of two assignments during the course,
each with a weight of 45%, and a small written examination on the first
chapter of the book
with a weight of 10%. The small examination must be passed with grade 6
(out of 10) or higher.
The two assignments are carried out in the
exercise/laboratory class and at home.
A written report must be returned before the deadline.
Students can work in pairs (but not in larger teams)
on the computer programs, but
the reports must be written individually.
Programs produced in pairs must be clearly identified as such
(Only one program per pair needs to be given to me.)
Students from Utrecht must submit in hardcopy paper format, and you may use any
word processor you like, and even handwriting, if it is readable.
But LaTeX is preferred! Students from outside Utrecht University may
also submit reports electronically in PDF format by email,
but hardcopy is still preferred.
Literature
We closely follow the book
Parallel
Scientific Computation: A Structured Approach using BSP and MPI (PSC),
by Rob H. Bisseling,
Oxford University Press,
March 2004. ISBN 0-19-852939-2.
Please note that the book is now
available by the printing-on-demand system of Oxford University Press,
at the book's OUP website.
Furthermore, our student union Aeskwadraat may still have a limited number of copies.
The first chapter of the book is freely available from the
publisher's website,
see "Sample material".
In addition, all my transparancies and WWW links to
background material are provided through this course page.
(Pointers to other relevant links are welcome!)
Transparancies
LaTeX sources (in Prosper) and PDFs of all my transparencies
(61 Mbyte in gzipped tar format).
The sources may be of help to other teachers
who want to teach from my book.
Last update of the files: September 19, 2007.
The transparancies are now complete
and they cover every section of the book.
I am working on a new version of the slides using
the Beamer macros of Latex.
They will be made available sometime in the future.
You can also obtain the PDFs separately. Each PDF file represents one
lecture of about 45 minutes.
Chapter 1: sections
2
3
4
5-7
Chapter 2: sections
1-2
3
4
5-6
Chapter 3: sections
1-2
3
4
5
6
7
Chapter 4: sections
1
2
3
4
5
6
7
8
9
10
Appendix C:
Programming in BSP style using MPI,
1 (MPI-1)
2 (MPI-2).
Further reading
-
J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst,
Numerical Linear Algebra for High-Performance Computers,
SIAM, 1998.
-
G. H. Golub and C. F. van Loan, ``Matrix Computations",
Third edition, The Johns Hopkins University Press, 1996.
-
C. F. van Loan, ``Computational Frameworks for the Fast Fourier Transform",
SIAM, 1992.
Some weekly summaries and links
Wednesday September 7, 2011. Room 611 Mathematics building
The Bulk Synchronous Parallel model
(PSC section 1.2)
What is a parallel computer? Why do we compute in parallel?
The Bulk Synchronous Parallel (BSP) model by Valiant
comprises an abstract machine architecture, a framework for
developing algorithms,
and a cost function for analysing the run time of algorithms.
The BSP architecture is a set of processor-memory pairs
connected by a black box communication network.
BSP algorithms consist of computation supersteps
and communication supersteps.
Communication is costed as h-relations.
The BSP cost of an algorithm is expressed
in machine parameters p, g, l.
The computing rate is r flop/s.
Motivation of the cost model: bottlenecks at the entrance
and exit of the communication network.
Parallel Inner Product Computation
(PSC section 1.3)
My First Parallel Algorithm: the computation
of the inner product of two vectors.
Possible distributions: cyclic and block.
Data distribution leads naturally to work distribution.
Communication is needed to add local inner products into a global
inner product.
Single Program, Multiple Data (SPMD) approach:
we don't want to write 400 different programs for a machine with
400 processors.
One-sided communication is a great invention
(in parallel computing, at least;).
Exercise class
Exercise 1.2 from the PSC book.
Interesting links
Wednesday September 14, 2011. Room 611 Mathematics building
BSPlib, the Bulk Synchronous Parallel library
(PSC section 1.4)
Benchmarking
(PSC sections 1.5-1.7)
Interesting links:
Exercise class
Answers to Exercise 1.2 of the previous week.
Wednesday September 21, 2011. Room 611 Mathematics building
Sequential LU Decomposition
(PSC sections 2.1-2.2)
Computer Laboratory in Room Minnaert 018 (from 11.15-13.00):
starting with BSPlib. First assignment.
This is your first hands-on session with BSPlib. You may have to set up your accounts.
Download the latest version of
BSPedupack ,
my package of educational programs that
teaches how to use BSP.
Solve Exercise 1.3: try to run the benchmark,
exploring your parallel environment: your own laptop
and Huygens. Change puts into gets.
Starting Exercise 1.7 from PSC. Design a parallel algorithm
for prime number generation.
Write your first parallel
program.
This program should generate all prime numbers below 1,000,000 in parallel.
Choosing the right distribution is the name of the game.
If you own a laptop, bring it to class,
so we can install/check your software
Hand in an integrated report on these two exercises before the deadline,
Wednesday November 9, 10.15 hour. No extensions will be given.
Interesting links:
Wednesday September 28, 2011. Room 611 Mathematics building
Parallel LU Decomposition
(PSC section 2.3)
Two-Phase Broadcasting
(PSC section 2.4)
Exercise class
Theoretical exercise, Exercise 2.1: find
a non-Cartesian distribution for a 7x7 matrix and 9 processors
which is provably optimal for load balance.
Ignore the communication cost.
Continue with Exercise 1.7 from PSC. Discuss your parallel algorithm with me.
Wednesday October 5, 2011. Room 611 Mathematics building
Please note: we do have class this week!
Experiments with bsplu
(PSC sections 2.5-2.6)
Sequential Recursive Fast Fourier Transform (FFT)
(PSC sections 3.1-3.2) A wonderful algorithm.
Exercise class
Answer previous exercise.
Continue with Exercise 1.7 from PSC. Discuss your parallel algorithm with me.
Wednesday October 12, 2011. Room 611 Mathematics building
Sequential Nonrecursive FFT
(PSC section 3.3)
Small examination
Test of 60 minutes on Chapter 1. It is a closed-book examination!
This exam is from 11.15-12.15 hour
Parallel Fast Fourier Transform
(PSC section 3.4)
Wednesday October 19, 2011. Room 611 Mathematics building
Program bspfft
(PSC section 3.6)
Computer Laboratory in Room Minnaert 016 (from 11.15-13.00)
Continue with solving Exercise 1.7:
write a parallel program for generating prime numbers.
Run it on Huygens. Get familiar with the interactive and batch mode.
Interesting links:
-
Using batch mode on Huygens
by Albert-Jan Yzelman, tells you how to get started running programs
in batch.
- Vimtutor,
found to be useful if you want to learn some vi editor commands
on the Linux system.
Wednesday October 26, 2011. Room 611 Mathematics building
Experimental results for the FFT
(PSC section 3.7)
Sequential sparse matrix-vector multiplication
(PSC section 4.1)
Exercise class
Work on you assignment.
Wednesday November 2, 2011. Room 611 Mathematics building
Sparse matrices and their data structures
(PSC section 4.2)
Parallel sparse matrix-vector multiplication
(PSC section 4.3)
Exercise class
Answers of the exam. Finish your assignment.
Wednesday November 9, 2011. Room 611 Mathematics building
Today is the deadline of the first assignment. Hand it in at the start of the class.
Cartesian matrix distributions
(PSC section 4.4)
Mondriaan sparse matrix distribution
(PSC section 4.5)
Exercise class
Choose a new assignment. Possibilities:
solve Exercise 2.5 on Cholesky factorisation;
solve Exercise 2.7 on Strassen matrix multiplication;
solve Exercise 3.9 on computing π in parallel
(the current world record
is 5 trillion digits);
write a parallel program for counting self-avoiding walks on a 2D or 3D lattice.
Requests for different topics will be considered.
Wednesday November 16, 2011. Room 611 Mathematics building
Guest Lecture
Peter Arbenz, professor in Computational Science at the ETH Zurich,
and co-author of the book
Introduction to Parallel Computing
A practical guide with examples in C , Oxford University Press, February 2004.
Titel of the Talk:
"A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations".
Slides of the lecture.
Abstract. We discuss the scalable parallel solution of the Poisson equation within
a Particle-In-Cell (PIC) code for the simulation of electron beams in
particle accelerators of irregular shape. The problem is discretized by
Finite Differences. Depending on the treatment of the Dirichlet
boundary the resulting system of equations is symmetric or "mildly"
nonsymmetric positive definite. In all cases, the system is solved by
the preconditioned conjugate gradient algorithm with algebraic multigrid
(AMG) preconditioning. We investigate variants of the implementation of
AMG that lead to considerable improvements in the execution times. We
demonstrate good scalability of the solver on a distributed memory
parallel computer with up to 2048 processors. We also compare our
iterative solver with an FFT-based solver that is more commonly used for
applications in beam dynamics. We will introduce the FFT-based fast
Poisson solver and its parallelization.
Papers:
- A. Adelmann, P. Arbenz, Y. Ineichen:
"Improvements of a fast parallel Poisson solver on irregular domains".
Accepted for publication in the proceedings of PARA2010:
Workshop on the State-of-the-Art in Scientific and Parallel Computing, Reykjavik, Iceland,
June 6-9, 2010.
- A. Adelmann, P. Arbenz, Y. Ineichen:
"A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations",
J. Comp. Phys. 229 (12): 4554-4566 (2010), doi:10.1016/j.jcp.2010.02.022.
These are available on his
Selected Papers page.
The guest lecture is from 10.15-12.00 with 15 min break, and is followed
by an informal discussion of the students with the guest lecturer
a question and answers session on the broad topic of
"Parallel computing, computational science, and beyond".
Wednesday November 23, 2011. Room 611 Mathematics building
Random sparse matrices
(PSC section 4.7)
Laplacian matrices
(PSC section 4.8)
Second chance for small examination
Repeat test of 60 minutes on Chapter 1. Only if you failed
the first examination or want to improve your grade.
(The highest grade of the two counts.)
The exam is held from 12.00 hour to 13.00 hour.
Wednesday November 30, 2011. Room 611 Mathematics building
Lecture has been cancelled due to circumstances.
Wednesday December 7, 2011. Room 611 Mathematics building.
Program bspmv
(PSC section 4.9)
Experimental results on a beowulf cluster
(PSC section 4.10)
Exercise class
Solution to Exercise 4.2, finding the optimal distribution over 8 processors
for a 12 by 12 grid.
Discussion of second assignment.
Interesting links:
-
Sinterklaasgedicht door Wiebe Haanstra (in Dutch),
poem about the frustrations of parallel programming
Wednesday December 14, 2011. Room 611 Mathematics building.
Message Passing Interface (MPI-1)
(PSC sections C.1, C.2.1-C.2.4)
Message Passing Interface (MPI-2)
(PSC Section C2.5, C3, C4)
Computer Laboratory in Room Minnaert 016 from 12.00-13.00 hour.
Deadline second assignment January 16, 2012, 12.00 hour.
Hand in a report before the deadline.
Other courses that teach the BSP model (among others)
Last update of this page: December 6, 2011
Back to homepage of Rob Bisseling