User's guide Mondriaan version 4.2

How to download and install Mondriaan

Download the latest version from the Mondriaan software homepage. Uncompress with, e.g.,

This will create a directory Mondriaan4 which contains all the files of the Mondriaan package.

Important files are:

How to compile and test Mondriaan

Go inside the directory Mondriaan4 and type

This will compile the Mondriaan library, the MondriaanOpt library and tools included with the Mondriaan library. After compilation the include files are located in Mondriaan4/src/include, the compiled library in Mondriaan4/src/lib, and the stand-alone Mondriaan tools in Mondriaan4/tools. An example of how to use the Mondriaan library in your own program is provided by Mondriaan4/docs/example.c and Mondriaan4/tools/Mondriaan.c.

For more information about MATLAB usage, please see the Mondriaan MATLAB guide. To enable MATLAB support for Mondriaan and MondriaanOpt, edit the file mondriaan.mk and change the line containing the variable MATLABHOMEDIR to your current MATLAB install directory and remove the '#' in front of the line to uncomment it. The variable MEXSUFFIX should be set to the appropriate extension for your platform, as specified on the Mathworks site. This should be done before compiling Mondriaan.

Similarly, to enable PaToH support (the PaToH library can be found here), edit mondriaan.mk to change the line containing the PATOHHOMEDIR variable, as detailed above.

For each function of the core Mondriaan program, we have written a small (undocumented) test, which checks those functions for proper behaviour. The unit tests have been tried successfully on two architectures: Linux and Mac Os X. If one or more of the tests fails, please try to identify the relevant error message, and inform me (R . H . Bisseling @ NOSPAM uu . nl ) about the problem. I will try to help you solve it. The tests called can be found in the script runtest residing in the tests/ subdirectory. To compile and run all 109 unit tests, type

Timers

The compile option -DTIME causes the CPU time used by Mondriaan for the matrix distribution to be printed, and also the time for the vector distribution. This timer has a relatively low (guaranteed) accuracy, and it is in danger of clock wraparound. The compile option -DUNIX tells Mondriaan you are using a UNIX system. Together with -DTIME this also causes the elapsed (wall clock) time to be printed, which is an upper bound on the CPU time. Usually the timer accuracy is higher, and there is no danger of clock wraparound. This is the best timer if you are the single user of the system.

Levels of verbosity

Mondriaan writes the output distributions to file. It can also generate useful statistics about the partitioning to the standard output stream stdout (or the screen). There are three levels of verbosity: silent, standard, and verbose. The silent mode is useful when running the unit tests by make test. (If these are not done silently, the OKs are obscured.) The silent mode may also be useful if (parts of) Mondriaan are used as library functions. The verbose mode is useful when debugging, or trying to understand a particular run in detail. The standard mode generates 1-2 pages of output, and is aimed at easy digestion. You can change the verbosity level by commenting and uncommenting the appropriate CFLAGS lines in mondriaan.mk. Using the flag -DINFO generates standard output, and using the flags -DINFO -DINFO2 generates verbose output.

How to run Mondriaan

Go inside the directory Mondriaan4 and type

if you want to partition the arc130.mtx matrix (Matrix Market file format) for 8 processors with at most 3% load imbalance. The matrix should be the full relative path; in the above example output is saved in the Mondriaan tests folder (../tests/).
Mondriaan may also be used to partition matrices provided via the standard input, e.g., by using piping:

to obtain the same result as with the previous method, with one difference: results are written in the current (tools) directory. For integration with already existing software you may have, without resorting to writing and reading files (or pipes), look at the how to use Mondriaan as a library section of this guide.

Output

The main Mondriaan tool yields, after a successful run on an input matrix, various output files. All possible output files are described below. Typically, the output filenames are that of the input matrix filename, appended with a small descriptor and usually the number of parts x Mondriaan was requested to construct.

Distributed matrix (-Px)

The Mondriaan program writes the distributed matrix to a file called input-Px, where input is the name of the input matrix, or stdin if the matrix was read from the standard input, and x is the number of processors used in the distribution. We use an adapted Matrix Market format, with this structure:
%%MatrixMarket distributed-matrix coordinate real general
m n nnz P
Pstart[0]
( this should be 0 )
...
...
...
Pstart[P]( this should be nnz )
A.i[0] A.j[0] A.value[0] ...
...
...
A.i[nnz-1] A.j[nnz-1] A.value[nnz-1]
Here, Pstart[k] points to the start of the nonzeroes of processor k.

Processor indices (-Ix)

The Mondriaan program also writes the processor indices of each nonzero to the Matrix Market file input-Ix where the value of each nonzero is replaced by the processor index to which the nonzero has been assigned. The order of the nonzeroes is exactly that of the distributed matrix (-Px).

Row and column permutations (-rowx, -colx)

Mondriaan writes the row and column permutations determined by the Mondriaan algorithm (set by the Permute option) to input-rowx and input-colx. The goal of these permutations is to bring the input matrix A into (doubly) Separated Block Diagonal (SBD) or Bordered Block Diagonal (BBD) form, after applying the found permutations. This can have many possible applications, including, but not limited to, cache-oblivious sparse matrix vector multiplication (SBD form) or minimising fill-in in sparse LU decomposition (BBD form). These files are not written if the Permute option is set to none.

Reordered matrix (-reor-Px)

Mondriaan also directly writes the permuted matrix PAQ to file, where the permutation matrix P corresponds to the row permutation determined by the Mondriaan algorithm, as described in the previous paragraph. The permutation matrix Q is similarly inferred from the column permutation. This file is not written if the Permute option is set to none.

Separator boundary indices & hierarchy (-rowblocksx, -colblocksx)

These two files store two column-vectors each. The first column-vector relates to the separator boundary indices, the second to the separator hierarchy structure. The vectors are stored next to each other, so that each file has two columns of integers and m, resp, n rows. Each will be explained in turn.

Figure 1
Figure 2

The (doubly) separated block diagonal and bordered block diagonal forms each identify groups of nonzeroes, the separators, which serve as connectors between two parts resulting from a single bipartition. In the reordered matrix described above, these groups appear as relatively dense and usually small strips of consecutive rows and columns; the separator blocks. The indices where these blocks start and end in the reordered matrix are given in these two files; this is done by consecutively listing the start index and end index of each block, both separator and non-separator, as they occur in the row and column direction. These files are not written if the Permute option is set to none.

A more detailed explanation follows from the (idealised) Figures 1 and 2. The first shows a reordering corresponding to a single bipartition, clearly yielding four row and column indices indicating the start and end of the differently coloured partitions. This can occur recursively as shown in the second figure; there, each non-red partition is an instance of Figure 1, resulting in 16 row and column indices indicating the separators for the eight partitions. Assuming each non-separator block is two-by-two (x=2 in Figure 3), and the separator blocks are only one row and one column thick ( ~x=1), then the row and column boundary indices would equal (0,2,3,5,6,8,9,...,21,23). These two arrays are stored as a column vector in each file.

Figure 3

The second column in each file describes the separator hierarchy. As Mondriaan follows a bipartitioning scheme, the separator block from the very first bipartitioning corresponds to the separator blocks spanning the entire matrix. As bipartitioning is then called recursively, the separator blocks span only a subpart of the reordered matrix, and can be viewed as a child of the largest separator blocks; in this way a binary tree of separator blocks can be defined. See Figure 3 for clarification.

Each row of the rowblocks and columnblocks file corresponds to the start of a block (excluding the very last row which always equals m+1, resp., n+1, where m by n is the matrix size). Integers from 1 to m or 1 to n thus indicate blocks, and each separator block can point to its parent separator block by using these indices. This is exactly what is stored in the second column, where each non-separator block points to the separator block constructed in the same bipartitioning, and all separator blocks point to their direct parent separator block, as in Figure 3. The root separator, the one constructed in the very first bipartitioning, has no parent and therefore points to the non-existing index zero. Continuing the example, both the row and column hierarchies would be given by the array (2,4,2,8,6,4,6,0,10,12,10,8,14,12,14).

The rowblocks file in this instance would then look as follows (only partially shown):
0 2
2 4
3 2
5 8
...
21 14
23

Input/output vector distributions (-ux, -vx)

The program writes the processor numbers of the vector components to the files called input-ux and input-vx, where input is the name of the input matrix and x is the number of processors used in the distribution. The vectors u and v are the output and input vectors of the sparse matrix-vector multiplication u=A*v.

In I/O files, all indices (i,j) for matrix entries a(i,j) and vector components u(i) and v(j) start numbering from 1, following the Matrix Market conventions. In I/O, the processors are numbered 1 to P. Internally, the indices are converted to the standard C-numbering starting from 0.

Cartesian submatrices (-Cx)

The program writes the row index sets I(q) and column index sets J(q) of the Cartesian submatrix I(q) x J(q) for the processors q=1,...,P to the file called input-Cx, where input is the name of the input matrix and x is the number of processors used in the distribution. This file is additional information, useful e.g. for visualisation, and you may not need it.

Statistics on standard output

Provided the library is compiled with the -DINFO or -DINFO2 option, the program prints plenty of useful statistics to standard output. The communication volume is given for the two phases of the matrix-vector multiplication separately: the volume for v (first communication phase) and the volume for u (second communication phase). The bottom line is the communication cost which is defined as the sum of the costs of the two phases. The cost of a phase is the maximum of the number of data words sent and the number of data words received, over all processors. This metric is also called the BSP cost; it is the cost metric of the Bulk Synchronous Parallel model.

Graphical output

In the tools/ subdirectory the program MondriaanPlot can be used to get insight in the Mondriaan partitioning algorithm. This program creates a series of Truevision TGA image files named img0000.tga, img0001.tga, ..., up to the number of processors over which the matrix is divided. To use this program, issue

If you have mencoder installed, you can use the MondriaanMovie script to generate a small movie of the partitioning process:

This will create ../tests/arc130.mtx.avi. If you have imagemagick installed (or if the convert command is otherwise available), you can use MondriaanGIF to create an animated GIF of the partitioning process:

This will create ../tests/arc130.mtx.gif.

When creating images with MondriaanPlot, also an SVG file is generated. Just as the .tga files, the .svg file also contains a visualisation of the partitioning. In the above examples, the svg file created will be ../tests/arc130.mtx-8.svg.

Program options

The Mondriaan options can be set in the Mondriaan.defaults file. If no such file exists, it is created at run time. Afterwards, this file can be edited for further runs. The default values of the options are given below in boldface. It is possible to overrule the defaults from the command line, e.g. by typing

you can force Mondriaan to split the matrix in one dimension only, namely by rows.

Nonnumerical options

The nonnumerical options are used to choose partitioning methods. You may need to change them from the defaults to explore different partitioning methods.

Output options

Numerical options

The numerical options are often used to optimise given partitioning methods. Setting values such as NrRestarts, MaxNrLoops, or MaxNrNoGainMoves to a higher number, often results in better quality of the partitioning solution, at the expense of increased run-time.

How to use Mondriaan as a library

After successful compilation (using make), all relevant header files are exported to the src/include directory, and a static library is available in the src/lib directory.

To illustrate the use of Mondriaan we provide a small example program together with this guide. This example can be compiled (provided the Mondriaan library was successfully built) by executing

in the docs/ directory which will generate the executable a.out. More advanced use of the library is illustrated in tools/Mondriaan.c.

Mondriaan uses a triplet-based datastructure (the Matrix Market format) to store sparse matrices; that is, for each nonzero, its row- and column-index are stored, as well as its numerical value (in general). The exact representation is detailed in src/SparseMatrix.c/.h; and to successfully interface, your sparse matrix scheme has to be translated into this format.

Since the Compressed Row Storage (CRS), or alternatively known as Compressed Sparse Row (CSR), is a more prevailing standard, SparseMatrix comes with a translation function specifically for that datastructure: CRSSparseMatrixInit. It takes as input an uninitialised Mondriaan SparseMatrix struct, the matrix dimensions and number of nonzeroes, and the CRS datastructure arrays. The base parameter automatically translates from, e.g., 1-based arrays (base=1) as they may appear in for example Fortran, to 0-based arrays as used in Mondriaan.

Next is setting any options for Mondriaan. Recommended is to use an options file, storing all the defaults tuned to your application (e.g. as provided by tools/Mondriaan.defaults). These defaults can be read from file and stored in the Mondriaan Options struct by using the function SetOptionsFromFile; see src/Options.c/.h.

Once both the sparse matrix and the options are ready the main Mondriaan distribution function, DistributeMatrixMondriaan, can be used. This function takes as parameters the sparse matrix struct, the number of processors, the load imbalance parameter, and the options struct. The callback function is for advanced use, and is usually kept a NULL pointer (an example of using the callback is given in tools/MondriaanPlot.c). This function is a blocking call. After partitioning, all relevant information can be extracted from the SparseMatrix struct (see the source code for details). The struct can be freed by calling MMDeleteSparseMatrix.

Interfacing with non-C code is best achieved by writing a custom interface between your code and Mondriaan, using extern "C"-style additions when, e.g., interfacing between C++ and C; or using any other specific translation required (e.g., writing all parameters as pointers (Fortran), using jni (Java), etc.). As an example, the MATLAB interface is given in tools/MatlabMondriaan.c.

Profiling Mondriaan

To profile the Mondriaan library for various configurations, please use the Profile program and RunProfiler script both included in the tools/ directory. Entering the tools/ directory and executing

partitions the matrices a.mtx, b.mtx, and c.mtx among 8 processors with at most 3% imbalance, taking the average over 10 runs with a different random seed. The results are written, via stdout, to the LaTeX file out.tex which then contains a table with the averaged results of the partitioning process.

For code developers

If you develop new code to add to Mondriaan, you probably would like to add possibilities to use your code through a new option. In that case you need to adjust a few files:

More information

All functions of Mondriaan have extensive documentation in the source code (.c files). Please have a look there for more details.

References

[1] Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods, A. N. Yzelman and Rob H. Bisseling, SIAM Journal of Scientific Computation, Vol. 31, Issue 4, pp. 3128-3154 (2009).
[2] Two-dimensional cache-oblivious sparse matrix-vector multiplication, A. N. Yzelman and Rob H. Bisseling, Parallel Computing, Vol. 37, Issue 12, pp. 806-819 (2011).
[3] A medium-grain method for fast 2D bipartitioning of sparse matrices, Daniël M. Pelt and Rob H. Bisseling, submitted for publication (2013).
[4] A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications O. Fortmeier, H. M. Bücker, B. O. Fagginger Auer, R. H. Bisseling, Parallel Computing, Vol. 39, Issue 8, pp. 319-335 (2013).
[5] A Fine-Grain Hypergraph Model for 2D Decomposition of Sparse Matrices Ümit V. Çatalyürek and Cevdet Aykanat, IPDPS 2001, p. 118 (2001).
[6] Communication balancing in parallel sparse matrix-vector multiplication Rob H. Bisseling and Wouter Meesen, ETNA, pp. 47-65 (2005).
[7] Two-Dimensional Approaches to Sparse Matrix Partitioning Rob H. Bisseling, Bas O. Fagginger Auer, A. N. Yzelman, Tristan van Leeuwen and Ümit V. Çatalyürek, Chapter 12 of Combinatorial Scientific Computing, Chapman and Hall/CRC, pp. 321-349 (2012).


Last updated: October 27, 2016.

June 9, 2009 by Rob Bisseling,
December 2, 2010 by Bas Fagginger Auer,
December 10, 2010 by A. N. Yzelman,
March 27, 2012 by Bas Fagginger Auer,
April 18, 2013 by Daniël M. Pelt,
August 29, 2013 by Rob Bisseling and Bas Fagginger Auer,
October 27, 2016 by Marco van Oort.

To the Mondriaan package home page.

Valid HTML 4.01 Strict Valid CSS!