## User's guide Mondriaan version 4.2

### How to download and install Mondriaan

Download the latest version from the Mondriaan software homepage. Uncompress with, e.g.,

`% tar xzvf mondriaan4.tar.gz`

(Here, '`%`

' is the prompt of your operating system.)

This will create a directory `Mondriaan4`

which contains all the files of the Mondriaan package.

Important files are:

`README`

, which tells you about copyright and how to cite the work.`COPYING`

, the GNU General Public License (GNU GPL).`COPYING.LESSER`

, the GNU Lesser General Public License (GNU LGPL), which is an addition to GNU GPL, making its use more liberal.`mondriaan.mk`

, which contains the Mondriaan compilation options.

### How to compile and test Mondriaan

Go inside the directory `Mondriaan4`

and type

`% make`

This will compile the Mondriaan library, the MondriaanOpt library and
tools included with the Mondriaan library.
After compilation the include files are located in `Mondriaan4/src/include`

,
the compiled library in `Mondriaan4/src/lib`

,
and the stand-alone Mondriaan tools in `Mondriaan4/tools`

.
An example of how to use the Mondriaan library in your own
program is provided by `Mondriaan4/docs/example.c`

and
`Mondriaan4/tools/Mondriaan.c`

.

For more information about MATLAB usage, please see the Mondriaan MATLAB guide.
To enable *MATLAB support* for Mondriaan and MondriaanOpt, edit the file
`mondriaan.mk`

and change the line containing the variable
`MATLABHOMEDIR`

to your current MATLAB install directory and
remove the '`#`

' in front of the line to uncomment it.
The variable `MEXSUFFIX`

should be set to the appropriate
extension for your platform, as specified on the Mathworks site.
This should be done before compiling Mondriaan.

Similarly, to enable *PaToH support* (the PaToH library can be found
here),
edit `mondriaan.mk`

to change the line containing the `PATOHHOMEDIR`

variable, as detailed above.

For each function of the core Mondriaan program,
we have written a small (undocumented) test, which checks those functions
for proper behaviour. The unit tests have been tried successfully on two
architectures: Linux and Mac Os X. If one or more of the tests fails,
please try to identify the relevant error message, and inform me
(R . H . Bisseling @ NOSPAM uu . nl ) about the problem. I will try to help you
solve it. The tests called can be found in the script `runtest`

residing in the `tests/`

subdirectory.
To compile and run all 109 *unit tests*, type

`% make test`

#### Timers

The compile option `-DTIME`

causes the CPU time used by Mondriaan
for the matrix distribution to be printed, and also the time for the vector
distribution. This timer has a relatively low (guaranteed) accuracy, and
it is in danger of clock wraparound.
The compile option `-DUNIX`

tells Mondriaan
you are using a UNIX system.
Together with `-DTIME`

this also causes the elapsed (wall clock)
time to be printed, which is an upper bound on the CPU time.
Usually the timer accuracy is higher, and there is no danger of clock
wraparound. This is the best timer if you are the single user of the system.

#### Levels of verbosity

Mondriaan writes the output distributions to file.
It can also generate useful statistics about the partitioning
to the standard output stream stdout (or the screen).
There are three levels of verbosity: silent, standard, and verbose.
The silent mode is useful when running the unit tests by
`make test`

. (If these are not done silently,
the OKs are obscured.)
The silent mode may also be useful if (parts of) Mondriaan are used as library
functions.
The verbose mode is useful when debugging, or trying to understand
a particular run in detail.
The standard mode generates 1-2 pages of output, and
is aimed at easy digestion.
You can change the verbosity level by commenting and uncommenting
the appropriate `CFLAGS`

lines in `mondriaan.mk`

.
Using the flag `-DINFO`

generates standard output,
and using the flags `-DINFO -DINFO2`

generates verbose output.

### How to run Mondriaan

Go inside the directory `Mondriaan4`

and type

`% cd tools`

`% ./Mondriaan ../tests/arc130.mtx 8 0.03`

if you want to partition the `arc130.mtx`

matrix (Matrix Market file format)
for 8 processors with at most 3% load imbalance. The matrix should be the full
relative path; *in the above example output is saved in the Mondriaan tests folder* (`../tests/`

).

Mondriaan may also be used to partition matrices provided via the standard
input, e.g., by using piping:

`% cat ../tests/arc130.mtx | ./Mondriaan - 8 0.03`

to obtain the same result as with the previous method, with one difference:
results are written in the current (`tools`

) directory.
For integration with already existing software you may have, without resorting to writing and reading files (or pipes),
look at the how to use Mondriaan as a library section of this guide.

### Output

The main `Mondriaan`

tool yields, after a successful run on an input matrix,
various output files. All possible output files are described below. Typically,
the output filenames are that of the input matrix filename, appended with a small
descriptor and usually the number of parts *x* Mondriaan was requested to
construct.

#### Distributed matrix (`-Px`

)

The `Mondriaan`

program writes the distributed matrix to a file called
`input-Px`

,
where `input`

is the name of the input matrix, or `stdin`

if the
matrix was read from the standard input, and x is the number of processors
used in the distribution.
We use an adapted Matrix Market format, with this structure:

`%%MatrixMarket distributed-matrix coordinate real general`

( this should be 0 )

m n nnz P

Pstart[0]

...

...

...

`Pstart[P]`

( this should be nnz )

`A.i[0] A.j[0] A.value[0]`

...

...

...

`A.i[nnz-1] A.j[nnz-1] A.value[nnz-1]`

Here, `Pstart[k]`

points to the start of the nonzeroes
of processor k.

#### Processor indices (`-Ix`

)

The `Mondriaan`

program also writes the processor indices of each
nonzero to the Matrix Market file `input-Ix`

where the value of each
nonzero is replaced by the processor index to which the nonzero has been assigned.
The order of the nonzeroes is exactly that of the distributed matrix (`-Px`

).

#### Row and column permutations (`-rowx`

, `-colx`

)

`Mondriaan`

writes the row and column permutations determined by the
Mondriaan algorithm (set by the `Permute`

option) to `input-rowx`

and `input-colx`

. The goal of these permutations is to bring the input
matrix *A* into (doubly) Separated Block Diagonal (SBD) or Bordered Block
Diagonal (BBD) form, after applying the found permutations. This can have many
possible applications, including, but not limited to, cache-oblivious sparse
matrix vector multiplication (SBD form) or minimising fill-in in sparse LU
decomposition (BBD form). These files are not written if the `Permute`

option is set to `none`

.

#### Reordered matrix (`-reor-Px`

)

`Mondriaan`

also directly writes the permuted matrix *PAQ* to
file, where the permutation matrix *P* corresponds to the row permutation
determined by the Mondriaan algorithm, as described in the previous paragraph.
The permutation matrix *Q* is similarly inferred from the column
permutation. This file is not written if the `Permute`

option is set
to `none`

.

#### Separator boundary indices & hierarchy
(`-rowblocksx`

, `-colblocksx`

)

These two files store two column-vectors each. The first column-vector
relates to the *separator boundary indices*, the second to the
*separator hierarchy structure*. The vectors are stored next to
each other, so that each file has two columns of integers and *m*,
resp, *n* rows. Each will be explained in turn.

The (doubly) separated block diagonal and bordered block diagonal forms
each identify groups of nonzeroes, the *separators*, which serve as
connectors between two parts resulting from a single bipartition. In the
reordered matrix described above, these groups appear as relatively dense
and usually small strips of consecutive rows and columns; the *separator
blocks*. The indices where these blocks start and end in the reordered
matrix are given in these two files; this is done by consecutively listing
the start index and end index of each block, both separator and
non-separator, as they occur in the row and column direction. These files
are not written if the `Permute`

option is set to `none`

.

A more detailed explanation follows from the (idealised) Figures 1 and 2.
The first shows a reordering corresponding to a single bipartition, clearly
yielding four row and column indices indicating the start and end of the
differently coloured partitions. This can occur recursively as shown in the
second figure; there, each non-red partition is an instance of Figure 1,
resulting in 16 row and column indices indicating the separators for
the eight partitions. Assuming each non-separator block is two-by-two
(*x*=2 in Figure 3), and the separator blocks are only one row and
one column thick (* ~x*=1), then the row and column boundary
indices would equal *(0,2,3,5,6,8,9,...,21,23)*. These two arrays
are stored as a column vector in each file.

The second column in each file describes the separator hierarchy. As
Mondriaan follows a bipartitioning scheme, the separator block from the
very first bipartitioning corresponds to the separator blocks spanning
the entire matrix. As bipartitioning is then called recursively, the
separator blocks span only a subpart of the reordered matrix, and can
be viewed as a *child* of the largest separator blocks; in this
way a binary tree of separator blocks can be defined. See Figure 3 for
clarification.

Each row of the `rowblocks`

and `columnblocks`

file
corresponds to the start of a block (excluding the very last row which
always equals m+1, resp., n+1, where m by n is the matrix size).
Integers from 1 to m or 1 to n thus indicate blocks, and each separator
block can point to its parent separator block by using these indices.
This is exactly what is stored in the second column, where each
non-separator block points to the separator block constructed in the
same bipartitioning, and all separator blocks point to their direct
parent separator block, as in Figure 3. The root separator, the one
constructed in the very first bipartitioning, has no parent and
therefore points to the non-existing index zero. Continuing the
example, both the row and column hierarchies would be given by the
array *(2,4,2,8,6,4,6,0,10,12,10,8,14,12,14)*.

The rowblocks file in this instance would then look as follows (only
partially shown):

```
0 2
```

2 4

3 2

5 8

...

21 14

23

#### Input/output vector distributions (`-ux`

, `-vx`

)

The program writes the processor numbers of the vector components
to the files called `input-ux`

and `input-vx`

,
where `input`

is the name of the input matrix
and x is the number of processors used in the distribution.
The vectors u and v are the output and input vectors
of the sparse matrix-vector multiplication *u=A*v*.

In I/O files, all indices (i,j) for matrix entries a(i,j) and vector components u(i) and v(j) start numbering from 1, following the Matrix Market conventions. In I/O, the processors are numbered 1 to P. Internally, the indices are converted to the standard C-numbering starting from 0.

#### Cartesian submatrices (`-Cx`

)

The program writes the row index sets I(q)
and column index sets J(q) of the Cartesian submatrix I(q) x J(q)
for the processors q=1,...,P to the file called `input-Cx`

,
where `input`

is the name of the input matrix
and x is the number of processors used in the distribution.
This file is additional information, useful e.g. for visualisation,
and you may not need it.

#### Statistics on standard output

Provided the library is compiled with the `-DINFO`

or `-DINFO2`

option,
the program prints plenty of useful statistics to standard output.
The **communication volume** is given for the two phases
of the matrix-vector multiplication separately:
the volume for v (first communication phase)
and the volume for u (second communication phase).
The bottom line is the **communication cost**
which is defined as the sum of the costs of the two phases.
The cost of a phase is the maximum of the number of data words
sent and the number of data words received, over all processors.
This metric is also called the **BSP cost**; it is the cost
metric of the Bulk Synchronous Parallel model.

#### Graphical output

In the `tools/`

subdirectory the program `MondriaanPlot`

can be used to get insight in the Mondriaan partitioning algorithm.
This program creates a series of Truevision TGA image files named
`img0000.tga`

, `img0001.tga`

, ..., up to the number of
processors over which the matrix is divided.
To use this program, issue

`% cd tools`

`% ./MondriaanPlot ../tests/arc130.mtx 8 0.03`

If you have `mencoder`

installed, you can use the `MondriaanMovie`

script to generate a small movie of the partitioning process:

`% ./MondriaanMovie ../tests/arc130.mtx 8 0.03`

This will create `../tests/arc130.mtx.avi`

.
If you have `imagemagick`

installed (or if the `convert`

command is
otherwise available), you can use `MondriaanGIF`

to create an animated
GIF of the partitioning process:

`% ./MondriaanGIF ../tests/arc130.mtx 8 0.03`

This will create `../tests/arc130.mtx.gif`

.

When creating images with `MondriaanPlot`

, also an SVG file is generated.
Just as the `.tga`

files, the `.svg`

file also contains a visualisation of the partitioning.
In the above examples, the svg file created will be `../tests/arc130.mtx-8.svg`

.

### Program options

The Mondriaan options can be set in the
`Mondriaan.defaults`

file. If no such file exists, it is created
at run time. Afterwards, this file can be edited for further runs.
The default values of the options are given below in **boldface**.
It is possible to overrule the defaults from the command line,
e.g. by typing

`% ./Mondriaan ../tests/arc130.mtx 8 0.03 -SplitStrategy=onedimrow`

you can force Mondriaan to split the matrix in one dimension only, namely by rows.

#### Nonnumerical options

The nonnumerical options are used to choose partitioning methods. You may need to change them from the defaults to explore different partitioning methods.

#### Output options

#### Numerical options

The numerical options are often used to optimise given partitioning methods.
Setting values such as `NrRestarts`

,
`MaxNrLoops`

, or
`MaxNrNoGainMoves`

to a higher number, often results in better quality
of the partitioning solution, at the expense of increased run-time.

### How to use Mondriaan as a library

After successful compilation (using `make`

),
all relevant header files are exported to the `src/include`

directory,
and a static library is available in the `src/lib`

directory.

To illustrate the use of Mondriaan we provide a small example program together with this guide. This example can be compiled (provided the Mondriaan library was successfully built) by executing

`% gcc example.c -I../src/include -L../src/lib -lMondriaan4 -lm`

in the `docs/`

directory which will generate the executable `a.out`

.
More advanced use of the library is illustrated in `tools/Mondriaan.c`

.

Mondriaan uses a triplet-based datastructure (the Matrix Market format) to store sparse matrices;
that is, for each nonzero, its row- and column-index are stored, as well as its numerical value (in general).
The exact representation is detailed in `src/SparseMatrix.c`

/`.h`

; and to successfully interface,
your sparse matrix scheme has to be translated into this format.

Since the Compressed Row Storage (CRS), or alternatively known as Compressed Sparse Row (CSR), is a more prevailing standard,
`SparseMatrix`

comes with a translation function specifically for that datastructure: `CRSSparseMatrixInit`

.
It takes as input an uninitialised Mondriaan `SparseMatrix`

struct, the matrix dimensions and number of nonzeroes, and
the CRS datastructure arrays. The `base`

parameter automatically translates from, e.g., 1-based arrays (base=1) as
they may appear in for example Fortran, to 0-based arrays as used in Mondriaan.

Next is setting any options for Mondriaan. Recommended is to use an options file, storing all the defaults tuned to your application (e.g. as provided by `tools/Mondriaan.defaults`

).
These defaults can be read from file and stored in the Mondriaan `Options`

struct by using the function `SetOptionsFromFile`

;
see `src/Options.c`

/`.h`

.

Once both the sparse matrix and the options are ready the main Mondriaan distribution function, `DistributeMatrixMondriaan`

, can be used.
This function takes as parameters the sparse matrix struct, the number of processors, the load imbalance parameter, and the options struct.
The callback function is for advanced use, and is usually kept a `NULL`

pointer (an example of using the callback is given in `tools/MondriaanPlot.c`

). This function is a blocking call.
After partitioning, all relevant information can be extracted from the `SparseMatrix`

struct (see the source code for details).
The struct can be freed by calling `MMDeleteSparseMatrix`

.

Interfacing with non-C code is best achieved by writing a custom interface between your code and Mondriaan, using `extern "C"`

-style additions when, e.g., interfacing between C++ and C; or using any other specific translation required (e.g., writing all parameters as pointers (Fortran), using `jni`

(Java), etc.).
As an example, the MATLAB interface is given in `tools/MatlabMondriaan.c`

.

### Profiling Mondriaan

To profile the Mondriaan library for various configurations, please use
the `Profile`

program and `RunProfiler`

script both included
in the `tools/`

directory.
Entering the `tools/`

directory and executing

`% ./Profile 8 0.03 10 a.mtx b.mtx c.mtx > out.tex`

partitions the matrices `a.mtx`

, `b.mtx`

, and `c.mtx`

among 8 processors with at most 3% imbalance, taking the average over 10
runs with a different random seed.
The results are written, via `stdout`

, to the LaTeX file `out.tex`

which then contains a table with the averaged results of the partitioning
process.

### For code developers

If you develop new code to **add to Mondriaan**, you probably would like to add possibilities
to use your code through a new option. In that case you need to adjust a few files:

`src/Options.h`

: you should add the option and its possible values.`src/Options.c`

: you should make a choice for the default value and set it in the function`SetDefaultOptions`

. If the option is a numerical option, you should check its range in the function`SetDefaultOptions`

. Finally, you should add the option and all its possible values as an if-statement in the function`SetOption`

.- subdirectory
`tests`

: it may happen that adding the options will break a few unit tests, in particular those connected to functions`GetParameters`

,`SetDefaultOptions`

,`SetOption`

. This is quite harmless, and easy to repair.

### More information

All functions of Mondriaan have extensive documentation in the source code (`.c`

files).
Please have a look there for more details.

### References

[1]
*Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods*,
A. N. Yzelman and Rob H. Bisseling, SIAM Journal of Scientific Computation, Vol. 31, Issue 4, pp. 3128-3154 (2009).

[2]
*Two-dimensional cache-oblivious sparse matrix-vector multiplication*,
A. N. Yzelman and Rob H. Bisseling, Parallel Computing, Vol. 37, Issue 12, pp. 806-819 (2011).

[3]
*A medium-grain method for fast 2D bipartitioning of sparse matrices*,
Daniël M. Pelt and Rob H. Bisseling, submitted for publication (2013).

[4]
*A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications*
O. Fortmeier, H. M. Bücker, B. O. Fagginger Auer, R. H. Bisseling, Parallel Computing, Vol. 39, Issue 8, pp. 319-335 (2013).

[5]
*A Fine-Grain Hypergraph Model for 2D Decomposition of Sparse Matrices*
Ümit V. Çatalyürek and Cevdet Aykanat, IPDPS 2001, p. 118 (2001).

[6]
*Communication balancing in parallel sparse matrix-vector multiplication*
Rob H. Bisseling and Wouter Meesen, ETNA, pp. 47-65 (2005).

[7]
*Two-Dimensional Approaches to Sparse Matrix Partitioning*
Rob H. Bisseling, Bas O. Fagginger Auer, A. N. Yzelman, Tristan van Leeuwen and Ümit V. Çatalyürek,
Chapter 12 of Combinatorial Scientific Computing, Chapman and Hall/CRC, pp. 321-349 (2012).

Last updated: October 27, 2016.

June 9, 2009 by Rob Bisseling,

December 2, 2010 by Bas Fagginger Auer,

December 10, 2010 by A. N. Yzelman,

March 27, 2012 by Bas Fagginger Auer,

April 18, 2013 by Daniël M. Pelt,

August 29, 2013 by Rob Bisseling and Bas Fagginger Auer,

October 27, 2016 by Marco van Oort.

To
the Mondriaan package home page.