User's guide Mondriaan version 4.2.1

This page is continuously being improved and updated; therefore, a more recent version may be obtained online. This offline version is bundled with the software for your convenience.

How to download and install Mondriaan

Download the latest version from the Mondriaan software homepage. Uncompress with, e.g.,

% tar xzvf mondriaan4.tar.gz
(Here, '%' is the prompt of your operating system.)

This will create a directory Mondriaan4 which contains all the files of the Mondriaan package.

Important files are:

README, which tells you about copyright and how to cite the work.
COPYING, the GNU General Public License (GNU GPL).
COPYING.LESSER, the GNU Lesser General Public License (GNU LGPL), which is an addition to GNU GPL, making its use more liberal.
mondriaan.mk, which contains the Mondriaan compilation options.

How to compile and test Mondriaan

Go inside the directory Mondriaan4 and type

% make

This will compile the Mondriaan library, the MondriaanOpt library and tools included with the Mondriaan library. After compilation the include files are located in Mondriaan4/src/include, the compiled library in Mondriaan4/src/lib, and the stand-alone Mondriaan tools in Mondriaan4/tools. An example of how to use the Mondriaan library in your own program is provided by Mondriaan4/docs/example.c and Mondriaan4/tools/Mondriaan.c.

For more information about MATLAB usage, please see the Mondriaan MATLAB guide. To enable MATLAB support for Mondriaan and MondriaanOpt, edit the file mondriaan.mk and change the line containing the variable MATLABHOMEDIR to your current MATLAB install directory and remove the '#' in front of the line to uncomment it. The variable MEXSUFFIX should be set to the appropriate extension for your platform, as specified on the Mathworks site. This should be done before compiling Mondriaan.

Similarly, to enable PaToH support (the PaToH library can be found here), edit mondriaan.mk to change the line containing the PATOHHOMEDIR variable, as detailed above.

For each function of the core Mondriaan program, we have written a small (undocumented) test, which checks those functions for proper behaviour. The unit tests have been tried successfully on two architectures: Linux and Mac Os X. If one or more of the tests fails, please try to identify the relevant error message, and inform me (R . H . Bisseling @ NOSPAM uu . nl ) about the problem. I will try to help you solve it. The tests called can be found in the script runtest residing in the tests/ subdirectory. To compile and run all 109 unit tests, type

% make test

Timers

The compile option -DTIME causes the CPU time used by Mondriaan for the matrix distribution to be printed, and also the time for the vector distribution. This timer has a relatively low (guaranteed) accuracy, and it is in danger of clock wraparound. The compile option -DUNIX tells Mondriaan you are using a UNIX system. Together with -DTIME this also causes the elapsed (wall clock) time to be printed, which is an upper bound on the CPU time. Usually the timer accuracy is higher, and there is no danger of clock wraparound. This is the best timer if you are the single user of the system.

Levels of verbosity

Mondriaan writes the output distributions to file. It can also generate useful statistics about the partitioning to the standard output stream stdout (or the screen). There are three levels of verbosity: silent, standard, and verbose. The silent mode is useful when running the unit tests by make test. (If these are not done silently, the OKs are obscured.) The silent mode may also be useful if (parts of) Mondriaan are used as library functions. The verbose mode is useful when debugging, or trying to understand a particular run in detail. The standard mode generates 1-2 pages of output, and is aimed at easy digestion. You can change the verbosity level by commenting and uncommenting the appropriate CFLAGS lines in mondriaan.mk. Using the flag -DINFO generates standard output, and using the flags -DINFO -DINFO2 generates verbose output.

How to run Mondriaan

Go inside the directory Mondriaan4 and type

% cd tools
% ./Mondriaan ../tests/arc130.mtx 8 0.03

if you want to partition the arc130.mtx matrix (Matrix Market file format) for 8 processors with at most 3% load imbalance. The matrix should be the full relative path; in the above example output is saved in the Mondriaan tests folder (../tests/).
Mondriaan may also be used to partition matrices provided via the standard input, e.g., by using piping:

% cat ../tests/arc130.mtx | ./Mondriaan - 8 0.03

to obtain the same result as with the previous method, with one difference: results are written in the current (tools) directory. For integration with already existing software you may have, without resorting to writing and reading files (or pipes), look at the how to use Mondriaan as a library section of this guide.

Output

The main Mondriaan tool yields, after a successful run on an input matrix, various output files. All possible output files are described below. Typically, the output filenames are that of the input matrix filename, appended with a small descriptor and usually the number of parts x Mondriaan was requested to construct.

Distributed matrix (`-Px`)

The Mondriaan program writes the distributed matrix to a file called input-Px, where input is the name of the input matrix, or stdin if the matrix was read from the standard input, and x is the number of processors used in the distribution. We use an adapted Matrix Market format, with this structure:
%%MatrixMarket distributed-matrix coordinate real general m n nnz P Pstart[0] ( this should be 0 )
...
...
...
Pstart[P]( this should be nnz )
A.i[0] A.j[0] A.value[0] ...
...
...
A.i[nnz-1] A.j[nnz-1] A.value[nnz-1]
Here, Pstart[k] points to the start of the nonzeroes of processor k.

Processor indices (`-Ix`)

The Mondriaan program also writes the processor indices of each nonzero to the Matrix Market file input-Ix where the value of each nonzero is replaced by the processor index to which the nonzero has been assigned. The order of the nonzeroes is exactly that of the distributed matrix (-Px).

Row and column permutations (`-rowx`, `-colx`)

Mondriaan writes the row and column permutations determined by the Mondriaan algorithm (set by the Permute option) to input-rowx and input-colx. The goal of these permutations is to bring the input matrix A into (doubly) Separated Block Diagonal (SBD) or Bordered Block Diagonal (BBD) form, after applying the found permutations. This can have many possible applications, including, but not limited to, cache-oblivious sparse matrix vector multiplication (SBD form) or minimising fill-in in sparse LU decomposition (BBD form). These files are not written if the Permute option is set to none.

Reordered matrix (`-reor-Px`)

Mondriaan also directly writes the permuted matrix PAQ to file, where the permutation matrix P corresponds to the row permutation determined by the Mondriaan algorithm, as described in the previous paragraph. The permutation matrix Q is similarly inferred from the column permutation. This file is not written if the Permute option is set to none.

Separator boundary indices & hierarchy (`-rowblocksx`, `-colblocksx`)

These two files store two column-vectors each. The first column-vector relates to the separator boundary indices, the second to the separator hierarchy structure. The vectors are stored next to each other, so that each file has two columns of integers and m, resp, n rows. Each will be explained in turn.

Figure 1

Figure 2

The (doubly) separated block diagonal and bordered block diagonal forms each identify groups of nonzeroes, the separators, which serve as connectors between two parts resulting from a single bipartition. In the reordered matrix described above, these groups appear as relatively dense and usually small strips of consecutive rows and columns; the separator blocks. The indices where these blocks start and end in the reordered matrix are given in these two files; this is done by consecutively listing the start index and end index of each block, both separator and non-separator, as they occur in the row and column direction. These files are not written if the Permute option is set to none.

A more detailed explanation follows from the (idealised) Figures 1 and 2. The first shows a reordering corresponding to a single bipartition, clearly yielding four row and column indices indicating the start and end of the differently coloured partitions. This can occur recursively as shown in the second figure; there, each non-red partition is an instance of Figure 1, resulting in 16 row and column indices indicating the separators for the eight partitions. Assuming each non-separator block is two-by-two (x=2 in Figure 3), and the separator blocks are only one row and one column thick ( ~x=1), then the row and column boundary indices would equal (0,2,3,5,6,8,9,...,21,23). These two arrays are stored as a column vector in each file.

Figure 3

The second column in each file describes the separator hierarchy. As Mondriaan follows a bipartitioning scheme, the separator block from the very first bipartitioning corresponds to the separator blocks spanning the entire matrix. As bipartitioning is then called recursively, the separator blocks span only a subpart of the reordered matrix, and can be viewed as a child of the largest separator blocks; in this way a binary tree of separator blocks can be defined. See Figure 3 for clarification.

Each row of the rowblocks and columnblocks file corresponds to the start of a block (excluding the very last row which always equals m+1, resp., n+1, where m by n is the matrix size). Integers from 1 to m or 1 to n thus indicate blocks, and each separator block can point to its parent separator block by using these indices. This is exactly what is stored in the second column, where each non-separator block points to the separator block constructed in the same bipartitioning, and all separator blocks point to their direct parent separator block, as in Figure 3. The root separator, the one constructed in the very first bipartitioning, has no parent and therefore points to the non-existing index zero. Continuing the example, both the row and column hierarchies would be given by the array (2,4,2,8,6,4,6,0,10,12,10,8,14,12,14).

The rowblocks file in this instance would then look as follows (only partially shown):
0 2 2 4 3 2 5 8 ... 21 14 23

Input/output vector distributions (`-ux`, `-vx`)

The program writes the processor numbers of the vector components to the files called input-ux and input-vx, where input is the name of the input matrix and x is the number of processors used in the distribution. The vectors u and v are the output and input vectors of the sparse matrix-vector multiplication u=A*v.

In I/O files, all indices (i,j) for matrix entries a(i,j) and vector components u(i) and v(j) start numbering from 1, following the Matrix Market conventions. In I/O, the processors are numbered 1 to P. Internally, the indices are converted to the standard C-numbering starting from 0.

Cartesian submatrices (`-Cx`)

The program writes the row index sets I(q) and column index sets J(q) of the Cartesian submatrix I(q) x J(q) for the processors q=1,...,P to the file called input-Cx, where input is the name of the input matrix and x is the number of processors used in the distribution. This file is additional information, useful e.g. for visualisation, and you may not need it.

Statistics on standard output

Provided the library is compiled with the -DINFO or -DINFO2 option, the program prints plenty of useful statistics to standard output. The communication volume is given for the two phases of the matrix-vector multiplication separately: the volume for v (first communication phase) and the volume for u (second communication phase). The bottom line is the communication cost which is defined as the sum of the costs of the two phases. The cost of a phase is the maximum of the number of data words sent and the number of data words received, over all processors. This metric is also called the BSP cost; it is the cost metric of the Bulk Synchronous Parallel model.

Graphical output

In the tools/ subdirectory the program MondriaanPlot can be used to get insight in the Mondriaan partitioning algorithm. This program creates a series of Truevision TGA image files named img0000.tga, img0001.tga, ..., up to the number of processors over which the matrix is divided. To use this program, issue

% cd tools
% ./MondriaanPlot ../tests/arc130.mtx 8 0.03

If you have mencoder installed, you can use the MondriaanMovie script to generate a small movie of the partitioning process:

% ./MondriaanMovie ../tests/arc130.mtx 8 0.03

This will create ../tests/arc130.mtx.avi. If you have imagemagick installed (or if the convert command is otherwise available), you can use MondriaanGIF to create an animated GIF of the partitioning process:

% ./MondriaanGIF ../tests/arc130.mtx 8 0.03

This will create ../tests/arc130.mtx.gif.

When creating images with MondriaanPlot, also an SVG file is generated. Just as the .tga files, the .svg file also contains a visualisation of the partitioning. In the above examples, the svg file created will be ../tests/arc130.mtx-8.svg.

Program options

The Mondriaan options can be set in the Mondriaan.defaults file. If no such file exists, it is created at run time. Afterwards, this file can be edited for further runs. The default values of the options are given below in boldface. It is possible to overrule the defaults from the command line, e.g. by typing

% ./Mondriaan ../tests/arc130.mtx 8 0.03 -SplitStrategy=onedimrow

you can force Mondriaan to split the matrix in one dimension only, namely by rows.

Nonnumerical options

The nonnumerical options are used to choose partitioning methods. You may need to change them from the defaults to explore different partitioning methods.

SplitStrategy: alternate, localbest, localratio, onedimrow, onedimcol, finegrain, hybrid, symfinegrain, mediumgrain
The main choice of strategy. Alternate forces alternating splits in row and column direction; localbest tries both directions and chooses the best (this is also called the pure Mondriaan strategy); localratio tries to choose between the two, based on the aspect ratio of the current submatrix; onedimrow forces all splits to be in the row direction, and onedimcol in the column direction; and finegrain is a method developed in [5], which assigns individual nonzeroes to processors. Finegrain takes more computation time, but since it is the most general method it could in principle lead to the best solution. Hybrid [7] combines localbest and finegrain, by trying row, column, and finegrain splits and choosing the best. The mediumgrain method, introduced in [3], is a two-dimensional method that keeps parts of rows and columns together. Mediumgrain is as fast as localbest, but usually produces better results and therefore is the default option. The best strategies are localbest, mediumgrain, finegrain, and hybrid.
A new strategy is symmetric finegrain, given by the option symfinegrain. As the name implies, it is derived from the finegrain model and is able to exploit symmetry of input matrices. The number of vertices and nets (hyperedges) are both halved, compared to regular finegrain, resulting in faster partitioning. The quality of partitioning generally does decrease, however: symmetric finegrain uses the same net to model both rows and columns, and so if a row is cut, its corresponding column will also be cut. This is often not optimal, as either one of the row or column would have sufficed. Reordering, (reversed) Bordered Block Diagonal or Separated Block Diagonal, using symmetric finegrain, will always result in symmetric permutations. Structurally symmetric matrices can be handled as well, but requires some deftness from the user: only the lower triangular part (including diagonal) should be given to Mondriaan, and the resulting permutation should be manually applied to the original matrix. The terminal program (tools/Mondriaan) cannot perform this procedure automatically. The Matlab interface, in contrast, can. For symmetric permutations, see also the EnforceSymmetricPermutation option.
Partitioner: mondriaan, patoh
Permits the user to choose between the Mondriaan or PaToH hypergraph partitioner. Selecting patoh will use PaToH in a bipartitioning mode, effectively using it instead of the built-in hypergraph partitioner. This will also use the PaToH coarsening scheme instead of the one employed by Mondriaan.
Alternate_FirstDirection: row, col, ratio
How to start the alternating strategy.
LoadbalanceStrategy: constant, increase, decrease
Determines how to adjust the allowed imbalance epsilon for each split.
LoadbalanceAdjust: no, yes
Adjusting may change the number of processors assigned to each current part, to reflect the nonzero loads better.
SplitMethod: simple, KLFM
Simple is just meant for debugging, since it bypasses the sophisticated multilevel partitioning. It just partitions the nonzeroes into two nearly equal sets (in case of two processors) without looking at the communication costs incurred.
Metric: lambda1, cutnet, lambdalambda1
Determines the metric with respect to which the communication volume is being minimised: the Lambda-minus-one, cut-net, and Lambda-times-Lambda-minus-one [4] metrics are available. For the Lambda-times-Lambda-minus-one metric, the PaToH bipartitioner must be used. The Lambda-minus-one metric is the default, since it precisely measures the communication volume of a parallel sparse matrix-vector multiplication. It is also called the connectivity metric.
DiscardFreeNets: yes, no
Discard nets (and vertices) of the hypergraph that have no influence on the communication volume (i.e. when they contain at most one nonzero, or when they are cut and Metric is cutnet). This option should be enabled whenever the cut-net metric is used.
ZeroVolumeSearch: yes, no
Before applying the simple or KLFM split method, first check whether a split is possible with zero communication. Whenever such a zero volume split is possible, this method is more likely to find it than the other methods, while at the same time being faster than most other methods.
ImproveFreeNonzeros: yes, no
After a split is performed, improve the load balance by moving free nonzeros between parts. In this context, free nonzeros are nonzeros that may be moved between parts without increasing the communication volume.
CheckUpperBound: yes, no
After finishing, check whether the computed partitioning has a communication volume of at most (min(m,n)+1)(P-1). If not, this option makes sure an alternative method is used that does produce a partitioning with volume at most (min(m,n)+1)(P-1). This option will only take effect if UseSingleEntry, dummies and column weights are not used, and if two-dimensional partitionings are allowed.
SquareMatrix_DistributeVectorsEqual: no, yes
If yes, force the distribution of the vectors u and v to be the same.
SquareMatrix_DistributeVectorsEqual_AddDummies: no, yes
If yes, and the vectors must be distributed the same, add dummy nonzeroes for diagonal elements a(i,i) = 0.
SymmetricMatrix_UseSingleEntry: no, yes
If yes, feed the lower triangular part of a symmetric matrix to Mondriaan, partition it, and then assign a(i,j) for i < j to the same processor as a(j,i). If the SplitStrategy is symfinegrain, this option must be set to yes.
SymmetricMatrix_SingleEntryType: lower, random
If random, and if the matrix is symmetric and a single entry is used, then choose either a(i,j) or a(j,i) randomly to be fed into Mondriaan. If the SplitStrategy is symfinegrain, this option must be set to lower.
Coarsening_MatchingStrategy: random, inproduct, ata
Random causes matching of a random neighbouring column when merging columns in the multilevel coarsening. Inproduct takes the neighbouring column with the highest inner product. This works better, but is slower. ATA combines a specific graph matching algorithm with a specific neighbor finding algorithm in the hypergraph to generate matchings (see Coarsening_MatchingATAMatcher and Coarsening_MatchingATAFinder).
Coarsening_MatchingATAMatcher: greedy, pga
If Coarsening_MatchingStrategy is set to ata, this is the used graph matching algorithm. Greedy is the standard Mondriaan algorithm which greedily matches vertices in-order to their most heavy unmatched neighbors. PGA is the PGA' (1/2)-approximation algorithm developed by Drake and Hougardy, which offers higher quality matchings.
Coarsening_MatchingATAFinder: inproduct, stairway
If Coarsening_MatchingStrategy is set to ata, this is the used hypergraph neighbor finding algorithm. Inproduct always yields the neighbors of a given vertex with exact inner products. Stairway provides an efficient heuristic to approximate these inner products. It is much faster for matrices that have dense rows.
Coarsening_InprodMatchingOrder: decrwgt, incrwgt, decrdeg, incrdeg, natural, random
Determines the order in which columns are visited in the matching. This is either by decreasing column weight, increasing column weight, increasing column degree, decreasing column degree, the natural given order, or a random order. The column weight represents the total number of nonzeroes merged into the column, whereas the degree represents the current sparsity pattern.
Coarsening_NetScaling: no, linear
Linear scaling gives each overlapping nonzero in the inner product matching a weight inversely proportional to the number of nonzeroes present in its row.
Coarsening_InprodScaling: no, cos, min, max, jaccard
Minimum scaling scales the inner product IP for the match between columns j0 and j1 by a factor 1/min(deg(j0, j1)). Maximum scaling uses 1/max(deg(j0, j1)). Cosine scaling uses 1/sqrt(min*max), which represents the cosine of the angle between the corresponding vectors. Jaccard uses 1/(min+max-IP), and is a metric often used in information retrieval.
Coarsening_MatchIdenticalFirst: no, yes
If yes, try to match identical columns first, before looking at inner products.
VectorPartition_Step3: increase, decrease, random
Remainder from original Mondriaan vector distribution. Determines the order in which matrix columns with at least 3 processors owning nonzeroes therein, are visited in Step 3 of the vector distribution algorithm (Algorithm 2 in the paper by Vastenhouw and Bisseling, 2005).
Permute: none, reverseBBD, SBD, BBD
Generate row and column permutations of the matrix that is being processed, reflecting the partitioning process. The SBD or (reverse)BBD tags determine where the rows or columns are permuted if they contain nonzeroes assigned to two different parts after a single bipartition. The Separated Block Diagonal (SBD) form will permute those rows and columns such that they appear in between the pure blocks; that is, the blocks of rows and columns which contain only those nonzeroes corresponding to a single part of the bipartition.
Bordered Block Diagonal (BBD) form will permute these separator blocks after the two pure blocks. Finally, the reverseBBD form does the exact opposite, namely permuting separator blocks in front of the pure blocks. The BBD reordering has applications in sparse LU (reduction of fill-in), whereas the SBD permutation is useful in cache-oblivious sparse matrix-vector multiplication; see e.g. [1], [2]. Many other applications are expected to appear.
EnforceSymmetricPermutation: no, yes
Controls whether symmetric row and column permutations are generated; that is, P=Q^T, where the reordered matrix is given by PAQ with A the input matrix. Note that A is permuted to PAQ only when the Permute option is set to something other than none. There are no restrictions on A: in particular, it need not be (structurally) symmetric. When A is symmetric, however, this option ensures PAQ also is symmetric.
Much as with the symmetric finegrain option discussed in the section on the SplitStrategy option, the quality of the separators degrades when symmetry is forced: the separators will in most cases be wider than optimally required. E.g., the only way an SBD permutation in one dimension (row-net, for instance) can be symmetric, is to introduce a vertical separator block in addition to the horizontal separator already there.
Iterative_Refinement: never, aftermg, always
When to perform iterative refinement to improve a partitioning (see [3]). Iterative refinement slightly increases partitioning time, but leads to better results. If set to never, iterative refinement is not performed. If set to aftermg, it is only performed if SplitStrategy is set to mediumgrain. If set to always, iterative refinement is always performed after partitioning, which may yield two-dimensional partitionings even if a one-dimensional strategy is used.

Output options

OutputFormat: original, emm
Controls the output format of matrix and vector files. Original mode writes the files exactly as described in the previous section. The emm mode refers to the Extended Matrix-Market (EMM) format; this format is described here. It formally extends the Matrix-Market sparse matrix storage scheme to be able to handle distributed matrices and distributed vectors. The main difference with Mondriaan's original mode is that the array controlling which nonzeroes correspond to which processor is now 1-based, as are most other values in the Matrix-Market format. Also, all files generated by Mondriaan are now preceded by an EMM banner; for example, previously, vector files would not possess such a header.
OutputMode: original, onefile, DIMACS
Controls how file output is generated. Original mode causes Mondriaan to write a new file for each different object (distributed matrix, Cartesian matrix, vector distributions, row-permutation, column permutation, et cetera). See also the previous section on output files. The new Extended Matrix-Market format, however, has support to store all these data in a single file instead of many; to enable this, select onefile as the OutputMode. This does require the OutputFormat to be set to EMM. See the pdf file on the EMM format for specifics on how all data is stored in a single file. The DIMACS output format will generate partitioning output file appropriate for the 10th DIMACS challenge on graph partitioning and clustering.

Numerical options

The numerical options are often used to optimise given partitioning methods. Setting values such as NrRestarts, MaxNrLoops, or MaxNrNoGainMoves to a higher number, often results in better quality of the partitioning solution, at the expense of increased run-time.

Seed: 99
Integer. Range >= 0. Set the random seed. You can also set Seed=random to set the seed depending on your system time. This can only be done on a UNIX system, compiled with the flag -DUNIX. This is useful, for instance, if you want to obtain an average communication volume over 10 runs, each time with a different seed.
Coarsening_NrVertices: 200
Integer. Range >= 1. Recommended range : 100-500. Determines when to stop coarsening, as the current number of vertices is small enough.
Coarsening_MaxCoarsenings: 128
The maximum number of graph coarsenings that may be performed by Mondriaan.
Coarsening_NrMatchArbitrary: 0
The number of matching levels that should be performed arbitrarily during coarsening (i.e. irrespective of the actual inner products between vertices). Set to 0 to disable.
Coarsening_MaxNrVtxInMatch: 2
Integer. Range >= 2. Recommended range : 2-10. The default value of 2 represents pairwise matching. When using PGA as ATA matcher, you can also set MaxNrVtxInMatch=log or -1, in which case MaxNrVtxInMatch will equal either 2, 3 or 4, given by the formula MaxNrVtxInMatch = floor(log2(floor(NrVerticesInGraph / Coarsening_NrVertices))), clamped between 2 and 4.
Coarsening_StopRatio: 0.05
Float. Range = [0,1]. The contraction ratio is defined as: [NrVtx(old)-NrVtx(new)] / NrVtx(old). Stop coarsening if the ratio drops below the stopping value.
Coarsening_VtxMaxFractionOfWeight: 0.2
Float. Range = (0,1]. To ensure load balance : fraction < 0.5. This parameter is set to prevent matching all vertices into one huge vertex.
Coarsening_FineSwitchLevel: 2
If a finegrain split is performed (either in the finegrain, symfinegrain, or hybrid strategy) we use a specialised, faster matching function for the first 2 splitting levels of the graph coarsening to improve performance. This value can be set to 0 to disable this optimisation.
KLFM_InitPart_NrRestarts: 8
Integer. Range >= 1. Number of times the Kernighan-Lin Fiduccia-Mathheyses algorithm is run, each time with a different initial partitioning.
KLFM_InitPart_MaxNrLoops: 25
Integer. Range >= 1. Maximum number of loops within one run.
KLFM_InitPart_MaxNrNoGainMoves: 200
Integer. Range >= 0. Maximum number of successive no-gain moves allowed in one loop of a KLFM run.
KLFM_Refine_MaxNrLoops: 25
Integer. Range >= 1. Maximum number of loops within the refinement run of KLFM.
KLFM_Refine_MaxNrNoGainMoves: 200
Integer. Range >= 0. Maximum number of successive no-gain moves allowed in one loop of the refinement run of KLFM.
VectorPartition_MaxNrLoops: 10
Integer. Range >= 1. Number of times a vector partitioning is tried. Each time, the matrix columns are randomly reordered on input. Vector partitioning is much cheaper than matrix partitioning, so trying this several times is justified.
VectorPartition_MaxNrGreedyImproves: 10
Integer. Range >= 0. Each vector partitioning can be improved by a very cheap greedy improvement procedure (described in [6]). MaxNrGreedyImproves is the number of times this is done for each vector partitioning.

How to use Mondriaan as a library

After successful compilation (using make), all relevant header files are exported to the src/include directory, and a static library is available in the src/lib directory.

To illustrate the use of Mondriaan we provide a small example program together with this guide. This example can be compiled (provided the Mondriaan library was successfully built) by executing

% gcc -DGAINBUCKET_ARRAY example.c -I../src/include -L../src/lib -lMondriaan4 -lm

in the docs/ directory which will generate the executable a.out. More advanced use of the library is illustrated in tools/Mondriaan.c.

Mondriaan uses a triplet-based datastructure (the Matrix Market format) to store sparse matrices; that is, for each nonzero, its row- and column-index are stored, as well as its numerical value (in general). The exact representation is detailed in src/SparseMatrix.c/.h; and to successfully interface, your sparse matrix scheme has to be translated into this format.

Since the Compressed Row Storage (CRS), or alternatively known as Compressed Sparse Row (CSR), is a more prevailing standard, SparseMatrix comes with a translation function specifically for that datastructure: CRSSparseMatrixInit. It takes as input an uninitialised Mondriaan SparseMatrix struct, the matrix dimensions and number of nonzeroes, and the CRS datastructure arrays. The base parameter automatically translates from, e.g., 1-based arrays (base=1) as they may appear in for example Fortran, to 0-based arrays as used in Mondriaan.

Next is setting any options for Mondriaan. Recommended is to use an options file, storing all the defaults tuned to your application (e.g. as provided by tools/Mondriaan.defaults). These defaults can be read from file and stored in the Mondriaan Options struct by using the function SetOptionsFromFile; see src/Options.c/.h.

Once both the sparse matrix and the options are ready the main Mondriaan distribution function, DistributeMatrixMondriaan, can be used. This function takes as parameters the sparse matrix struct, the number of processors, the load imbalance parameter, and the options struct. The callback function is for advanced use, and is usually kept a NULL pointer (an example of using the callback is given in tools/MondriaanPlot.c). This function is a blocking call. After partitioning, all relevant information can be extracted from the SparseMatrix struct (see the source code for details). The struct can be freed by calling MMDeleteSparseMatrix.

Interfacing with non-C code is best achieved by writing a custom interface between your code and Mondriaan, using extern "C"-style additions when, e.g., interfacing between C++ and C; or using any other specific translation required (e.g., writing all parameters as pointers (Fortran), using jni (Java), etc.). As an example, the MATLAB interface is given in tools/MatlabMondriaan.c.

Profiling Mondriaan

To profile the Mondriaan library for various configurations, please use the Profile program and RunProfiler script both included in the tools/ directory. Entering the tools/ directory and executing

% ./Profile 8 0.03 10 a.mtx b.mtx c.mtx > out.tex

partitions the matrices a.mtx, b.mtx, and c.mtx among 8 processors with at most 3% imbalance, taking the average over 10 runs with a different random seed. The results are written, via stdout, to the LaTeX file out.tex which then contains a table with the averaged results of the partitioning process.

For code developers

If you develop new code to add to Mondriaan, you probably would like to add possibilities to use your code through a new option. In that case you need to adjust a few files:

src/Options.h: you should add the option and its possible values.
src/Options.c: you should make a choice for the default value and set it in the function SetDefaultOptions. If the option is a numerical option, you should check its range in the function SetDefaultOptions. Finally, you should add the option and all its possible values as an if-statement in the function SetOption.
subdirectory tests: it may happen that adding the options will break a few unit tests, in particular those connected to functions GetParameters, SetDefaultOptions, SetOption. This is quite harmless, and easy to repair.

More information

All functions of Mondriaan have extensive documentation in the source code (.c files). Please have a look there for more details.

References

[1] Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods, A. N. Yzelman and Rob H. Bisseling, SIAM Journal of Scientific Computation, Vol. 31, Issue 4, pp. 3128-3154 (2009).
[2] Two-dimensional cache-oblivious sparse matrix-vector multiplication, A. N. Yzelman and Rob H. Bisseling, Parallel Computing, Vol. 37, Issue 12, pp. 806-819 (2011).
[3] A medium-grain method for fast 2D bipartitioning of sparse matrices, Daniël M. Pelt and Rob H. Bisseling, Proceedings IEEE International Parallel and Distributed Processing Symposium 2014, IEEE Press, pp. 529-539.
[4] A new metric enabling an exact hypergraph model for the communication volume in distributed-memory parallel applications O. Fortmeier, H. M. Bücker, B. O. Fagginger Auer, R. H. Bisseling, Parallel Computing, Vol. 39, Issue 8, pp. 319-335 (2013).
[5] A Fine-Grain Hypergraph Model for 2D Decomposition of Sparse Matrices Ümit V. Çatalyürek and Cevdet Aykanat, IPDPS 2001, p. 118 (2001).
[6] Communication balancing in parallel sparse matrix-vector multiplication Rob H. Bisseling and Wouter Meesen, ETNA, pp. 47-65 (2005).
[7] Two-Dimensional Approaches to Sparse Matrix Partitioning Rob H. Bisseling, Bas O. Fagginger Auer, A. N. Yzelman, Tristan van Leeuwen and Ümit V. Çatalyürek, Chapter 12 of Combinatorial Scientific Computing, Chapman and Hall/CRC, pp. 321-349 (2012).

Last updated: December 18, 2020.

June 9, 2009 by Rob Bisseling,
December 2, 2010 by Bas Fagginger Auer,
December 10, 2010 by A. N. Yzelman,
March 27, 2012 by Bas Fagginger Auer,
April 18, 2013 by Daniël M. Pelt,
August 29, 2013 by Rob Bisseling and Bas Fagginger Auer,
October 27, 2016 by Marco van Oort,
September 7, 2017 by Marco van Oort.

To the Mondriaan package home page.