% gunzip mondriaan2.01.tar.gz
% tar xvf mondriaan2.01.tar
Important files are:
% make
This will compile all files needed for the program Mondriaan itself and create the program Mondriaan. Type
% make test
to compile and run all 95 unit tests. For each function of Mondriaan, we have written a small (undocumented) test, which checks the function for proper behaviour. The unit tests have been run successfully on three architectures: Linux, Mac OS X, and Solaris. If one or more of the tests fail, please try to identify the relevant error message, and inform me (R . H . Bisseling @ NOSPAM uu . nl ) about the problem. I'll try to help you solve it. The tests are in a script called runtest.
Mondriaan writes the output distributions to file. It can also generate useful statistics about the partitioning to the standard output stream stdout (or the screen). There are three levels of verbosity: silent, standard, and verbose. The silent mode is useful when running the unit tests by make test. (If these are not done silently, the OKs are obscured by clutter.) The silent mode may also be useful if (parts of) Mondriaan are used as library functions. The verbose mode is useful when debugging, or trying to understand a particular run in detail. The standard mode generates 1-2 pages of output, and is aimed at easy digestion. You can change the verbosity level by commenting and uncommenting the appropriate CFLAGS lines in the Makefile. Using the flag -DINFO generates standard output, and using the flags -DINFO -DINFO2 generates verbose output.
% Mondriaan gemat11.mtx 8 0.03
The program writes the distributed matrix to a file called
"input-Px",
where input is the name of the input matrix and x is the number of processors
used in the distribution.
We use an adapted Matrix Market format, with this structure:
%%MatrixMarket distributed-matrix coordinate real general
m n nnz P
Pstart[0] ( this should be 0 )
...
...
...
Pstart[P]( this should be nnz )
A.i[0] A.j[0] A.value[0]
...
...
...
A.i[nnz-1] A.j[nnz-1] A.value[nnz-1]
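As an illustration of this layout, here is a minimal Python sketch of a reader for such a file. It is not part of Mondriaan; the function name parse_distributed_matrix and the sample data are hypothetical. It assumes that the nonzeros numbered Pstart[q] up to (but not including) Pstart[q+1] belong to processor q, that every entry carries a real value, and that lines starting with % can be skipped.

```python
SAMPLE = """%%MatrixMarket distributed-matrix coordinate real general
3 3 4 2
0
3
4
1 1 2.0
2 2 3.0
3 1 1.0
3 3 5.0
"""


def parse_distributed_matrix(text):
    """Parse the adapted Matrix Market layout sketched above.

    Returns (m, n, nnz, P, pstart, entries), where entries is a list of
    (i, j, value, q) tuples with 0-based indices and 0-based processor q.
    """
    # Drop blank lines and '%' lines (the banner starts with '%%').
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("%")]

    m, n, nnz, P = (int(tok) for tok in lines[0].split())
    # The next P+1 lines hold Pstart[0..P].
    pstart = [int(lines[1 + q]) for q in range(P + 1)]
    if pstart[0] != 0 or pstart[P] != nnz:
        raise ValueError("Pstart must run from 0 to nnz")

    entries = []
    q = 0
    for k in range(nnz):
        # Assumed ownership rule: nonzero k belongs to the processor q
        # with pstart[q] <= k < pstart[q + 1].
        while k >= pstart[q + 1]:
            q += 1
        i, j, value = lines[P + 2 + k].split()
        # Files number indices from 1; convert to C-style numbering from 0.
        entries.append((int(i) - 1, int(j) - 1, float(value), q))
    return m, n, nnz, P, pstart, entries


m, n, nnz, P, pstart, entries = parse_distributed_matrix(SAMPLE)
for i, j, value, q in entries:
    print(f"a({i},{j}) = {value} on processor {q}")
```

In the sample, the first three nonzeros are assigned to processor 0 and the last one to processor 1.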
The program writes the processor numbers of the vector components to the files called "input-ux" and "input-vx", where input is the name of the input matrix and x is the number of processors used in the distribution. The vectors u and v are the output and input vectors of the sparse matrix-vector multiplication u=A*v.
In I/O files, all indices (i,j) for matrix entries a(i,j) and vector components u(i) and v(j) start numbering from 1, following the Matrix Market conventions. In I/O, the processors are numbered 1 to P. Internally, the indices are converted to the standard C-numbering starting from 0.
The program writes the row index sets I(q) and column index sets J(q) of the Cartesian submatrix I(q) x J(q) for the processors q=1,...,P to the file called "input-Cx", where input is the name of the input matrix and x is the number of processors used in the distribution. This file is additional information, useful e.g. for visualisation, and you may not need it.
The program prints plenty of useful statistics to standard output. The communication volume is given for the two phases of the matrix-vector multiplication separately: the volume for v (first communication phase) and the volume for u (second communication phase). The bottom line gives the communication cost, which is defined as the sum of the costs of the two phases. The cost of a phase is the maximum, over all processors, of the number of data words sent and the number of data words received. This metric is also called the BSP cost; it is the cost metric of the Bulk Synchronous Parallel model.
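The cost definition can be made concrete with a small Python sketch. It is not taken from the Mondriaan source; the function name bsp_cost and its arguments are hypothetical. It assumes the usual communication pattern for u=A*v: in the first phase the owner of v(j) sends it to every other processor holding a nonzero in column j, and in the second phase every processor holding a nonzero in row i sends one partial sum to the owner of u(i), unless it is the owner itself. All indices are numbered from 0.

```python
def bsp_cost(P, nonzeros, v_owner, u_owner):
    """Return (cost_v, cost_u, total) for the multiplication u = A*v.

    nonzeros: iterable of (i, j, q), meaning a(i,j) is owned by processor q.
    v_owner[j] and u_owner[i]: owners of the vector components.
    """
    col_procs = {}  # column j -> processors holding a nonzero in column j
    row_procs = {}  # row i -> processors holding a nonzero in row i
    for i, j, q in nonzeros:
        col_procs.setdefault(j, set()).add(q)
        row_procs.setdefault(i, set()).add(q)

    def phase_cost(sent, recv):
        # Maximum over all processors of words sent and words received.
        return max(max(s, r) for s, r in zip(sent, recv))

    # Phase 1: the owner of v(j) sends it to each other processor needing it.
    sent, recv = [0] * P, [0] * P
    for j, procs in col_procs.items():
        for q in procs:
            if q != v_owner[j]:
                sent[v_owner[j]] += 1
                recv[q] += 1
    cost_v = phase_cost(sent, recv)

    # Phase 2: partial sums for u(i) are sent to the owner of u(i).
    sent, recv = [0] * P, [0] * P
    for i, procs in row_procs.items():
        for q in procs:
            if q != u_owner[i]:
                sent[q] += 1
                recv[u_owner[i]] += 1
    cost_u = phase_cost(sent, recv)

    return cost_v, cost_u, cost_v + cost_u


# A 2 x 2 example on two processors: a(0,0) on processor 0, the rest on 1.
example = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
print(bsp_cost(2, example, v_owner=[0, 1], u_owner=[0, 1]))
```

In this example, processor 0 sends v(0) to processor 1 in the first phase, and processor 1 sends one partial sum for u(0) to processor 0 in the second phase, so each phase costs 1 and the total is 2.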
The compile option -DTIME causes the CPU time used by Mondriaan
for the matrix distribution to be printed, and also the time for the vector
distribution. This timer has an accuracy of at least 0.01 seconds,
but it is in danger of clock wraparound.
Program options
The program has the following options. These can be set in the
Mondriaan.defaults file. If no such file exists, it is created
at run time. Afterwards, this file can be edited for further runs.
The default values of the options are given below in boldface.
It is possible to overrule the defaults from the command line,
e.g. by typing
% Mondriaan gemat11.mtx 8 0.03 -SplitStrategy=onedimrow
you can force Mondriaan to split the matrix in one dimension only,
namely by rows.
Nonnumerical options
The nonnumerical options are used to choose partitioning methods.
You may need to change them from the defaults to explore
different partitioning methods.
Determines how to adjust the allowed imbalance epsilon for each split.
Adjusting may change the number of processors assigned to each
current part, to reflect the nonzero loads better.
The main choice of strategy. Alternate forces alternating splits in row
and column direction; localbest tries both directions and
chooses the best (this is also called the pure Mondriaan strategy);
localratio tries to choose between the two, based on the aspect ratio
of the current submatrix; onedimrow forces all splits to be in the row direction,
and onedimcol in the column direction; finegrain is a method developed
by Catalyurek and Aykanat in 2001 which assigns individual nonzeros
to processors. Finegrain takes more computation time, but since it is
the most general method it could in principle lead to the best solution.
Hybrid combines localbest and finegrain, by trying row, column, and finegrain splits
and choosing the best. The default is localbest because of its computing speed.
The best strategies are localbest, finegrain, and hybrid.
How to start the alternating strategy.
Simple is just meant for debugging, since it bypasses the sophisticated
multilevel partitioning. It just partitions the nonzeros into two nearly equal sets
(in case of two processors) without looking at the communication costs incurred.
If yes, force the distribution of the vectors u and v to be the same.
If yes, and the vectors must be distributed the same,
add dummy nonzeros for diagonal elements a(i,i) = 0.
If yes, feed the lower triangular part of a symmetric matrix to Mondriaan,
partition it, and then assign a(i,j) for i < j to the same processor as a(j,i).
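The mirroring step described here can be sketched in a few lines of Python. This is an illustration only, not Mondriaan's code; the function name mirror_lower_partition and the dictionary representation are hypothetical.

```python
def mirror_lower_partition(lower_owner):
    """Extend a partition of the lower triangle to the full symmetric matrix.

    lower_owner maps (i, j) with i >= j to the processor owning a(i,j).
    Each a(i,j) with i < j gets the same processor as a(j,i).
    """
    full_owner = dict(lower_owner)
    for (i, j), q in lower_owner.items():
        if i != j:  # diagonal entries have no mirror image
            full_owner[(j, i)] = q
    return full_owner


lower = {(0, 0): 0, (1, 0): 1, (2, 1): 0}
full = mirror_lower_partition(lower)
print(sorted(full.items()))
```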
If random, and if the matrix is symmetric and a single entry is used,
then choose either a(i,j) or a(j,i) randomly to be fed into Mondriaan.
Random causes matching of a random neighbouring column when merging
columns in the multilevel coarsening. Inproduct takes the neighbouring column
with the highest inner product. This works better, but is slower.
If a finegrain split is performed (either in the finegrain or hybrid strategy),
we can use the standard inner product matching (ip),
or a specialised version (ipfine) which exploits the properties
of the finegrain hypergraph. Ipfine is faster.
Determines the order in which columns are visited in the matching.
This is either by decreasing column weight, increasing column weight,
increasing column degree,
decreasing column degree, the natural given order, or a random order.
The column weight represents the total number of nonzeros merged into the column,
whereas the degree represents the current sparsity pattern.
Linear scaling gives each overlapping nonzero in the inner product matching
a weight inversely proportional to the number of nonzeros present in its row.
Minimum scaling scales the inner product IP for the match between columns j0 and j1
by a factor 1/min(deg(j0), deg(j1)). Maximum scaling uses 1/max(deg(j0), deg(j1)).
Cosine scaling uses 1/sqrt(min*max), which represents the cosine of the angle
between the corresponding vectors.
Jaccard uses 1/(min+max-IP), and is a metric often used in information retrieval.
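To compare these scalings on concrete numbers, here is a small Python sketch. It is not Mondriaan's implementation; the function name and the scaling labels "min", "max", "cosine", and "jaccard" are hypothetical. Here min and max denote the smaller and the larger of deg(j0) and deg(j1). Linear scaling is left out because it weights each overlapping nonzero separately rather than scaling the final inner product.

```python
import math


def scaled_inner_product(ip, deg0, deg1, scaling):
    """Scale the inner product ip of two columns with degrees deg0 and deg1."""
    lo, hi = min(deg0, deg1), max(deg0, deg1)
    if scaling == "min":
        return ip / lo
    if scaling == "max":
        return ip / hi
    if scaling == "cosine":
        return ip / math.sqrt(lo * hi)
    if scaling == "jaccard":
        return ip / (lo + hi - ip)
    raise ValueError(f"unknown scaling: {scaling}")


# Two columns with degrees 2 and 8 that overlap in 2 rows.
for name in ("min", "max", "cosine", "jaccard"):
    print(name, scaled_inner_product(2, 2, 8, name))
```

For this pair the scaled values are 1.0 (min), 0.25 (max), 0.5 (cosine), and 0.25 (jaccard), which shows how strongly minimum scaling favours matching a short column with a long one.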
If yes, try to match identical columns first, before looking at inner products.
Remnant of the original Mondriaan vector distribution.
Determines the order in which matrix columns with at least 3 nonzeros are visited
in Step 3 of the vector distribution algorithm (Algorithm 2 in the paper
by Vastenhouw and Bisseling, 2005).
Numerical options
The numerical options are often used to optimise given partitioning methods.
Setting values such as NrRestarts,
MaxNrLoops, or
MaxNrNoGainMoves to a higher number often results in a better
partitioning, at the expense of increased runtime.
Integer. Range >= 0. Set the random seed.
You can also set Seed=random to set the seed depending on your system time.
This can only be done on a UNIX system, compiled with the flag -DUNIX.
This is useful, for instance, if you want to obtain an average
communication volume over 10 runs, each time
with a different seed.
Integer. Range >= 1. Recommended range : 100-500.
Determines when to stop coarsening, as the current number
of vertices is small enough.
Integer. Range >= 2. Recommended range : 2-10.
The default value of 2 represents pairwise matching.
Float. Range = [0,1].
The contraction ratio is defined as: [NrVtx(old)-NrVtx(new)] / NrVtx(old).
Stop coarsening if the ratio drops below the stopping value.
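A short worked example of this stopping rule, as a Python sketch. The function names and the stopping value 0.05 are chosen for illustration only and are not taken from Mondriaan.

```python
def contraction_ratio(nr_vtx_old, nr_vtx_new):
    """[NrVtx(old) - NrVtx(new)] / NrVtx(old), as defined above."""
    return (nr_vtx_old - nr_vtx_new) / nr_vtx_old


def should_stop_coarsening(nr_vtx_old, nr_vtx_new, stopping_value):
    """Stop once a coarsening level removes too small a fraction of vertices."""
    return contraction_ratio(nr_vtx_old, nr_vtx_new) < stopping_value


# Going from 1000 to 600 vertices removes 40% of them: keep coarsening.
print(contraction_ratio(1000, 600), should_stop_coarsening(1000, 600, 0.05))
# Going from 600 to 580 vertices removes only about 3.3%: stop.
print(contraction_ratio(600, 580), should_stop_coarsening(600, 580, 0.05))
```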
Float. Range = (0,1]. To ensure load balance, use a fraction < 0.5.
This parameter is set to prevent matching all vertices into one huge vertex.
Integer. Range >= 1.
Number of times the Kernighan-Lin/Fiduccia-Mattheyses (KLFM) algorithm is run,
each time with a different initial partitioning.
Integer. Range >= 1.
Maximum number of loops within one run.
Integer. Range >= 0.
Maximum number of successive no-gain moves allowed in one loop of a KLFM run.
Integer. Range >= 1.
Maximum number of loops within the refinement run of KLFM.
Integer. Range >= 0.
Maximum number of successive no-gain moves allowed in one loop of
the refinement run of KLFM.
Integer. Range >= 1.
Number of times a vector partitioning is tried.
Each time, the matrix columns are randomly reordered on input.
Vector partitioning is much cheaper than matrix partitioning,
so trying this several times is justified.
Integer. Range >= 0.
Each vector partitioning can be improved by a very cheap
greedy improvement procedure (described in Bisseling and Meesen 2005).
MaxNrGreedyImproves is the number of times this is done
for each vector partitioning.
For code developers
If you develop new code, you will probably want to make it available
through a new option. In that case you need to adjust a few files:
More information
All functions of Mondriaan have extensive documentation in the source code.
Please have a look there for more details.
Last updated June 9, 2009 by Rob Bisseling.