Summary of the content on the page No. 1
Scotch and libScotch 5.1 User’s Guide
(version 5.1.10)
Franc¸ois Pellegrini
Bacchus team, INRIA Bordeaux Sud-Ouest
ENSEIRB & LaBRI, UMR CNRS 5800
Universit´e Bordeaux I
351 cours de la Lib´eration, 33405 TALENCE, FRANCE
pelegrin@labri.fr
August 29, 2010
Abstract
This document describes the capabilities and operations of Scotch and
libScotch, a software package and a software library devoted to static
mapping, partitioning, and sparse matrix block ordering of graphs and
meshes/hypergraphs. It giv
Summary of the content on the page No. 2
Contents 1 Introduction 6 1.1 Static mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2 Sparse matrix ordering . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Contents of this document . . . . . . . . . . . . . . . . . . . . . . . . 7 2 The Scotch project 7 2.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3 Algorithms 8 3.1 Static mapping by Dual Re
Summary of the content on the page No. 3
6.3.5 gcv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 6.3.6 gmap / gpart . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 6.3.7 gmk * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 6.3.8 gmk msh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 6.3.9 gmtst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.3.10 gord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.3.11 gotst . . . . . . . . . . . .
Summary of the content on the page No. 4
7.5.3 SCOTCH graphFree . . . . . . . . . . . . . . . . . . . . . . . . 77 7.5.4 SCOTCH graphLoad . . . . . . . . . . . . . . . . . . . . . . . . 78 7.5.5 SCOTCH graphSave . . . . . . . . . . . . . . . . . . . . . . . . 79 7.5.6 SCOTCH graphBuild . . . . . . . . . . . . . . . . . . . . . . . 79 7.5.7 SCOTCH graphBase . . . . . . . . . . . . . . . . . . . . . . . . 80 7.5.8 SCOTCH graphCheck . . . . . . . . . . . . . . . . . . . . . . . 81 7.5.9 SCOTCH graphSize . . . . . . . . . . . . . . . . . .
Summary of the content on the page No. 5
7.10.2 SCOTCH stratExit . . . . . . . . . . . . . . . . . . . . . . . . 108 7.10.3 SCOTCH stratSave . . . . . . . . . . . . . . . . . . . . . . . . 109 7.10.4 SCOTCH stratGraphBipart . . . . . . . . . . . . . . . . . . . 109 7.10.5 SCOTCH stratGraphMap . . . . . . . . . . . . . . . . . . . . . 110 7.10.6 SCOTCH stratGraphMapBuild . . . . . . . . . . . . . . . . . . 110 7.10.7 SCOTCH stratGraphOrder . . . . . . . . . . . . . . . . . . . . 111 7.10.8 SCOTCH stratGraphOrderBuild . . . . . . . . . .
Summary of the content on the page No. 6
1 Introduction 1.1 Static mapping The efficient execution of a parallel program on a parallel machine requires that the communicating processes of the program be assigned to the processors of the machine so as to minimize its overall running time. When processes have a lim- ited duration and their logical dependencies are accounted for, this optimization problem is referred to as scheduling. When processes are assumed to coexist simul- taneously for the entire duration of the program, it is referr
Summary of the content on the page No. 7
1.3 Contents of this document This document describes the capabilities and operations of Scotch, a software packagedevotedtostatic mapping, graphandmeshpartitioning, andsparsematrix block ordering. Scotch allows the user to map efficiently any kind of weighted process graph onto any kind of weighted architecture graph, and provides high- quality block orderings of sparse matrices. The rest of this manual is organized as follows. Section 2 presents the goals of the Scotch project, and section 3 out
Summary of the content on the page No. 8
The free/libre software license under which Scotch 5.1 is distributed is the CeCILL-C license [6], which has basically the same features as the GNU LGPL (“Lesser General Public License”): ability to link the code as a library to any free/libre or even proprietary software, ability to modify the code and to redistribute these modifications. Version 4.0 of Scotch was distributed under the LGPL itself. Please refer to section 8 to see how to obtain the free/libre distribution of Scotch. 3 Algorithms
Summary of the content on the page No. 9
This function, which has already been considered by several authors for hyper- cube target topologies [11, 21, 25], has several interesting properties: it is easy to compute, allows incremental updates performed by iterative algorithms, and its minimization favors the mapping of intensively intercommunicating processes onto nearby processors; regardless of the type of routage implemented on the target machine (store-and-forward or cut-through), it models the traffic on the intercon- nection networ
Summary of the content on the page No. 10
callsagraph bipartitioning algorithm tosplitthesubsetofprocessesassociatedwith the domain across the two subdomains, as sketched in the following. mapping (D, P) Set_Of_Processors D; Set_Of_Processes P; { Set_Of_Processors D0, D1; Set_Of_Processes P0, P1; if (|P| == 0) return; /* If nothing to do. */ if (|D| == 1) { /* If one processor in D */ result (D, P); /* P is mapped onto it. */ return; } (D0, D1) = processor_bipartition (D); (P0, P1) = process_bipartition (P, D0, D1); mapping (D0, P0); /*
Summary of the content on the page No. 11
3.1.4 Partial cost function The production of efficient complete mappings requires that all graph bipartition- ings favor the criteria that we have chosen. Therefore, the bipartitioning of a ′ subgraph S of S should maintain load balance within the user-specified tolerance, ′ and minimize the partial communication cost function f , defined as C X def ′ ′ ′ f (τ ,ρ ) = w ({v,v})|ρ ({v,v})| , S,T S,T S S,T C ′ v ∈ V(S ) ′ {v,v } ∈E(S) ′ which accounts for the dilation of edges internal to subgraph S a
Summary of the content on the page No. 12
neighbor processes, whether they are handled by the job itself or not, since it can ′ estimateinf thedilationofthecorrespondingedges. Thisresultsinaninteresting C feedback effect: once an edge has been kept in a cut between two subdomains, the distancebetweenits endverticeswill beaccountedforinthe partialcommunication cost function to be minimized, and following jobs will be more likely to keep these vertices close to each other, as illustrated in Figure 2. The relative efficiency of CL0 CL0 CL2 CL
Summary of the content on the page No. 13
space that derives from the one which was globally defined at the coarsest level, thus preventing local optimization refinement algorithms to be trapped in local optima of the finer graphs [8]. Diffusion This global optimization method, presented in [42], flows two kinds of antag- onistic liquids, scotch and anti-scotch, from two source vertices, and sets the new frontier as the limit between vertices which contain scotch and the ones which contain anti-scotch. In order to add load-balancing constrai
Summary of the content on the page No. 14
tree rooted at a randomly chosen vertex, and this process is iterated by se- lecting a new root vertex within the last layer as long as the number of layers increases. Then, starting from the current root vertex, vertices are assigned layerafter layerto the first subdomain, until half of the total weight has been processed. Remaining vertices are then allocated to the second subdomain. AsfortheoriginalGibbs, Poole,andStockmeyeralgorithm,itisassumedthat the maximization of the number of layers res
Summary of the content on the page No. 15
for it to account for topological structures of the graph that would else be of a too high level for it to encompass in its local optimization process. 3.1.7 Mapping onto variable-sized architectures Several constrained graph partitioning problems can be modeled as mapping the problem graph onto a target architecture, the number of vertices and topology of which depend dynamically on the structure of the subgraphs to bipartition at each step. Variable-sized architectures are supported by the DRB
Summary of the content on the page No. 16
vertex separators are computed by using direct heuristics [28, 37], or from edge separators [48, and included references] by minimum cover techniques [9, 30], but other techniques such as spectral vertex partitioning have also been used [49]. Providedthat good vertex separatorsare found, the nested dissectionalgorithm produces orderings which, both in terms of fill-in and operation count, compare favorably[19, 31, 46] to the ones obtained with the minimum degreealgorithm[39]. Moreover, the elimin
Summary of the content on the page No. 17
measure its quality, severalparameters canbe defined: h , h , and h denote min max avg 1 the minimum, maximum, and average heights of the tree , respectively, and h dlt is the variance, expressed as a percentage of h . Since small separators result in avg small chains in the elimination tree, h should also indirectly reflect the quality avg of separators. 3.2.5 Ordering methods The core of our ordering algorithm uses graph ordering methods as black boxes, whichallowstheorderertorunanytypeoforderin
Summary of the content on the page No. 18
3.2.6 Graph separation methods The core of our incomplete nested dissection algorithm uses graph separation methods as black boxes. It allows the orderer to run any type of graph separation method compatible with our criteria for quality, that is, reducing the size of the vertex separator while maintaining the loads of the separated parts within some user-specified tolerance. Separation jobs maintain an internal image of the current vertex separator, indicating for every vertex of the job whether
Summary of the content on the page No. 19
Scotch can now handle compressed streams on the fly, in several widely used formats such as gzip, bzip2 or lzma. Please refer to Section 6.2 for more informa- tion. 4.2 Changes from version 5.0 Anew integerindex type hasbeencreatedinthe Fortraninterface, toaddressarray indices larger than the maximum value which can be stored in a regular integer. Please refer to Section 8.3 for more information. 5 Files and data structures For the sake of portability, readability, and reduction of storage space,
Summary of the content on the page No. 20
The numeric flag, similar to the one used by the Chaco graph format [24], is made of three decimal digits. A non-zero value in the units indicates that vertex weights are provided. A non-zero value in the tenths indicates that edge weights are provided. A non-zero value in the hundredths indicates that vertex labels are provided; if itis the case, verticescanbe storedinanyorderinthe file; else, natural order is assumed, starting from the graph base index. This header data is then followed by as ma