ページ1に含まれる内容の要旨
Scotch and libScotch 5.1 User’s Guide
(version 5.1.10)
Franc¸ois Pellegrini
Bacchus team, INRIA Bordeaux Sud-Ouest
ENSEIRB & LaBRI, UMR CNRS 5800
Universit´e Bordeaux I
351 cours de la Lib´eration, 33405 TALENCE, FRANCE
pelegrin@labri.fr
August 29, 2010
Abstract
This document describes the capabilities and operations of Scotch and
libScotch, a software package and a software library devoted to static
mapping, partitioning, and sparse matrix block ordering of graphs and
meshes/hypergraphs. It giv
ページ2に含まれる内容の要旨
Contents 1 Introduction 6 1.1 Static mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2 Sparse matrix ordering . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Contents of this document . . . . . . . . . . . . . . . . . . . . . . . . 7 2 The Scotch project 7 2.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3 Algorithms 8 3.1 Static mapping by Dual Re
ページ3に含まれる内容の要旨
6.3.5 gcv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 6.3.6 gmap / gpart . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 6.3.7 gmk * . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 6.3.8 gmk msh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 6.3.9 gmtst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.3.10 gord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 6.3.11 gotst . . . . . . . . . . . .
ページ4に含まれる内容の要旨
7.5.3 SCOTCH graphFree . . . . . . . . . . . . . . . . . . . . . . . . 77 7.5.4 SCOTCH graphLoad . . . . . . . . . . . . . . . . . . . . . . . . 78 7.5.5 SCOTCH graphSave . . . . . . . . . . . . . . . . . . . . . . . . 79 7.5.6 SCOTCH graphBuild . . . . . . . . . . . . . . . . . . . . . . . 79 7.5.7 SCOTCH graphBase . . . . . . . . . . . . . . . . . . . . . . . . 80 7.5.8 SCOTCH graphCheck . . . . . . . . . . . . . . . . . . . . . . . 81 7.5.9 SCOTCH graphSize . . . . . . . . . . . . . . . . . .
ページ5に含まれる内容の要旨
7.10.2 SCOTCH stratExit . . . . . . . . . . . . . . . . . . . . . . . . 108 7.10.3 SCOTCH stratSave . . . . . . . . . . . . . . . . . . . . . . . . 109 7.10.4 SCOTCH stratGraphBipart . . . . . . . . . . . . . . . . . . . 109 7.10.5 SCOTCH stratGraphMap . . . . . . . . . . . . . . . . . . . . . 110 7.10.6 SCOTCH stratGraphMapBuild . . . . . . . . . . . . . . . . . . 110 7.10.7 SCOTCH stratGraphOrder . . . . . . . . . . . . . . . . . . . . 111 7.10.8 SCOTCH stratGraphOrderBuild . . . . . . . . . .
ページ6に含まれる内容の要旨
1 Introduction 1.1 Static mapping The efficient execution of a parallel program on a parallel machine requires that the communicating processes of the program be assigned to the processors of the machine so as to minimize its overall running time. When processes have a lim- ited duration and their logical dependencies are accounted for, this optimization problem is referred to as scheduling. When processes are assumed to coexist simul- taneously for the entire duration of the program, it is referr
ページ7に含まれる内容の要旨
1.3 Contents of this document This document describes the capabilities and operations of Scotch, a software packagedevotedtostatic mapping, graphandmeshpartitioning, andsparsematrix block ordering. Scotch allows the user to map efficiently any kind of weighted process graph onto any kind of weighted architecture graph, and provides high- quality block orderings of sparse matrices. The rest of this manual is organized as follows. Section 2 presents the goals of the Scotch project, and section 3 out
ページ8に含まれる内容の要旨
The free/libre software license under which Scotch 5.1 is distributed is the CeCILL-C license [6], which has basically the same features as the GNU LGPL (“Lesser General Public License”): ability to link the code as a library to any free/libre or even proprietary software, ability to modify the code and to redistribute these modifications. Version 4.0 of Scotch was distributed under the LGPL itself. Please refer to section 8 to see how to obtain the free/libre distribution of Scotch. 3 Algorithms
ページ9に含まれる内容の要旨
This function, which has already been considered by several authors for hyper- cube target topologies [11, 21, 25], has several interesting properties: it is easy to compute, allows incremental updates performed by iterative algorithms, and its minimization favors the mapping of intensively intercommunicating processes onto nearby processors; regardless of the type of routage implemented on the target machine (store-and-forward or cut-through), it models the traffic on the intercon- nection networ
ページ10に含まれる内容の要旨
callsagraph bipartitioning algorithm tosplitthesubsetofprocessesassociatedwith the domain across the two subdomains, as sketched in the following. mapping (D, P) Set_Of_Processors D; Set_Of_Processes P; { Set_Of_Processors D0, D1; Set_Of_Processes P0, P1; if (|P| == 0) return; /* If nothing to do. */ if (|D| == 1) { /* If one processor in D */ result (D, P); /* P is mapped onto it. */ return; } (D0, D1) = processor_bipartition (D); (P0, P1) = process_bipartition (P, D0, D1); mapping (D0, P0); /*
ページ11に含まれる内容の要旨
3.1.4 Partial cost function The production of efficient complete mappings requires that all graph bipartition- ings favor the criteria that we have chosen. Therefore, the bipartitioning of a ′ subgraph S of S should maintain load balance within the user-specified tolerance, ′ and minimize the partial communication cost function f , defined as C X def ′ ′ ′ f (τ ,ρ ) = w ({v,v})|ρ ({v,v})| , S,T S,T S S,T C ′ v ∈ V(S ) ′ {v,v } ∈E(S) ′ which accounts for the dilation of edges internal to subgraph S a
ページ12に含まれる内容の要旨
neighbor processes, whether they are handled by the job itself or not, since it can ′ estimateinf thedilationofthecorrespondingedges. Thisresultsinaninteresting C feedback effect: once an edge has been kept in a cut between two subdomains, the distancebetweenits endverticeswill beaccountedforinthe partialcommunication cost function to be minimized, and following jobs will be more likely to keep these vertices close to each other, as illustrated in Figure 2. The relative efficiency of CL0 CL0 CL2 CL
ページ13に含まれる内容の要旨
space that derives from the one which was globally defined at the coarsest level, thus preventing local optimization refinement algorithms to be trapped in local optima of the finer graphs [8]. Diffusion This global optimization method, presented in [42], flows two kinds of antag- onistic liquids, scotch and anti-scotch, from two source vertices, and sets the new frontier as the limit between vertices which contain scotch and the ones which contain anti-scotch. In order to add load-balancing constrai
ページ14に含まれる内容の要旨
tree rooted at a randomly chosen vertex, and this process is iterated by se- lecting a new root vertex within the last layer as long as the number of layers increases. Then, starting from the current root vertex, vertices are assigned layerafter layerto the first subdomain, until half of the total weight has been processed. Remaining vertices are then allocated to the second subdomain. AsfortheoriginalGibbs, Poole,andStockmeyeralgorithm,itisassumedthat the maximization of the number of layers res
ページ15に含まれる内容の要旨
for it to account for topological structures of the graph that would else be of a too high level for it to encompass in its local optimization process. 3.1.7 Mapping onto variable-sized architectures Several constrained graph partitioning problems can be modeled as mapping the problem graph onto a target architecture, the number of vertices and topology of which depend dynamically on the structure of the subgraphs to bipartition at each step. Variable-sized architectures are supported by the DRB
ページ16に含まれる内容の要旨
vertex separators are computed by using direct heuristics [28, 37], or from edge separators [48, and included references] by minimum cover techniques [9, 30], but other techniques such as spectral vertex partitioning have also been used [49]. Providedthat good vertex separatorsare found, the nested dissectionalgorithm produces orderings which, both in terms of fill-in and operation count, compare favorably[19, 31, 46] to the ones obtained with the minimum degreealgorithm[39]. Moreover, the elimin
ページ17に含まれる内容の要旨
measure its quality, severalparameters canbe defined: h , h , and h denote min max avg 1 the minimum, maximum, and average heights of the tree , respectively, and h dlt is the variance, expressed as a percentage of h . Since small separators result in avg small chains in the elimination tree, h should also indirectly reflect the quality avg of separators. 3.2.5 Ordering methods The core of our ordering algorithm uses graph ordering methods as black boxes, whichallowstheorderertorunanytypeoforderin
ページ18に含まれる内容の要旨
3.2.6 Graph separation methods The core of our incomplete nested dissection algorithm uses graph separation methods as black boxes. It allows the orderer to run any type of graph separation method compatible with our criteria for quality, that is, reducing the size of the vertex separator while maintaining the loads of the separated parts within some user-specified tolerance. Separation jobs maintain an internal image of the current vertex separator, indicating for every vertex of the job whether
ページ19に含まれる内容の要旨
Scotch can now handle compressed streams on the fly, in several widely used formats such as gzip, bzip2 or lzma. Please refer to Section 6.2 for more informa- tion. 4.2 Changes from version 5.0 Anew integerindex type hasbeencreatedinthe Fortraninterface, toaddressarray indices larger than the maximum value which can be stored in a regular integer. Please refer to Section 8.3 for more information. 5 Files and data structures For the sake of portability, readability, and reduction of storage space,
ページ20に含まれる内容の要旨
The numeric flag, similar to the one used by the Chaco graph format [24], is made of three decimal digits. A non-zero value in the units indicates that vertex weights are provided. A non-zero value in the tenths indicates that edge weights are provided. A non-zero value in the hundredths indicates that vertex labels are provided; if itis the case, verticescanbe storedinanyorderinthe file; else, natural order is assumed, starting from the graph base index. This header data is then followed by as ma