Summary of the content on the page No. 1
Administrator's Guide
Release 5.0.5
Published April 2010
Summary of the content on the page No. 2
ParaStation5 Administrator's Guide Release 5.0.5 Copyright © 2002-2010 ParTec Cluster Competence Center GmbH April 2010 Printed 7 April 2010, 14:11 Reproduction in any manner whatsoever without the written permission of ParTec Cluster Competence Center GmbH is strictly forbidden. All rights reserved. ParTec and ParaStation are registered trademarks of ParTec Cluster Competence Center GmbH. The ParTec logo and the ParaStation logo are trademarks of ParTec Cluste
Summary of the content on the page No. 3
Table of Contents
1. Introduction
1.1. What is ParaStation
1.2. The history of ParaStation
1.3. About this document
Summary of the content on the page No. 4
6.2. Problem: node shown as "down"
6.3. Problem: cannot start parallel task
6.4. Problem: bad performance
6.5. Problem: different groups of nodes are seen as up or down
Summary of the content on the page No. 5
Chapter 1. Introduction
1.1. What is ParaStation
ParaStation is an integrated cluster management and communication solution. It combines unique features only found in ParaStation with common techniques, widely used in high performance computing, to deliver an integrated, easy to use and reliable compute cluster environment. Version 5 of ParaStation supports various communication technologies as the interconnect network. It comes with an optimized communication protocol for Ethernet that enables
Summary of the content on the page No. 6
About this document
In the middle of 2004, all rights to ParaStation were transferred from ParTec AG to the ParTec Cluster Competence Center GmbH. This new company takes a much more service-oriented approach to the customer. The main goal is to deliver integrated and complete software stacks for LINUX-based compute clusters by selecting state-of-the-art software components and driving software development efforts in areas where real added value can be provided. The ParTec Cluster Competence Cen
Summary of the content on the page No. 7
Chapter 2. Technical overview
Within this section, a brief technical overview of ParaStation5 will be given. The various software modules constituting ParaStation5 are explained.
2.1. Runtime daemon
In order to enable ParaStation5 on a cluster, the ParaStation daemon psid(8) has to be installed on each cluster node. This daemon process implements various functions:
• Install and configure local communication devices and protocols, e.g. load the p4sock kernel module and set up proper routing info
Summary of the content on the page No. 8
License
• p4sock.o: this module implements the kernel based ParaStation5 communication protocol.
• e1000_glue.o, bcm5700_glue.o: these modules enable even more efficient communication to the network drivers coming with ParaStation5 (see below).
• p4tcp.o: this module provides a feature called "TCP bypass". Thus, applications using standard TCP communication channels on top of Ethernet are able to use the optimized ParaStation5 protocol and therefore achieve improved performance. No modifications
Summary of the content on the page No. 9
Chapter 3. Installation
This chapter describes the installation of ParaStation5. First, the prerequisites for using ParaStation5 are discussed. Next, the directory structure of all installed components is explained. Finally, the installation using RPM packages is described in detail. Of course, the less automated the chosen way of installation is, the more opportunities for customization the installation process offers. On the other hand even the most automated way of installation, the inst
Summary of the content on the page No. 10
Software
ParaStation requires an RPM-based Linux installation, as the ParaStation software is based on installable RPM packages. All current distributions from Novell and Red Hat are supported, like
• SuSE Linux Enterprise Server (SLES) 9 and 10
• SuSE Professional 9.1, 9.2, 9.3 and 10.0, OpenSuSE 10.1, 10.2, 10.3
• Red Hat Enterprise Linux (RHEL) 3, 4 and 5
• Fedora Core, up to version 7
For other distributions and non-RPM based installations, please contact . In or
Summary of the content on the page No. 11
Installation via RPM packages
man contains the manual pages describing the ParaStation daemons, utilities and configuration files after installing the documentation package. The necessary steps are described in Section 3.4, “Installing the documentation”. In order to enable the users to access these pages using the man(1) command, please consult the corresponding documentation.
mpi2, mpi2-intel, mpi2-pgi, mpi2-psc contains an adapted version of MPIch2 after installing one of the various psmpi
Summary of the content on the page No. 12
Please note that the individual version numbers of the distinct packages building the ParaStation5 system do not necessarily have to match.
Compiling the ParaStation5 packages from source
To build proper RPM packages suitable for a particular setup, the source code for the ParaStation packages can be downloaded from www.parastation.com/download. Typically, it is not necessary to recompile the ParaStation packages, as the provided precompiled pa
Summary of the content on the page No. 13
Installing the documentation
# rpm -Uv psmgmt.5.0.0-0.i586.rpm pscom.5.0.0-0.i586.rpm \
      pscom-modules.5.0.0-0.i586.rpm
This will copy all the necessary files to /opt/parastation and the kernel modules to /lib/modules/kernelversion/kernel/drivers/net/ps4. On a frontend node or file server, the pscom-modules package is only required if this node should run processes of a parallel task. If the frontend or fileserver node is not configured to run compute processes of parallel tasks, the instal
Summary of the content on the page No. 14
Installing MPI
# rpm -Uv psdoc-5.0.0-1.noarch.rpm
All the PDF and HTML files will be installed within the directory /opt/parastation/doc, the manual pages will reside in /opt/parastation/man. The intended starting point to browse the HTML version of the documentation is file:///opt/parastation/doc/html/index.html. The documentation is available in two PDF files called adminguide.pdf for the ParaStation5 Administrator's Guide and userguide.pdf for the ParaStation5 User's Guide. Both can be fou
Summary of the content on the page No. 15
Uninstalling ParaStation5
• testing
These steps will be discussed in Chapter 4, Configuration.
3.7. Uninstalling ParaStation5
After stopping the ParaStation daemons, the corresponding packages can be removed using
# /etc/init.d/parastation stop
# rpm -e psmgmt pscom psdoc psmpi2
on all nodes of the cluster.
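Because the removal has to be repeated on every node, the two uninstall commands can be wrapped in a small loop. The following is a minimal sketch and not part of the ParaStation distribution: it assumes that root can reach every node via ssh, and the function name print_uninstall_cmds as well as the node names node01 and node02 are hypothetical. The helper only prints the commands, so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ParaStation): print one ssh command
# line per node that stops the daemons and removes the packages.
# Assumption: root can log in to every node via ssh.
print_uninstall_cmds() {
    for node in "$@"; do
        printf "ssh root@%s '/etc/init.d/parastation stop && rpm -e psmgmt pscom psdoc psmpi2'\n" "$node"
    done
}

# Example: review the generated commands for two hypothetical nodes.
print_uninstall_cmds node01 node02
```

Once the printed lines look right, they can be executed one by one or piped to sh.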
Summary of the content on the page No. 17
Chapter 4. Configuration
After installing the ParaStation software successfully, only a few modifications to the configuration file parastation.conf(5) have to be made in order to enable ParaStation on the local cluster.
4.1. Configuration of the ParaStation system
Within this section the basic configuration procedure to enable ParaStation will be described. It covers the configuration of ParaStation5 using TCP/IP (Ethernet) and the optimized ParaStation5 protocol p4sock. The primarily configurati
Summary of the content on the page No. 18
Enable optimized network drivers
The values that might be assigned to the HWType parameter have to be defined within the parastation.conf configuration file. Have a brief look at the various Hardware sections of this file in order to find out which hardware types are actually defined. Other possible types are: mvapi, openib, gm, ipath, elan, dapl. To enable shared memory communication used within SMP nodes, no dedicated hardware entry is required. Shared memory support is always enabled by defau
Summary of the content on the page No. 19
Testing the installation
transfer application data across Ethernet, these adapted drivers should be used, too. To enable these drivers, the simplest way is to rename the original modules and recreate the module dependencies:
# cd /lib/modules/$(uname -r)/kernel/drivers/net
# mv e1000/e1000.o e1000/e1000-orig.o
# mv bcm/bcm5700.o bcm/bcm5700-orig.o
# depmod -a
If your system uses the e1000 driver, a subsequent modinfo command for kernel version 2.4 should report that the new ParaStation ve
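The rename step can be made safe to repeat. The following is a minimal sketch under stated assumptions: the function name backup_module is hypothetical and not part of ParaStation, and it only wraps the mv commands already shown, skipping a module that is missing or has already been renamed.

```shell
#!/bin/sh
# Hypothetical helper: rename <dir>/<name>.o to <dir>/<name>-orig.o, but only
# if the original exists and no backup is present yet, so a second run is
# harmless.
backup_module() {
    dir="$1"
    name="$2"
    if [ -f "$dir/$name.o" ] && [ ! -e "$dir/$name-orig.o" ]; then
        mv "$dir/$name.o" "$dir/$name-orig.o"
        echo "renamed $dir/$name.o"
    else
        echo "skipped $dir/$name.o"
    fi
}

# Usage with the paths from the manual (run as root):
#   base=/lib/modules/$(uname -r)/kernel/drivers/net
#   backup_module "$base/e1000" e1000
#   backup_module "$base/bcm" bcm5700
#   depmod -a
```

Checking for an existing backup first avoids overwriting the saved original driver if the procedure is run twice.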
Summary of the content on the page No. 20
Testing the installation
Alternatively, it is possible to use the single command form of the psiadmin command:
# /opt/parastation/bin/psiadmin -s -c "list"
The command should be repeated until all nodes are up. The ParaStation administration tool is described in detail in the corresponding manual page psiadmin(1). If some nodes are still marked as "down", the logfile /var/log/messages for this node should be inspected. Entries like “psid: ....” at the end of the file may report problems or err
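Repeating the list command and checking the logfile can be scripted. The following is a minimal sketch, not a ParaStation tool: the function names wait_for_nodes and psid_log_tail are hypothetical, and the sketch assumes that the output of the list command contains the word "down" for every node that is not yet up. The exact output format is not specified here, so the pattern may need adjusting.

```shell
#!/bin/sh
# Path to psiadmin as given in the manual; can be overridden via the environment.
PSIADMIN="${PSIADMIN:-/opt/parastation/bin/psiadmin}"

# Hypothetical helper: repeat the "list" command until no node is reported as
# "down". Assumption: the output contains the word "down" for such nodes.
# $1 = maximum number of attempts (default 30)
# $2 = seconds to wait between attempts (default 2)
wait_for_nodes() {
    max_tries="${1:-30}"
    delay="${2:-2}"
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Succeed only if psiadmin itself ran and printed no "down" entry.
        if out=$("$PSIADMIN" -s -c "list" 2>/dev/null) &&
           ! printf '%s\n' "$out" | grep -qw "down"; then
            return 0
        fi
        sleep "$delay"
        try=$((try + 1))
    done
    return 1
}

# Hypothetical helper: show the most recent psid entries of a logfile.
# $1 = logfile (default /var/log/messages), $2 = number of lines (default 20)
psid_log_tail() {
    grep "psid:" "${1:-/var/log/messages}" | tail -n "${2:-20}"
}
```

A typical call would be `wait_for_nodes 30 2 || psid_log_tail`, run on the node whose logfile is of interest.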