MCS572 UIC User's Local Guide to
NCSA Platinum (pt) IA32 Linux Cluster

version 0.60
06 April 2003


F. B. Hanson

Mail address:

Office address:

Hanson World Wide WEB Home Page:

UIC Fall 2003 Course Web Page:

Acknowledgement:


 

Table of Contents


Preface

This User's Local Guide is intended to be a sufficient, hands-on introduction to the National Center for Supercomputing Applications (NCSA) Platinum IA32 Linux Cluster for our MCS 572 Introduction to Supercomputing class. The Platinum cluster runs Linux, a variant of the UNIX operating system.

The NCSA Class Account for MCS572 Spring 2003 is `nfa' for the NCSA Grant ASC030009N.


Platinum Overview.

The NCSA Platinum is a large-scale parallel cluster with 512 IBM eServer thin-server compute nodes, each with two (2) 1-GHz Intel Pentium III processors, for a total of 1024 processors, plus four (4) user access nodes (8 processors) and four (4) storage nodes (8 processors), running Red Hat Linux with Myricom's Myrinet cluster interconnect network. The NCSA Platinum's user interactive access nodes, reached under a round-robin protocol, use the internet address

or using the full address platinum.ncsa.uiuc.edu, with the prompt of `[node]:~[line-number]%'. For Platinum information from NCSA, see

The Platinum IA32 (32-bit) Linux system is paired with a 64-bit system called the Titan IA64 Linux Cluster, which has 160 IBM IntelliStation Z Pro compute nodes with two 800-MHz Intel Itanium processors per node, also running Red Hat Linux with Myricom's Myrinet cluster interconnect network. Titan's web page should be consulted for updated system information:

What does the NCSA Platinum look like? NCSA Platinum Picture


Platinum Compute Nodes and Processors.

Each compute node is an IBM eServer thin server with two 1-GHz Intel Pentium III processors, each processor having a 256 KB full-speed Level 2 cache and 1 GFlop peak performance. The compute-node interconnect (I/C) is Myricom's Myrinet, using crossbar switches with 16 ports whose "network in a box" can interconnect 128 hosts. For more information on the compute nodes, see


Platinum Benchmark Performance.

The NCSA Platinum, installed at NCSA in 2001, ranks as the 91st top computer in the world (Top 500 Computer Reports, November 2002, Source: http://www.top500.org), with a maximum speed Rmax = 594 GigaFlops (GF) on the LINPACK linear algebra benchmarks and a theoretical asymptotic peak speed Rpeak = 1024 GF (also called Rinfinity, one of the Hockney Linear Model parameters; see the MCS572 class notes), given at the web link above, or see the class summary

On this list Platinum is classified as an Intel NOW (Network of Workstations) Cluster. Interestingly, the companion Linux Cluster Titan is rated as number 88 in the world, with 678 GFlops maximum speed on the LINPACK benchmarks and a peak speed of 1228 GFlops, even though it has fewer processors with lower clock rates, showing that you cannot go by the gigahertz chip ratings alone.
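
As an informal reminder of the Hockney Linear Model (the precise definitions are in the MCS572 class notes, not the NCSA pages): the time to process n operands is modeled as t(n) = t_0 + n/r_inf, so the achieved speed is r(n) = n/t(n) = r_inf/(1 + n_half/n), where r_inf is the asymptotic peak speed (the role Rpeak plays above) and n_half = r_inf*t_0 is the half-performance problem size; Rmax is the speed actually achieved on the largest LINPACK problem run.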


Platinum Memory Units.

The random access memory (RAM) is 1.5 GB of globally shared memory on each 2-processor node, but distributed memory of 768 GB in total viewed as a cluster of 512 nodes, so Platinum has a hybrid memory system globally as a 1024-processor system. The processors or CPUs each have a 256 KB L2 cache memory (level 2 local memory).


Platinum Operating System.

The operating system is Red Hat 7.2 with the Linux 2.4.9 kernel, an open-source variant of the UNIX operating system. However, since compilation and execution on Platinum is by remote batch scheduling using a combination of the Maui Scheduler, the Portable Batch System (PBS) and the UNIX Network Queueing System (NQS), the user should refer to the subsections on those topics.


Platinum Login Shells.

The operating system environment is set by a UNIX shell, and the default shell on the NCSA Platinum is the C-Shell. The shell can be changed (though that is not recommended) with the Change Shell command "chsh", which has the format:

where the Shell Path "[shellpath]" can be found with system "which" in the format:

where "[shell]" is the standard system shell "sh", Bourne again shell "bash", the Korn shell "korn" and others. However, all of the NQS QSUB job scripts given here assume the C-shell which uses the resource configuration file ".cshrc" which resides in the user's home directory and can be used to define commands and make aliases (format: "alias [alias-name] [alias-definition]", in cases of special command characters quotation marks are needed.). A sample of a ".cshrc" file for use on the Platinum is


Platinum-UIC Login Access.

Users MUST access the NCSA Platinum directly using the Secure Shell (ssh), such as from UIC `icarus' or from department systems,

or

If your computer system does not have this secure form, you will have to find one that does, like the UIC student computer server icarus.uic.edu, since every student should have a UIC netid. If ssh has difficulty with the Unix ".ssh/known_hosts" file (the name will differ on other platforms), then edit the file by deleting the entry for the node that is giving the problem, since the ssh key may have expired, and try the ssh command again.

SSH works like the Unix remote login command `rlogin', but encrypts your password so that it is nearly impossible to steal. See "man ssh" for help from the UNIX manual pages.

SSH is a UNIX command found on many UNIX systems, but you can also get a free MS Windows version, which comes in two main flavors:


Platinum-UIC File Transfer.

Users MUST do their file transfer between the NCSA Platinum and UIC using the Secure Shell (ssh) commands such as secure copy scp or secure FTP sftp. Secure copy scp is more robust, since secure FTP sftp can be more difficult to connect with. For example, from UIC

SCP Secure Copy:

or from NCSA Platinum

This form of the command works well for a single file, which can also have a directory path, but the user password has to be given each time. For multiple files a wild-card version can be used, e.g., for all C files, omitting the target file name, from NCSA:

See "man scp" for help from the UNIX manual pages.

SFTP Secure File Transfer Protocol:

Also, you can use the secure File Transfer Protocol (FTP) called sftp, which works like the usual FTP, except that you cannot use any abbreviations of the FTP subcommands (e.g., you must type "put" in full rather than an abbreviation), but SFTP secures your session better. For example, from UIC,

or from NCSA Platinum

Remark: If your username is the same on both the UIC node and the NCSA node, then the "[username]@" is optional. See "man sftp" for help from the UNIX manual pages.


Platinum File Systems.

HOME Directory:

Each NCSA user has a home directory on the interactive access nodes to keep files and subdirectories, with the full path specified by "/u/ac/[username]". The home directory can be more simply referenced by the UNIX symbol ~ or the UNIX meta or environmental variable representation ${HOME}, as in cd $HOME to change directory back to home or ls ${HOME}/mcs572 to list the contents of a home subdirectory "mcs572" (note that the curly brackets are optional in the first example but required in the second example, where "HOME" is followed by nonblank characters). The home directory quota is 500 MB.

SCRATCH Directory:

Each user has a scratch or work directory "/nfs/storage[nx]/[username]", where [nx] = 1:2:7, and these directories are linked to the disks /storage[nx]/. The user's scratch directory can simply be referenced by the meta representation ${SCRATCH}, where the curly brackets are optional if ${SCRATCH} is used as a sole argument. It is recommended that the scratch directory be used for scheduling very large batch jobs on the Platinum cluster with the qsub queueing submit command, including all necessary input files.

LOCAL Directory:

Each Platinum cluster compute node has global node memory accessible to both of its processors, and that memory is accessible to the user only when the user's code is executing, technically beginning with the shell identification required at the top of the qsub script, e.g., the "#!/bin/csh" escape to the C-shell. However, the parallel Virtual Machine Interface run command vmirun needs the seemingly redundant "./[executable]" form of the executable file name.

Remark: The commands "qsub" and "vmirun" are described more below. The "qsub" command also has an interactive mode, "qsub -I -[options]", that can be used from the home directory access nodes to move to the compute nodes where the user's job is executing, but more about this later in the QSUB section.

UniTree Archival Storage System:

UniTree is the NCSA mass storage system (mss); it runs on mss.ncsa.uiuc.edu and is easily accessible from Platinum and Titan (the IA64 Linux Cluster) for storing large files for long periods of time. On Platinum, file transfer between user directories and the user's UniTree storage, without any need to log in or give a password, is by an ftp-like command:

which otherwise works like FTP, or by the command-line version, which also uses FTP subcommands, for example,

to change directory, get a file, put a file or delete a file, respectively. Use the Unix manual commands "man mssftp" or "man msscmd" to get more information. The most beneficial part of the no-login property of "mssftp" and "msscmd" is that they can be used in PBS QSUB job scripts. If the user has access to the Kerberos version 5 of FTP, then

can also be used remotely. For more information on NCSA Unitree see

However, for the class, Unitree is optional, except for very large storage. The web page for general Platinum file systems is


Platinum Programming Languages.

The Platinum programs are compiled directly on the Platinum; the compilation commands are given here with some typical options for interfacing with MPI, using the

C Compiler:

or the

C++ Compiler, which is called g++. See "man gcc" for help from the UNIX manual pages for both gcc and g++.

Warning: NCSA claims that "gcc" is fussy about the order of the options.

Remark: NCSA Platinum also has support for the Intel C compiler "icc":

NCSA supports the Intel F90/77 compiler "ifc", the Intel C++ compiler "icpc", the Portland Group C compiler "pgcc", the Portland Group C++ compiler "pgCC" and the Portland Group Fortran 90 compiler "pgf90". For more information, see the NCSA Compiler page,

Also, for command-line help, try for example "icc -help" for Intel C and "man pgcc" for the Portland Group (PGI) C compiler.

In the above compilation commands, the options are

  • "-lmpich": references the Argonne National Lab portable Message Passing Interface (MPI) version MPICH Library that is called by Fortran or C compilers so that the code can execute in parallel, permitting the use of MPI parallel programming in the code. In addition to the MPI Library option, the code itself must include the MPI header directives in the code preface (code beginning), "include 'mpif.h'" for Fortran 90 and "#include <mpi.h>" for the C family of programming languages. Both the MPI Library and MPI Include statements are needed;

  • "-lvmi": allows parallel communication between the compute nodes by linking to the Virtual Machine Interface VMI Library for the best use of the interconnect;

  • "-ldl -lpthread": allows use of the DL and parallel threads PTHREADS Libraries.

  • "-o [executable]": names the output executable object file "[executable]", unless this option is missing and the executable is given the generic default name "a.out"; execution of the executable is by the massively parallel envelope VMI run command vmirun;

  • Other items are for the "include" directories and the "gcc" Library.
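
To make the roles of the "#include <mpi.h>" header directive and the "-lmpich" link option concrete, here is a minimal C sketch of an MPI program (an illustration only, not one of the class codes):

        #include <stdio.h>
        #include <mpi.h>                                 /* MPI header directive in the code preface */

        int main(int argc, char *argv[])
        {
            int rank, size;

            MPI_Init(&argc, &argv);                      /* start up MPI */
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);        /* this processor's id: 0, 1, ..., size-1 */
            MPI_Comm_size(MPI_COMM_WORLD, &size);        /* total number of processors */

            printf("Hello from processor %d of %d\n", rank, size);

            MPI_Finalize();                              /* shut down MPI */
            return 0;
        }

Such a code would be compiled with the gcc command and the MPICH, VMI, DL and PTHREADS options listed above, and executed on the compute nodes with the vmirun command described next.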

     

    VMIRUN Virtual Machine Interface Parallel Run Command:

    where "./[executable]" is the copy of the executable on the compute node, "< [data]" means the data file is directed into the standard UNIX input and ">& [output]" means the standard UNIX output is directed to the file "[output]". Both "[data]" and "[output]" should be specified with a full directory path, like the home directory. An executable can not run in parallel without vmirun". Usually, the number of nodes and the number of processors are specified by PBS inputs in the qsub job file, since both must be initially set by a PBS statement in the QSUB script or in the options of the "qsub" command.


    Platinum Batch Queueing Systems: PBS and NQS with MPI.

    NQS Job Scripts:
    Remote job scheduling on the Platinum is accomplished by using UNIX Network Queueing System (NQS) job scripts, but the script directives are the so-called Portable Batch System (PBS) Directives, in place of the usual NQS Directives.

    The new user should study these sample job scripts and others listed on the class homepage:

     

    Sample Job Scripts: Platinum 4 Processor C Code Job Script cpgm4.job

    Source:

     

    Executable Job Scripts:
    Before any job script can be used as an argument of qsub, the job script must be made executable for all, e.g., using the UNIX change mode command:

    where in the second form, the files should already be readable (r).

     

    NQS qsub Submit Command:

    These job scripts are run with the NQS QSUB submit command from the user's "${HOME}" home directory or "${SCRATCH}" scratch directory, for example,

    where "${HOME}" and "${SCRATCH}" denote the meta-names of the user's home and scratch directories, respectively, on the Platinum cluster. See "man qsub" for help from the UNIX manual pages or the NCSA webpage

     

    NQS qstat Status Command:

    The job status can be checked by the NQS QSTAT status command:

    and when done, the user can view the output, if any. Under the table heading called "S", e.g., "Q" means that the job is queued waiting to run, "R" means running, and "E" means exiting. See "man qstat" for help from the UNIX manual pages.

     

    NQS qdel Delete Command:

    If for any reason you need to kill the job before the end, first note the job id number `[job_id]' at the beginning of your job line in the "qstat -u [Pt-username]" output, then enter the command:

    which should stop a running job, unless the system is busy. See "man qdel" for help from the UNIX manual pages.

     

    Job Script Examples:

    A user can try out the class sample NQS QSUB job scripts by downloading and copying one of the following sample codes

    to your home directory and then recopying it, say "[Example-Code].c" or "[Example-Code].f", to the recyclable source file of the form `*pgm.*' as follows:

    for C or F90, respectively.

    The user will also have to create a simple input data file called "cdata", or use the Pi Code example data file, for the qsub scripts, since the scripts are written to take a data file as standard input (e.g., use the editor "vi" to put the set of integration points, terminated by zero, into the input data file); then, in the home directory, enter the queue submit command for 4 processors on a single node:

    then check for a finished job with "qstat -u [NCSA-username]" until your queue record is no longer displayed, finally looking for the standard output and standard error files, for example with "ls -l *pgm4.output *pgm4.error". You can always modify the sample job scripts to suit your particular job requirements or your own file naming preferences, or if you prefer to open and close files in the code by hand.
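
    For orientation, here is a hedged sketch of the kind of MPI integration code that these scripts drive (an illustration only, not the actual class pi_mpi.c): it reads the number of integration points from standard input, terminated by zero, and combines the partial sums with MPI_Reduce.

        #include <stdio.h>
        #include <mpi.h>

        /* Illustrative sketch: midpoint-rule approximation of pi = integral of 4/(1+x*x) on [0,1]. */
        int main(int argc, char *argv[])
        {
            int rank, size, n, i;
            double h, x, mypi, pi;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            for (;;) {
                if (rank == 0)                    /* only the root processor reads the redirected cdata */
                    scanf("%d", &n);
                MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);    /* send n to all processors */
                if (n == 0)                       /* data terminated by zero */
                    break;

                h = 1.0 / (double) n;
                mypi = 0.0;
                for (i = rank + 1; i <= n; i += size) {    /* each processor takes every size-th point */
                    x = h * ((double) i - 0.5);
                    mypi += 4.0 / (1.0 + x * x);
                }
                mypi *= h;

                MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
                if (rank == 0)
                    printf("n = %d, pi is approximately %.16f\n", n, pi);
            }

            MPI_Finalize();
            return 0;
        }

    In the job script, the vmirun command redirects cdata into this standard input and the printed lines into the output file, as described in the VMIRUN subsection above.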

     

    Summary of Running Job Scripts with Sample Source in Home Directory:

    1. Copy MPI code source ( pi_mpi.c here, assumed downloaded) to generic file used by class 4 processor C code QSUB script:

        cp pi_mpi.c cpgm4.c
    2. Copy data source ( pidata here, assumed downloaded) to generic file used by class 4 processor C code QSUB script:

        cp pidata cdata
    3. Compile C for MPI execution on Platinum compute nodes:

        gcc -I/usr/local/vmi/mpich/include cpgm4.c -o cpgm4 -L/usr/local/vmi/mpich/lib/gcc -lmpich -lvmi -ldl -lpthread -O
    4. Run class generic QSUB job script ( cpgm4.job, assumed downloaded):

        qsub cpgm4.job
    5. Check job status using QSTAT (for UIC mail forward notification uncomment "#PBS -m be"):

        qstat -u [Pt-User]
    6. When job is finished (no qstat listing for job or mail notification) then list (cat) output or view with an editor or "scp" copy back to UIC:

        cat cpgm4.output

     

    Summary of Running Job Scripts with Sample Source in UniTree Mass Storage System:

    1. Copy MPI code source ( pi_mpi.c here, assumed downloaded) to generic file used by class 2 processor C code QSUB script:

        cp pi_mpi.c cpgm2.c
    2. Copy data source ( pidata here, assumed downloaded) to generic file used by class 2 processor C code QSUB script:

        cp pidata cdata
    3. Compile C for MPI execution on Platinum compute nodes:

        gcc -I/usr/local/vmi/mpich/include cpgm2.c -o cpgm2mss -L/usr/local/vmi/mpich/lib/gcc -lmpich -lvmi -ldl -lpthread -O
    4. Put Data and Executable in UniTree Mass Storage System:

        msscmd put cdata, put cpgm2mss
    5. Run class generic QSUB job script (cpgm2mss.job, assumed downloaded):

        qsub cpgm2mss.job
    6. Check job status using QSTAT (for UIC mail forward notification uncomment "#PBS -m be"):

        qstat -u [Pt-User]
    7. When the job is finished (no qstat listing for the job, or mail notification), get the output from the UniTree Mass Storage System:

        msscmd get cpgm2mss.output
    8. List (cat) output or view with an editor or "scp" copy back to UIC:

        cat cpgm2mss.output

     

    Summary of Interactively Running Jobs without Scripts: (Caution: This can lock up your current session while the job is waiting for processor resources to become available, which can be a long time when Platinum is loaded with user jobs; but if you are using Unix/Linux you can easily start another simultaneous session.)

    1. Compile C for MPI execution ( trap_mpi.c here, assumed downloaded; a hedged sketch of a trapezoidal-rule code of this general kind is given after this list) on Platinum compute nodes:

        gcc -I/usr/local/vmi/mpich/include trap_mpi.c -o trap_mpi -L/usr/local/vmi/mpich/lib/gcc -lmpich -lvmi -ldl -lpthread -O
    2. Run QSUB interactively using the "-I" option, specifying a 30 minute production job using 3 nodes with 2 processors per node (note: NCSA charges by nodes, so you should avoid odd-numbered processor jobs if possible), or 6 processors total:

        qsub -I -V -l walltime=00:30:00,nodes=3:ppn=2:prod
    3. While waiting from your access node for the compute node resources to become available, you will see system messages like

        This job will be charged to project: nfa
        qsub: waiting for job 57206.mgmt2.ncsa.uiuc.edu to start
        qsub: job 57206.mgmt2.ncsa.uiuc.edu ready
        
        ----------------------------------------
        !Begin PBS Prologue Thu Apr  3 00:49:13 CST 2003
        Job ID:         57206.mgmt2.ncsa.uiuc.edu
        Username:       [Pt-user]
        Group:          nfa
        Nodes:          cn249 
        End PBS Prologue Thu Apr  3 00:49:14 CST 2003
        ----------------------------------------
        
    4. When you get the "ready" part of the messages, you will get a compute node system prompt and you can start entering your VMIRUN command line, for example:

        [cn295:~64%] vmirun ./trap_mpi >& ./trap_mpi.output
    5. More system output should follow, and when the job is complete you can exit the compute node:

        [cn295:~65%] exit
    6. When job is finished (no qstat listing for job or mail notification) then list (cat) output or view with an editor or "scp" copy back to UIC:

        cat trap_mpi.output
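
    Since the interactive example above runs trap_mpi, here is a hedged sketch of a parallel trapezoidal-rule code of that general kind (an illustration only, not the actual class trap_mpi.c; the integrand x*x and the hard-coded number of trapezoids are hypothetical choices). Unlike the pi sketch earlier, it collects the partial sums on processor 0 with point-to-point MPI_Send/MPI_Recv calls rather than MPI_Reduce.

        #include <stdio.h>
        #include <mpi.h>

        static double f(double x) { return x * x; }      /* hypothetical integrand */

        /* Local trapezoidal rule on [left, right] with n trapezoids of width h. */
        static double trap(double left, double right, int n, double h)
        {
            double sum = (f(left) + f(right)) / 2.0;
            double x = left;
            int i;
            for (i = 1; i < n; i++) {
                x += h;
                sum += f(x);
            }
            return sum * h;
        }

        int main(int argc, char *argv[])
        {
            int rank, size, src, local_n;
            int n = 1024;                                 /* total trapezoids, hard coded for simplicity */
            double a = 0.0, b = 1.0;
            double h, local_a, local_b, local_sum, total;
            MPI_Status status;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            h = (b - a) / n;                              /* trapezoid width, same on every processor */
            local_n = n / size;                           /* assumes size divides n evenly */
            local_a = a + rank * local_n * h;
            local_b = local_a + local_n * h;
            local_sum = trap(local_a, local_b, local_n, h);

            if (rank == 0) {
                total = local_sum;
                for (src = 1; src < size; src++) {        /* gather partial sums from the other processors */
                    MPI_Recv(&local_sum, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &status);
                    total += local_sum;
                }
                printf("With n = %d trapezoids, the integral is approximately %.12f\n", n, total);
            } else {
                MPI_Send(&local_sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }

            MPI_Finalize();
            return 0;
        }

    Such a code would be compiled exactly as in step 1 and started with vmirun as in step 4.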
    For more information see


    Platinum Message Passing Interface (MPI) Sources.

    MCS 572 Class MPI web-pages:

    NCSA MPI Basics:

    The Cray native SHMEM communication library is also available, but it is optimized only for communication between nodes (like ELAN) and not within a node:

    OpenMP is supported in Tru64 UNIX for C and Fortran, but not C++:


    Platinum Timers, Profiling and Debugging.

    For MPI programs, timing programs is usually accomplished with the MPI wall timer function "MPI_Wtime()", which can be used in an unsynchronized way by itself or synchronized with the MPI barrier function "MPI_Barrier([Communications_group])". As an example, in C, consider the fragment:
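
    (A minimal sketch of such a fragment, to be placed between the MPI_Init and MPI_Finalize calls of an MPI code that includes <stdio.h> and <mpi.h>, might be:)

        double t1, t2, elapsed;
        int my_rank;

        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        MPI_Barrier(MPI_COMM_WORLD);        /* synchronize all processors before starting the clock */
        t1 = MPI_Wtime();                   /* wall clock start time in seconds */

        /* ... parallel section of code to be timed ... */

        MPI_Barrier(MPI_COMM_WORLD);        /* wait until all processors have finished the section */
        t2 = MPI_Wtime();                   /* wall clock stop time */
        elapsed = t2 - t1;
        if (my_rank == 0)
            printf("elapsed wall clock time = %g seconds\n", elapsed);

    Without the barriers, the same calls give an unsynchronized per-processor timing.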

    For information on more Platinum Unix, Linux and other timers, as well as performance profilers and debuggers, see


    Platinum Editors.


    More Platinum Information.

    For Platinum information from NCSA, see


    Guide Notation.

    This local guide is meant to indicate ``what works'', primarily for access from UNIX systems to the NCSA Platinum. The use of the Unix C-Shell on the Platinum is assumed throughout most of this local guide.

    UNIX is a trademark of AT&T.

    Computer prompts or broadcasts will be enclosed in double quotes (``_''), background comments will be enclosed in curly braces ({_}), commands cited in the comments are highlighted by single quotes or double quotes depending on emphasis (`_') or ("_") {do not type the quotes when typing the commands}, and optional or user-specified arguments are enclosed in square brackets ([_]) {however, do not enter the square brackets}. The symbol (CR) will denote an immediate carriage return or enter. {Ignore the blanks that precede it, as in `[command] (CR)', making it easier to read.} The symbol (Esc) will denote an immediate pressing of the Escape-key. {Use no brackets please.} The symbol (SPACE) will denote an immediate pressing of the Space-bar. {Warning: Do not type any of these notational symbols in an actual computer session.}


    Return to TABLE OF CONTENTS?

    REST OF GUIDE UNDER CONSTRUCTION!

    See PSC TCS Local User's Guide in the interim.


    The best way to learn these commands is to use and test them in an actual computer session on the Platinum IA32 Linux Cluster.

    Good luck.

    Return to TABLE OF CONTENTS?

    Please report to Professor Hanson any problems or inaccuracies:


    Web Source: http://www.math.uic.edu/~hanson/pt03guide.html