VisIt Launcher

VisIt is composed of several cooperating components. To unify them under a single "visit" command, a launch script is used. The launch script encapsulates version-specific code and takes care of setting environment variables and everything else that must be in place for VisIt to run properly. The launcher also translates various command line arguments into MPI job submission commands, enabling VisIt to run in parallel under many job-control systems. The main reason the launch is handled by a script is to allow for customizations.

Organization

VisIt's launch scripts are composed of two to three Python scripts: a rarely changed frontendlauncher (or visit) script and a version-specific internallauncher script. VisIt 2.6 and later also permit a customlauncher script that allows for site-specific customizations.

frontendlauncher

The frontendlauncher is aliased to visit and is the command that users run. The frontendlauncher script takes care of the following:

  • version selection
  • architecture selection

In a typical VisIt installation, the binaries for many versions and platforms can coexist. The top level bin directory contains the frontendlauncher and visit command. The top level directory also contains several version subdirectories, each containing at least one architecture. The frontendlauncher selects the newest version for the appropriate architecture.

visit/bin
visit/2.4.0
visit/2.4.0/bin
visit/2.4.0/linux-intel
visit/2.4.0/darwin-x86_64
visit/2.5.0
visit/2.5.0/bin
visit/2.5.0/linux-intel
visit/2.5.0/darwin-x86_64

internallauncher

Within each version subdirectory, there is also a bin directory containing the internallauncher. The internallauncher is a script that actually takes care of setting up and executing the VisIt programs. The frontendlauncher runs the internallauncher. This division lets different VisIt versions have version-specific modifications in the launching code.

The internallauncher takes care of:

  • running VisIt component programs
  • setting up environment variables
  • submitting parallel jobs
  • running under a debugger

customlauncher

The customlauncher script is an optional script that allows maintainers to customize the VisIt launch procedure for their site. The customlauncher script will often be used to return custom subclasses of JobSubmitter that alter the way VisIt launches parallel jobs.

The customlauncher scripts should be placed in VisIt's source tree in src/resources/hosts/<site>, where <site> is the name of a computing site; the site names typically match the options found in visit-install. Individual customlauncher scripts are installed by visit-install, which copies customlauncher into the internal bin directory next to internallauncher.

visit/bin
visit/bin/frontendlauncher
visit/2.6.0
visit/2.6.0/bin
visit/2.6.0/bin/internallauncher
visit/2.6.0/bin/customlauncher
visit/2.6.0/linux-intel
visit/2.6.0/darwin-x86_64

VisIt 2.6

VisIt 2.6 introduced a new version of the launch scripts. Whereas the earlier scripts were written in Perl, the new scripts are written in Python and are better structured for extensibility and customization. In the new scheme, the frontendlauncher runs the internallauncher in the same Python interpreter rather than spawning a new command.

The new internallauncher script contains various Python classes that help launch VisIt commands.

  • JobSubmitter classes let VisIt submit a parallel compute engine to a job control system.
  • Debugger classes help launch VisIt under a debugger.
  • MainLauncher class contains methods that are used to carry out a launch.

The main function in the internallauncher script is the internallauncher function, which uses a MainLauncher object (or an instance of a derived class) to go through the various steps needed to run a VisIt program.

Customization

The previous versions of internallauncher had hacks for various HPC centers strewn throughout the script. The most common pattern was to have some top level initialization for a specific site and then various hacks to MPI job submission elsewhere in the script.

The new launch system allows for a customlauncher file that contains a class derived from MainLauncher. The derived class can perform its own site-specific top-level initialization without polluting the main internallauncher script. Furthermore, since MPI launching is handled by various JobSubmitter classes, the derived MainLauncher class can return its own JobSubmitter classes that contain site-specific tweaks to MPI launching.

Here is a simple example customlauncher script:

# Custom launcher
class SiteSpecificLauncher(MainLauncher):
    def __init__(self):
        super(SiteSpecificLauncher, self).__init__()

    def Customize(self):
        # ----
        # Global initialization
        # ----
        if self.sectorname() == "mycluster":
            paths = self.splitpaths(GETENV("LD_LIBRARY_PATH"))
            addedpaths = ["/usr/local/compilers/GNU/gcc-4.3.2/lib64"]
            SETENV("LD_LIBRARY_PATH", self.joinpaths(paths + addedpaths))

# Launcher creation function
def createlauncher():
    return SiteSpecificLauncher()

Here is a simple example that returns a custom JobSubmitter for mpirun:

# Custom mpirun job submitter
class JobSubmitter_mpirun_custom(JobSubmitter_mpirun):
    def __init__(self, launcher):
        super(JobSubmitter_mpirun_custom, self).__init__(launcher)

    #
    # Override the name of the mpirun executable, give it arguments
    #
    def Executable(self):
        return ["/my/special/bin/mpirun", "-arg1", "-arg2"]

# Custom launcher
class SiteSpecificLauncher(MainLauncher):
    def __init__(self):
        super(SiteSpecificLauncher, self).__init__()

    def Customize(self):
        # ----
        # Global initialization
        # ----
        if self.sectorname() == "mycluster":
            paths = self.splitpaths(GETENV("LD_LIBRARY_PATH"))
            addedpaths = ["/usr/local/compilers/GNU/gcc-4.3.2/lib64"]
            SETENV("LD_LIBRARY_PATH", self.joinpaths(paths + addedpaths))

    def JobSubmitterFactory(self, launch):
        # Create our own "mpirun" job submitter.
        if launch == "mpirun":
            return JobSubmitter_mpirun_custom(self)
        return super(SiteSpecificLauncher, self).JobSubmitterFactory(launch)

# Launcher creation function
def createlauncher():
    return SiteSpecificLauncher()

JobSubmitter

The JobSubmitter class is the base class for all job submitters. Each job submitter has two key methods:

Method Description
Executable() The Executable() method returns a list containing the command that is used to submit the MPI job and any default command line arguments you might want to provide.
CreateCommand() The CreateCommand() method takes in a tuple of VisIt command line arguments, typically preformatted and ready to run. The CreateCommand() method's job is to reformat the arguments to run them under the specific MPI submission command as well as do any other initialization that is needed. Some job submitters set up extra environment variables or create files with commands to execute. Ultimately, this is the method that produces the command line that is run for the launch of the parallel program.

Adding a new job submitter

Create a new class derived from JobSubmitter and implement its Executable() and CreateCommand() methods. For the job submitter to be available to the launcher, you must add it to the MainLauncher class' JobSubmitterFactory() method. When the input launcher name matches that for your launcher, return an instance of your new job submitter and the MainLauncher will handle the rest. To use your job submitter, pass -l name to the VisIt script where name is the name of your job submitter.
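The steps above can be sketched as follows. Note that the JobSubmitter base class here is a minimal stand-in so the example runs on its own; the real base class, its constructor, and the exact CreateCommand() signature come from internallauncher, and "myrun" is a hypothetical submission command used purely for illustration.

```python
# Minimal stand-in for VisIt's JobSubmitter base class (the real one lives
# in internallauncher); included only so this sketch is self-contained.
class JobSubmitter(object):
    def __init__(self, launcher):
        self.launcher = launcher

# Hypothetical job submitter for a made-up "myrun" submission command.
class JobSubmitter_myrun(JobSubmitter):
    def Executable(self):
        # The command used to submit the MPI job, plus default arguments.
        return ["myrun", "--quiet"]

    def CreateCommand(self, args, debugger):
        # Reformat the preformatted VisIt arguments to run under "myrun".
        return self.Executable() + list(args)
```

A matching JobSubmitterFactory() override in a MainLauncher subclass would then return JobSubmitter_myrun(self) when the launch name is "myrun", after which visit -l myrun selects it.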

qsub job submitter

The JobSubmitter_qsub class is currently the most complex subclass of JobSubmitter. The complexity arises from the need to handle variation in qsub command line arguments among different computing sites; that variation also leads many sites to have custom qsub launchers. Another factor is that qsub launching often uses a sublauncher to actually run the parallel command, typically from within a shell script. So, the JobSubmitter_qsub class must build up a script that runs the VisIt programs and then submit that script using qsub, with possibly different arguments depending on the computing site.

Customizing sublauncher command

The parallel command needed to run MPI jobs varies according to which sublauncher is being used. For example, a qsub/mpirun launcher uses qsub to submit the job to the batch scheduler while using mpirun within the launch script to actually run the parallel program.

The following sublaunchers are supported with qsub:

  • mpiexec
  • mpirun
  • srun
  • ibrun
  • aprun
  • (no sublauncher)

Each sublauncher has a corresponding method in the JobSubmitter_qsub class that returns a list containing the command and any command line arguments that should be used to run that sublauncher. In addition, there is a second method, named after the sublauncher with an _args suffix, that builds up the full command list needed to run the parallel program.

Example:

def ibrun(self):
    return ["ibrun"]

def ibrun_args(self, args):
    mpicmd = self.ibrun()
    mpicmd = mpicmd + self.VisItExecutable() + args
    return mpicmd

If you were at a compute site where using ibrun needed to be customized, you could create a subclass of JobSubmitter_qsub and override the ibrun and ibrun_args methods to change how ibrun is used for your system.

Example:

class JobSubmitter_qsub_custom(JobSubmitter_qsub):
    def __init__(self, launcher):
        super(JobSubmitter_qsub_custom, self).__init__(launcher)

    def ibrun(self):
        return ["/my/special/ibrun"] # Use a special path to ibrun

    def ibrun_args(self, args):
        mpicmd = self.ibrun()
        if self.parallel.np is not None: # This is what we added
            mpicmd = mpicmd + ["-np", str(self.parallel.np)]
        mpicmd = mpicmd + self.VisItExecutable() + args
        return mpicmd

Custom module loading

Sometimes an HPC environment will need specific modules loaded in order for software to run as expected. This is commonly the case with programs run under a qsub launcher. You can easily override the TFileLoadModules method in your qsub JobSubmitter subclass to make it load the modules that you want in the job script that is ultimately executed under qsub.

class JobSubmitter_qsub_custom(JobSubmitter_qsub):
    def __init__(self, launcher):
        super(JobSubmitter_qsub_custom, self).__init__(launcher)

    def TFileLoadModules(self, tfile):
        tfile.write("source /etc/profile.d/modules.sh\n")
        tfile.write("module rm gcc\n")
        tfile.write("module load gcc/4.4.6\n")
        tfile.write("module rm openmpi\n")
        tfile.write("module load openmpi/1.6.0/gcc/4.4.6\n")

Customizing launch script

The command that gets executed to start parallel jobs is customized as described in the previous section. However, there may be other initialization that you want to add to the script that qsub will execute. You can change how the script is constructed by overriding these methods:

Method Description
CreateFilename() Return the filename that will be used for the script.
TFileLoadModules() Add any module loading to this method. This method is passed an open handle to the script being created so you can file.write() new lines of text to the file. The default implementation just returns.
TFileSetup() Writes setup commands to the file. This method is passed an open handle to the script being created so you can file.write() new lines of text to the file. The default implementation changes the directory and disables core files.

Customizing qsub command

The qsub command can be highly variable among systems, at least for some command line flags. The qsub command that the JobSubmitter_qsub class creates depends on the following methods:

Method Description
SetupPPN() One of the most variable arguments is the "-l" argument when it is used to specify the number of nodes and processors, as in "-l nodes=2:ppn=2". Since the treatment of these arguments may change from system to system, there is a separate method called SetupPPN() that handles adding these arguments to the qsub command line. You can override the SetupPPN() method on your JobSubmitter_qsub subclass if you need special handling for these arguments.
SetupTime() Time is also handled by a special method called SetupTime() that adds "-l walltime=XX" arguments to the qsub command line. This method can also be overridden if you need to handle time differently.
AddEnvironment() The qsub launcher for VisIt adds certain command line arguments to the qsub command line via a -v argument to qsub. These environment variables are added in the AddEnvironment() method, which you can override.
AssembleCommandLine() If you need full control over how qsub command lines are assembled, you can override the AssembleCommandLine() method.
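As a hedged sketch of the SetupPPN() and SetupTime() overrides described above: the argument lists shown here are assumptions (check internallauncher for the real signatures at your VisIt version), and JobSubmitter_qsub is replaced by a trivial stand-in so the example is self-contained.

```python
# Trivial stand-in for VisIt's JobSubmitter_qsub class, used only so this
# sketch runs on its own.
class JobSubmitter_qsub(object):
    pass

class JobSubmitter_qsub_site(JobSubmitter_qsub):
    # The argument names here are illustrative, not VisIt's exact signature.
    def SetupPPN(self, nodes, procs, ppn, use_vis):
        # Our hypothetical site expresses the request as "-l nodes=N:ppn=M".
        return ["-l", "nodes=%s:ppn=%s" % (nodes, ppn)]

    def SetupTime(self):
        # Hard-code a one hour wall time limit for this sketch.
        return ["-l", "walltime=01:00:00"]
```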

Debugger

The Debugger class is the base class for all debuggers. The internallauncher script currently supports the following debuggers:

  • gdb
  • totalview
  • strace
  • valgrind

The only method that gets called on a debugger object during VisIt's launch is the CreateCommand() method.

Method Description
CreateCommand() Override the CreateCommand() method to construct the arguments that you'll need to start VisIt under a debugger. This method is applied to the visit command line once it is ready to run. This gives your debugger class an opportunity to change the command line that will be executed for VisIt in order to instead start VisIt under a debugger.

Adding a new debugger

To add a new debugger, do the following:

  1. Create a new derived class of Debugger, overriding the CreateCommand() method.
  2. Add a new debugger name to MainLauncher.Debuggers() list of debugger names. This helps VisIt parse the debugger arguments.
  3. Make the MainLauncher.DebuggerFactory() method return your derived class when the appropriate debugger name is passed.
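Putting the steps together, a sketch might look like the following. The Debugger base class is a stand-in so the example runs on its own (the real one lives in internallauncher), and "mytrace" is a hypothetical debugging tool.

```python
# Stand-in for VisIt's Debugger base class; included so this sketch is
# self-contained. The real constructor and members may differ.
class Debugger(object):
    def __init__(self, debugger_args):
        self.debugger_args = debugger_args

# Hypothetical debugger that wraps VisIt with a made-up "mytrace" tool.
class MyTraceDebugger(Debugger):
    def CreateCommand(self, args):
        # Prepend the debugger command (and its arguments) to the
        # ready-to-run VisIt command line.
        return ["mytrace"] + list(self.debugger_args) + list(args)
```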

MainLauncher

The MainLauncher class contains all of the methods to launch the standard VisIt programs. It makes use of various JobSubmitter classes to launch MPI jobs and it uses various Debugger classes to launch VisIt in a debugger.

Helper classes

The MainLauncher class uses 3 helper classes to parse command line arguments. These classes take command line argument values and set class members. The MainLauncher class contains an instance of each class that is used to contain the state for the various types of arguments. This state can be passed around to job submitters and debugger classes so global variables are not used. In addition to gathering command line arguments, the helper classes also are responsible for adding their particular state to the command line that is eventually used to launch VisIt programs.

The classes are:

  • GeneralArguments - most command line arguments fit here
  • ParallelArguments - parallel-related command line arguments
  • DebugArguments - debugger-related command line arguments

The helper classes each provide 2 methods to the MainLauncher:

Method Description
ParseArguments() Set object state based on a command line argument
ProduceArguments() Produce a list of command line arguments based on object state
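The helper-class pattern can be sketched as follows. The parsing protocol used here (argument index in, count of consumed arguments out) is an assumption for illustration, not necessarily VisIt's exact convention.

```python
class ExampleArguments(object):
    """Illustrative helper class in the style of ParallelArguments."""
    def __init__(self):
        self.np = None

    def ParseArguments(self, i, args):
        # Consume "-np N" if present; return how many arguments were eaten
        # (the return-count protocol is a guess made for this sketch).
        if args[i] == "-np" and i + 1 < len(args):
            self.np = int(args[i + 1])
            return 2
        return 0

    def ProduceArguments(self):
        # Re-emit the stored state as command line arguments.
        return ["-np", str(self.np)] if self.np is not None else []
```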

Helper functions

These are some helper functions that are borrowed from frontendlauncher. They are used but are not part of the MainLauncher class.

Function Description
exit(msg, code) Exit the internallauncher with a message and a return code
GETENV(var) Return a string containing the requested environment variable
SETENV(var, value) Set an environment variable to the specified value

Helper methods

Rather than invoke command line utilities in backticks and pipe them through "tr" and so on, the new launch system provides helper methods for common tasks such as getting the hostname, splitting paths, etc.

Method Description
username() Return the user name.
hostname() Return the full host name: edge83.llnl.gov
nodename() Return the node name: edge83
sectorname() Return the sector name: edge
domainname() Return the domain name: llnl.gov
uname() Return the OS name
splitpaths() Split paths separated by ':' and return a unique list
joinpaths(paths) Join a list of paths into a string separated by ':'
quoted(s) Return a string surrounded by quotes if the string contains spaces
writepermission(path) Return True if the path has write permission; False otherwise
call(args, stdinpipe) Call a process given by args, which is a list of command line arguments. If stdinpipe is true then make a pipe for stdin.
message(msg, file) Call when the launcher should issue a message.
warning(msg) Call when the launcher should issue a warning message.
error(msg) Call when the launcher should issue an error message.
iscomponent(name) Return true if name is a VisIt component.
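As an illustration of the intended behavior of splitpaths() and joinpaths() (inferred from the descriptions in the table above, not copied from internallauncher):

```python
# Hedged reimplementation sketch of two of the helper methods. The real
# methods live on MainLauncher; behavior details such as dropping empty
# path segments are assumptions made here.
def splitpaths(s):
    # Split on ':' and drop duplicates while preserving order.
    seen, out = set(), []
    for p in s.split(":"):
        if p and p not in seen:
            seen.add(p)
            out.append(p)
    return out

def joinpaths(paths):
    # Join a list of paths into a single ':'-separated string.
    return ":".join(paths)
```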

Methods that are part of a launch

To give derived launcher classes more control over the launch process, the process is divided into different methods.

The main methods that most launcher subclasses will override are: Customize() and JobSubmitterFactory().

Method Description
Customize() Method to let derived classes perform top-level initialization
JobSubmitterFactory() Method that creates an instance of a JobSubmitter class to handle a given parallel launcher

Here are some other methods that can be overridden. The methods are given in the order that the internallauncher function calls them.

Method Description
Initialize() Initialize some important members of the object.
SetLogging() Set whether logging of executables is enabled.
ParseArguments() Read the command line arguments into different objects (general, parallel, debugger).
DetermineArchitecture() Determine a list of supported VisIt architecture strings.
ConsistencyCheck() Check the different command line arguments for consistency and alter their values if needed.
Customize() Method that lets derived classes do top-level custom initialization.
UpdateExecutableNames() Update the name for the executable that we'll run. This usually only affects the engine, transforming engine to engine_ser or engine_par.
SetupDirectoryNames() Set up various directory names that can be used to reference different parts of the VisIt installation.
SetupEnvironment() Set up the environment variables that we need to run VisIt.
MakeUserDirectories() Make the user's ~/.visit plugin subdirectories.
PrintUsage() Print the usage (must be called after SetupDirectoryNames)
PrintEnvironmentShell() Print shell commands to replicate the environment that VisIt has set up.
Launch() Launch the VisIt executable once all other preparations have been made. This method creates a command line to execute and then passes it to the call() method. The program's return value is returned from this method.

Differences from older versions

Since the entire system of launching scripts was rewritten in Python, there are some important differences above and beyond what changes were made in the implementation:

Where are my hacks?

The old internallauncher script contained a lot of machine-specific hacks. Those hacks have been moved into customlauncher files for various sites. These can be found in the src/resources/hosts/<site> directory where <site> is the name of a computing site.

If you are testing on a cluster and working within a source directory, you can enable your site customizations again by copying the customlauncher file into the src/bin directory.

Argument ordering

The order of command line arguments was somewhat preserved in the original launchers. The new launcher does not preserve command line argument ordering.

Printed command

The original launchers printed a command string to the console that did not necessarily match the command actually being executed. Launching jobs under msub was a good example: the printed command line included the concatenated contents of the constructed launch script rather than the name of the script containing the commands.

The new launcher prints what it executes (with the caveat that -key token arguments are filtered out as before).

Norun output

The output for -norun in the new launcher is formatted such that environment variables come first, followed by the VisIt command to run. This lets you paste all of the commands in order into a command line shell.

Use of loopback interface

The old launch system often intended to use the local machine's loopback interface 127.0.0.1 but it frequently did not do so. The new script is more aggressive about using 127.0.0.1 as the host for local process launches unless -noloopback is passed.

PYTHONHOME

  1. The shell portion of the new frontendlauncher unsets the PYTHONHOME environment variable.
  2. The new internallauncher then sets PYTHONHOME to point to VisIt's Python modules.

The reason for #2 is so we can transplant the VisIt CLI on to systems other than where it was built. Setting the PYTHONHOME variable is necessary to make the CLI find the Python modules.

The reason for #1 is that setting PYTHONHOME as in #2 causes the system Python to not be able to locate its own modules. The VisIt launchers are run using the system Python, which may be incompatible with VisIt's Python.

GDB arguments

The old launcher had specific arguments to launch components under gdb (e.g. -gdb-engine, -gdb-mdserver, -gdb-viewer, -gdb-gui). These arguments were removed in the new launcher to conform to the style used by the other debuggers such as totalview. The general pattern is: -debuggername [debugger args] component.

To run the engine under GDB in a new window:

visit -gdb engine_ser -xterm

Hardware pre/post arguments

Both the new and old launch scripts have support for -hw-pre and -hw-post arguments. These arguments are intended to be commands to start up X servers and tear them down, though the commands could really be anything. The old version of internallauncher constructed commands oddly and would often have a sublauncher run the hw-pre and hw-post commands as part of the compute engine command line. This led to command scripts that seem like they could not possibly work as intended:

#!/bin/sh
cd somewhere
ulimit -c 0
srun -n 8 startx /path/to/bin/engine_par -host 127.0.0.1 -port 5600 -norun engine_par -noloopback stopx

Of course, srun could launch the startx command, but the subsequent arguments would then be passed as command line arguments to startx, so it does not seem that the intended command sequence would actually run.

The new internallauncher will create a command script that looks like this:

#!/bin/sh
cd somewhere
ulimit -c 0
startx
srun -n 8 /path/to/bin/engine_par -host 127.0.0.1 -port 5600 -norun engine_par -noloopback
stopx


Debugging Python Internallauncher