Socket Relay

VisIt comprises several executable programs, or components, that communicate over the network via sockets. This enables certain components, such as the parallel compute engine, to run on remote compute clusters. The socket relay program is useful when communications from one VisIt component to another need to span network boundaries.

Typically, VisIt's component launcher (VCL) is launched on the remote computer and talks back to the local VisIt clients via an SSH tunnel. When other VisIt programs need to be launched, they are invoked via VCL. This includes the launch of parallel compute engines. When the parallel compute engine is launched, VCL runs the visit command with additional arguments that specify how to launch the engine in parallel. These arguments are interpreted by VisIt's launch script, internallauncher, which translates them into system-specific commands that submit the parallel job through the computer's batch system. When the job ultimately runs on the cluster's compute nodes, it may not have network access to VCL on the login nodes. The socket relay program addresses this problem by relaying network traffic from a program on network A to a program on network B via a computer that both networks can see.
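
As a concrete sketch of that idea (the host name and port number below are purely illustrative, and the command-line behavior shown follows the Hopper example later in this section), the relay runs on a machine that both networks can reach, is given the host and port of the VisIt component it should connect back to, and prints the port on which it listens:

# Run on a host that both networks can see. visit_socket_relay connects back
# to the VisIt component listening on login-node.netA:5600 and prints the
# port on which it will accept the connection from the other network.
visit_socket_relay login-node.netA 5600 > relay.port
RELAY_PORT=`cat relay.port`
# The component on the other network is then told to connect to this host on $RELAY_PORT.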

Building Socket Relay

You can enable the socket relay program in VisIt's build by putting the following line in the config-site/host.cmake file that build_visit generated for you:

VISIT_OPTION_DEFAULT(VISIT_CREATE_SOCKET_RELAY_EXECUTABLE ON)
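
After adding that line, re-run cmake and rebuild so that the visit_socket_relay executable (installed into VisIt's bin directory and referenced below as $visitbindir/visit_socket_relay) is built. A minimal sketch with illustrative paths, assuming cmake is invoked the same way build_visit originally invoked it:

cd /path/to/visit/build
cmake /path/to/visit/src
make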

Cray

On Crays, the login node where VCL runs may only be capable of submitting jobs, with no direct connection to the compute nodes. For example, Crays have login nodes, where VCL runs, and MOM nodes (intermediate nodes) that sit between the login nodes and the compute nodes. The login nodes can talk to the MOM nodes and the MOM nodes can talk to the compute nodes, but there is no direct network connection from the login nodes to the compute nodes. The socket relay program is useful here because it can run on the MOM nodes and forward socket traffic between the VCL on the login nodes and the parallel engine on the compute nodes.

Customizations

Crays generally require some site-specific customizations to VisIt's internallauncher script to allow the compute nodes to talk back to VCL through the socket relay program. Customizations have been made for NERSC's Hopper computer, which is a Cray, so the pattern established for Hopper can apply to other Cray computers.

If internallauncher detects that it is running on Hopper (search for IsRunningOnHopperNERSC to find the appropriate locations), it does the following:

  • It saves $remotehost and $remoteport (the hostname and port number of VisIt on the login node) in the variables $loginnodehost and $loginnodeport.
  • It then replaces $remotehost with "\$MOM_HOST" and $remoteport with "\$MOM_PORT". These refer to two environment variables that will be set later in the launch script (their values are not known yet, since the MOM node is not known until qsub has started the batch script).
  • It augments the batch script that starts VisIt with the following additions to the TFILE:
# Set MOM_HOST to the intermediate node on which the batch script runs
print TFILE "MOM_HOST=\`hostname\`\n";
# Run socket relay on intermediate node. This script writes the port number on which it listens to stdout. Save the port number to a file.
# Note that we are using the saved $loginnodehost and $loginnodeport to ensure that the relay connects to the correct login node.
print TFILE "$visitbindir/visit_socket_relay $loginnodehost $loginnodeport > $tfilename.port\n";
# Set the MOM_PORT environment variable to the port the socket relay is listening on
print TFILE "MOM_PORT=\`cat $tfilename.port\`\n";
# Cleanup
print TFILE "rm $tfilename.port\n";

# This is for Hopper according to documentation
# VisIt is not using xt-shmem, so unload it to avoid warnings (errors?)
print TFILE 'eval $(modulecmd sh unload xt-shmem)',"\n";
# Ensure that system libraries are visible on compute nodes
print TFILE "export CRAY_ROOTFS=DSL\n";

The rest is the regular VisIt launch. Since $remotehost and $remoteport are replaced to use the environment variables, the existing internallauncher logic will correctly cause VisIt to connect to the socket relay.
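
Put together, the print statements above emit a batch-script fragment along the following lines. This is only a sketch; the login node host name, port number, VisIt bin directory, and temporary file name are illustrative stand-ins for $loginnodehost, $loginnodeport, $visitbindir, and $tfilename.

MOM_HOST=`hostname`
/path/to/visit/bin/visit_socket_relay login-node.nersc.gov 5600 > /tmp/visit-tfile.port
MOM_PORT=`cat /tmp/visit-tfile.port`
rm /tmp/visit-tfile.port
eval $(modulecmd sh unload xt-shmem)
export CRAY_ROOTFS=DSL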