Libsim on Stampede

In preparation for the IXPUG workshop at TACC, I decided to try building and running VisIt on TACC's Stampede cluster.

VisIt/Libsim Build (Stampede-1)

I built VisIt 2.12.2 server components with the Intel compiler using the command line:

./build_visit2_12_2 --cc icc --cxx icpc --thirdparty-path $HOME/Development/thirdparty_shared/2.12.0 \
--parallel --server-components-only --mesa --silo --hdf5 --szip --makeflags -j2 --system-python

From there, I made a VisIt package and installed it to $HOME/visit_intel_shared.

Testing Libsim

I tested the VisIt/Libsim installation using the batch example simulation from the VisIt sources. This example program computes a radial field and makes various slices and contours through it, saving extracts and images.
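From memory, the example's per-cycle flow looks roughly like the outline below; the real code is batch.c in the VisIt sources, so treat the function names and arguments here as approximate pseudocode rather than the actual implementation:

```
VisItSetupEnvironment();           /* locate the VisIt installation (-dir) */
VisItInitializeRuntime();          /* load the Libsim runtime; no viewer needed */
for each cycle:
    advance the radial field
    VisItTimeStepChanged();
    set up slice/contour plots and draw them
    save an image                  /* -render */
    export extracts                /* -export, e.g. -format VTK */
```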

Files:

Makefile:

If someone wants to use my Libsim installation, replace $(HOME) with /home1/04893/whitlocb.

# The compilers you'll use for building.
CC=mpicc
CXX=mpicxx
CFLAGS=-O3

# Change LIBSIM_INCDIR, LIBSIM_LIBDIR to point to the include 
# and library directories for Libsim.
LIBSIM_INCDIR=$(HOME)/visit_intel_shared/2.12.2/linux-x86_64/libsim/V2/include
LIBSIM_LIBDIR=$(HOME)/visit_intel_shared/2.12.2/linux-x86_64/libsim/V2/lib
LIBSIM_LDFLAGS=-L$(LIBSIM_LIBDIR)
LIBSIM_LIBS=-lsimV2

CPPFLAGS=-I. -I$(LIBSIM_INCDIR) -DPARALLEL
LDFLAGS=$(LIBSIM_LDFLAGS) 
LIBS=$(LIBSIM_LIBS) -ldl -lpthread -lz -lm

SRC=batch.c extract.c
OBJ=$(SRC:.c=.o)

all: batch_par

batch_par: $(OBJ)
	$(CXX) -o batch_par $(OBJ) $(CFLAGS) $(LDFLAGS) $(LIBS)

clean:
	$(RM) $(OBJ) batch_par

.c.o:
	$(CC) $(CFLAGS) $(CPPFLAGS) -c $<

Batch Script:

#!/bin/bash
#----------------------------------------------------
# Example SLURM job script to run MPI applications on 
# TACC's Stampede system.
#
# $Id: job.mpi 1580 2013-01-08 04:10:50Z karl $
#----------------------------------------------------

#SBATCH -J batch_par          # Job name
#SBATCH -o batch_par.%j.out   # Name of stdout output file (%j expands to jobId)
#SBATCH -p development        # Queue name
#SBATCH -N 2                  # Total number of nodes requested (16 cores/node)
#SBATCH -n 32                 # Total number of mpi tasks requested
#SBATCH -t 00:30:00           # Run time (hh:mm:ss) 
#SBATCH -A InSitu_Workshop

# Set up VisIt environment
export VISITARCHHOME=$HOME/visit_intel_shared/2.12.2/linux-x86_64
export LD_LIBRARY_PATH=$VISITARCHHOME/lib:$VISITARCHHOME/lib/osmesa:$LD_LIBRARY_PATH
export VISITHOME=$HOME/visit_intel_shared/2.12.2
export VISITPLUGINDIR=$HOME/.visit/2.12.2/linux-x86_64/plugins:$HOME/visit_intel_shared/2.12.2/linux-x86_64/plugins
export VISIT_MESA_LIB=$HOME/visit_intel_shared/2.12.2/linux-x86_64/lib/osmesa/libGL.so.1

# Start the example simulation
ibrun ./batch_par -dir $HOME/visit_intel_shared -domains 2,4,4 -maxcycles 10 -export 1 -render 1 -format VTK 

Issues:

  • I had to set environment variables for VisIt in the batch submission script so that dlopen() on Stampede would locate the VisIt/Libsim runtime libraries. Without them, the example simulation would either fail to find Libsim, fail to find some of its dependencies, or crash during rendering-related activities at startup.
  • I had to turn off the streamline exports computed by the batch program because they were causing crashes.

VisIt/Libsim Build (Stampede-KNL)

This partition of the Stampede machine contains many nodes with Intel Knights Landing (KNL) processors. These differ from typical Intel Xeon processors in that they provide far more cores and wider vector instructions such as AVX-512.

To log into these nodes, you must specify the KNL login node:

ssh login-knl1.stampede.tacc.utexas.edu

I rebuilt using the Intel compiler with some extra flags to enable code generation for AVX-512. Since I was still building on a node whose CPU architecture differs from KNL, I figured the easiest approach was to compile for both the Haswell and KNL cores. This avoids cross-compiling for KNL alone, which is harder to do in CMake and for VisIt's dependencies.

./build_visit2_12_2 --cc icc --cxx icpc --cflags "-xCORE-AVX2 -axMIC-AVX512" --cxxflags "-xCORE-AVX2 -axMIC-AVX512" \
--thirdparty-path $HOME/Development/thirdparty_shared_knl/2.12.0 --parallel --server-components-only \
--mesa --silo --hdf5 --szip --makeflags -j2 --system-python

I installed this VisIt build to $HOME/visit_intel_knl_shared and built the batch_par example program against it. I then ran it on the KNL nodes using the following SLURM script, which raises the number of MPI tasks to 68 for KNL (the example program's -domains argument changes accordingly):

Batch script for KNL nodes:

#!/bin/bash
#----------------------------------------------------
# Example SLURM job script to run MPI applications on 
# TACC's Stampede system.
#
# $Id: job.mpi 1580 2013-01-08 04:10:50Z karl $
#----------------------------------------------------

#SBATCH -J batch_par          # Job name
#SBATCH -o batch_par.%j.out       # Name of stdout output file (%j expands to jobId)
#SBATCH -p development        # Queue name
#SBATCH -N 1                  # Total number of nodes requested (68 cores/node)
#SBATCH -n 68                 # Total number of mpi tasks requested
#SBATCH -t 00:30:00           # Run time (hh:mm:ss) 
#SBATCH -A InSitu_Workshop

export VISITARCHHOME=$HOME/visit_intel_knl_shared/2.12.2/linux-x86_64
export LD_LIBRARY_PATH=$VISITARCHHOME/lib:$VISITARCHHOME/lib/osmesa:$LD_LIBRARY_PATH
export VISITHOME=$HOME/visit_intel_knl_shared/2.12.2
export VISITPLUGINDIR=$HOME/.visit/2.12.2/linux-x86_64/plugins:$HOME/visit_intel_knl_shared/2.12.2/linux-x86_64/plugins
export VISIT_MESA_LIB=$HOME/visit_intel_knl_shared/2.12.2/linux-x86_64/lib/osmesa/libGL.so.1

ibrun ./batch_par -dir $HOME/visit_intel_knl_shared -domains 17,2,2 -maxcycles 10 -export 1 -render 1 -format VTK 

NOTE: The job made it through about 2.5 iterations instead of the requested 10 before it was cancelled due to exceeding its time limit.

VisIt/Libsim Build (Stampede-2)

I repeated the steps from Stampede-1's KNL partition on the new Stampede-2 machine. I installed the resulting VisIt build to ~whitlocb/visit_intel_knl_shared.

Here is the Makefile I used for my simple simulation example. The HOME variable corresponds to ~whitlocb.

# The compilers you'll use for building.
CC=mpicc
CXX=mpicxx
CFLAGS=-O3 -xMIC-AVX512

# Change LIBSIM_INCDIR, LIBSIM_LIBDIR to point to the include 
# and library directories for Libsim.
LIBSIM_INCDIR=$(HOME)/visit_intel_knl_shared/2.12.2/linux-x86_64/libsim/V2/include
LIBSIM_LIBDIR=$(HOME)/visit_intel_knl_shared/2.12.2/linux-x86_64/libsim/V2/lib
LIBSIM_LDFLAGS=-L$(LIBSIM_LIBDIR)
LIBSIM_LIBS=-lsimV2

CPPFLAGS=-I. -I$(LIBSIM_INCDIR) -DPARALLEL
LDFLAGS=$(LIBSIM_LDFLAGS) 
LIBS=$(LIBSIM_LIBS) -ldl -lpthread -lz -lm

SRC=batch.c extract.c
OBJ=$(SRC:.c=.o)

all: batch_par

batch_par: $(OBJ)
	$(CXX) -o batch_par $(OBJ) $(CFLAGS) $(LDFLAGS) $(LIBS)

clean:
	$(RM) $(OBJ) batch_par

.c.o:
	$(CC) $(CFLAGS) $(CPPFLAGS) -c $<

I tested with the same batch script. This time, the job completed in seconds. I got VTK files and PNG files. Woo hoo!