BlueGene Q

The decisions that led to the current build process should be mostly captured in BlueGene Q Porting Notes. The basic approach follows the one used for Building on BlueGeneP.

Overview

Building VisIt on BG/Q requires creating both compute node binaries and login node binaries. To that end, we first do a compute-node, engine-only build that produces the serial and parallel compute engines for the compute nodes. The compute node build uses the BlueGene xlC compilers and supports BlueGene-specific optimizations (-qhot, ...). A ppc64 login node version is also built. It uses the GNU compilers and builds engine_ser, vcl, and mdserver.

  • Server-only builds are performed because OpenGL fails at runtime for the viewer, and fixing OpenGL is not a priority
  • This means that client/server must be used

All programs in both build types are statically linked binaries with a minimal set of I/O libraries (Silo, HDF5, Mili), which keeps executable size to a minimum. Static binaries were chosen because they can load faster than dynamically linked ones. Previous work on BG/P with dynamic libraries incurred runtime penalties of 5 minutes or so, while static linking makes the VisIt compute engine load far faster.

Since 2 builds are required, we create an installation directory, do 2 separate builds, and install both into that common installation directory. The login node build is installed second so that the serial login node binaries overlay the parallel compute node build. The common installation directory is then tarred up to form the binary tarball for the BG/Q platform.

Instrumenting simulations with libsim is possible using static linking.

BG/Q

Here are some observations about BG/Q.

  • Cross-compilation is required for the compute node
  • Each node has 16 cores
  • Each core can have up to 4 hardware threads
  • Each node has 16 GB of memory, which is about 1 GB per process if you run 16 processes

build_visit_BGQ

Building 3rd party libraries is complex because of how 3rd party build systems respond to the cross-compilation required for BG/Q compute nodes. To address this, a build_visit_BGQ script was created to orchestrate the 2-platform build process and hide the intricacies of cross-compilation. Running build_visit_BGQ is meant to feel similar to running the normal build_visit script from a console.

Caveats:

  • build_visit must be in the same directory as build_visit_BGQ
  • At this time, the build_visit_BGQ script can create 3rd party libraries for the login and compute node platforms before continuing on to build VisIt itself
  • The build_visit modules that create config-sites, etc. contain a small amount of custom code that keys off of the BUILD_VISIT_BGQ environment variable when BG/Q-specific behavior is required.
  • At this time, the parallel settings written to the config-site file are hard-coded. Other users building on BG/Q might need to tweak these parallel settings.
  • Testing has been performed on rzuseq and vulcan. Sequoia should be the same or similar
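The exact checks inside the build_visit modules are not shown on this page. The following is a minimal sketch of the gating pattern, assuming that any non-empty value of BUILD_VISIT_BGQ means "this is a BG/Q build" (the real modules may test for a specific value). The function names `is_bgq_build` and `visit_library_type` are hypothetical. The static/shared choice simply mirrors the fully static BG/Q builds described above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the BG/Q-specific code path should run.
# Assumption: any non-empty value of BUILD_VISIT_BGQ means "BG/Q build".
is_bgq_build() {
    [ -n "${BUILD_VISIT_BGQ:-}" ]
}

# Hypothetical example of BG/Q-specific behavior: the BG/Q builds on this
# page are fully static, so request static libraries only in that case.
visit_library_type() {
    if is_bgq_build; then
        echo "static"
    else
        echo "shared"
    fi
}
```

Keeping the test in one small function means each module asks the same question the same way, so changing how BG/Q mode is detected later only touches one place.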

When I built the 3rd party libraries for BG/Q, I ran this command (or similar):

./build_visit_BGQ --thirdparty-path /usr/gapps/visit/thirdparty_static/2.7.0 --group visit --no-visit --svn

visit-build-open

The build_visit_BGQ script can build VisIt and its required 3rd party libraries for the login node and compute node architectures, and then create a tarball of the combined distribution. VisIt developers at LLNL typically use a different mechanism to build and install VisIt on LLNL systems: the visit-build-open script. visit-build-open has been enhanced to do a 2-pass build that produces a BG/Q binary tarball that can be installed on LLNL systems such as vulcan.

Usage:

src/svn_bin/visit-dist visitsources
src/svn_bin/visit-build-open -none +rzuseq -d visitsources
  • You'll need to do this from the RZ so the ssh commands in visit-build-open will work.

Development

Since 2 builds are required to get a functional set of binaries and both must be installed to a common installation directory, it's likely that you'll want to use visit-build-open to do your development builds. You could also follow this procedure:

mkdir visit2_7_0.linux-ppc64
instdir=`pwd`/visit2_7_0.linux-ppc64
mkdir compute_node
mkdir login_node

# Compute node build
cd compute_node
co_trunk src
cd trunk/src
cmake -DVISIT_CONFIG_SITE:PATH=config-site/rzuseqlac2-compute.cmake -DVISIT_ENGINE_ONLY:BOOL=ON -DCMAKE_INSTALL_PREFIX:PATH=$instdir -DCMAKE_BUILD_TYPE:STRING=Release -DVISIT_INSTALL_THIRD_PARTY:BOOL=ON .
sh svn_bin/filter_link_commands_BGQ.sh
make install
cd ../../..

# Login node build
cd login_node
co_trunk src
co_trunk data
cd trunk/src
cmake -DVISIT_CONFIG_SITE:PATH=config-site/rzuseqlac2-login.cmake -DVISIT_SERVER_COMPONENTS_ONLY:BOOL=ON -DCMAKE_INSTALL_PREFIX:PATH=$instdir -DCMAKE_BUILD_TYPE:STRING=Release -DVISIT_INSTALL_THIRD_PARTY:BOOL=ON .
make install
cd ../../..

# Copy the customlauncher into the installation
cp compute_node/trunk/src/resources/hosts/llnl/customlauncher visit2_7_0.linux-ppc64/2.7.0/bin

# Create a tarball of the co-install (c = create; x would try to extract)
tar zcvf visit2_7_0.linux-ppc64-BGQ.tar.gz visit2_7_0.linux-ppc64
  • If you edit sources in either of the compute_node or login_node directories, you'll have to copy the binary to the co-install directory or run make install again before you run.
  • Be sure to copy the LLNL customlauncher into your 2.7.0/bin directory or the engine won't be able to connect back from the compute nodes to the viewer.
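To make the re-install step less error-prone, the helper below re-runs `make install` in both trees and then restores the customlauncher. It is a minimal sketch, and the name `reinstall_coinstall` is hypothetical. It assumes the checkouts live under compute_node/trunk/src and login_node/trunk/src, as the customlauncher path in the procedure implies, and that it is sourced from the directory containing visit2_7_0.linux-ppc64 (override with TOP). The compute node tree is installed first so the serial login node binaries overlay it. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper for re-installing both BG/Q builds into the co-install.
# Assumes the directory layout used by the procedure above; override TOP if
# this is sourced from somewhere else.
TOP=${TOP:-$(pwd)}
INSTDIR="$TOP/visit2_7_0.linux-ppc64"
LAUNCHER="$TOP/compute_node/trunk/src/resources/hosts/llnl/customlauncher"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reinstall_coinstall() {
    # Compute node first, login node second, so login node binaries win.
    for tree in compute_node login_node; do
        run make -C "$TOP/$tree/trunk/src" install || return 1
    done
    # make install does not put the LLNL customlauncher in place.
    run cp "$LAUNCHER" "$INSTDIR/2.7.0/bin"
}
```

Usage: source the file (for example `. ./reinstall_coinstall.sh`, a hypothetical name), then run `DRY_RUN=1 reinstall_coinstall` to check the commands before running `reinstall_coinstall` for real.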

Status

  • build_visit_BGQ can be used to build a complete VisIt binary distribution for BG/Q.
  • A VisIt build can be created for rzuseqlac2 using committed config-site files. Manual BG/Q development requires 2 builds.
  • Host profiles for vulcan and rzuseq work.
  • VisIt 2.9.0 can connect client/server to rzuseq or vulcan and process datasets in the normal manner.

Mira

When I first built on Mira, the mpixlc compiler wrappers were not set up because my account was brand new and had no customization. I had to create a ~/.soft file to help set up the environment. When I logged in again, my environment was a little better.

# ~/.soft
+mpiwrapper-xl
@default
#end of ~/.soft

I was able to take the visit2.9.0.tar.gz source code and the relevant build_visit_BGQ scripts and build through to a combined installation in a linux-ppc64-BGQ directory. From there, I set up a host profile for mira.alcf.anl.gov that could connect to the login node and let me browse the file system. I have not tried running in parallel yet because VisIt does not appear to supply a qsub/runjob option for Mira's Cobalt job launch setup.

Issues:

  • Need qsub/runjob support in internallauncher
  • Need to finish setting up mira host profile
  • Add detection of missing mpixlc program in build_visit_BGQ and recommend setting of ~/.soft file.
  • The static libsim runtime libraries (libsimV2_static_ser.a and libsimV2_static_par.a) are not getting installed into the VISITDIR/ARCH/libsim/V2/libBGQ directory. This should be fixed so it is easier to link with Libsim on BG/Q.
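For the mpixlc item above, a pre-flight check could look like the following. This is a minimal sketch, and the function name `check_mpixlc` is hypothetical rather than part of build_visit_BGQ. It only tests whether `mpixlc` is on PATH and, if not, prints the ~/.soft lines that fixed the environment on Mira.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail early when mpixlc is missing and
# recommend the ~/.soft settings that worked on Mira.
check_mpixlc() {
    if command -v mpixlc >/dev/null 2>&1; then
        return 0
    fi
    # printf is a shell builtin, so this works even with a broken PATH.
    printf '%s\n' \
        "ERROR: mpixlc was not found in your PATH." \
        "On Mira, add these lines to ~/.soft, then log out and back in:" \
        "    +mpiwrapper-xl" \
        "    @default" >&2
    return 1
}
```

Calling this near the top of the script, before any 3rd party library is configured, avoids discovering the missing wrapper partway through a long build.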

Unwanted X11

When using the VisIt libraries on another project where I'm linking against VTK or Libsim, I noticed that cmake adds -lX11 and -lXext to my programs that use VTK. These are the same libraries I filter out of the VisIt build. I located one source of them in the installed VTK, in the VTKTargets.cmake file.

Filter out the X11 and Xext entries. Before filtering, you have:

set_target_properties(vtkRenderingOpenGL PROPERTIES
  INTERFACE_LINK_LIBRARIES "vtkRenderingCore;$<LINK_ONLY:vtkImagingHybrid>;$<LINK_ONLY:vtksys>;/home/whitlock/Development/thirdparty_static/2.9.0/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libGLU.a;/home/whitlock/Development/thirdparty_static/2.9.0/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libOSMesa.a;/usr/lib64/libX11.so;/usr/lib64/libXext.so;/home/whitlock/Development/thirdparty_static/2.9.0/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libOSMesa.a"
)

After removing /usr/lib64/libX11.so and /usr/lib64/libXext.so, you have:

set_target_properties(vtkRenderingOpenGL PROPERTIES
  INTERFACE_LINK_LIBRARIES "vtkRenderingCore;$<LINK_ONLY:vtkImagingHybrid>;$<LINK_ONLY:vtksys>;/home/whitlock/Development/thirdparty_static/2.9.0/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libGLU.a;/home/whitlock/Development/thirdparty_static/2.9.0/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libOSMesa.a;/home/whitlock/Development/thirdparty_static/2.9.0/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libOSMesa.a"
)

This step can probably be added to build_visit_BGQ during the installation of VTK for the compute nodes. That would eliminate some of the filtering we currently have to do in the filter_link_commands_BGQ.sh script.
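The edit shown above can be scripted with sed. This is a minimal sketch, not existing build_visit_BGQ code, and the function name `strip_x11_from_vtk_targets` is hypothetical. It assumes the X11 entries use the /usr/lib64 paths shown above and are never the first item in the semicolon-separated list, so each one can be removed together with the ";" that precedes it. A .bak copy of the original file is kept.

```shell
#!/bin/sh
# Hypothetical helper: remove the X11 and Xext entries from an installed
# VTK's VTKTargets.cmake so downstream projects stop linking -lX11/-lXext.
# Assumes the /usr/lib64 paths shown above and that neither entry is the
# first item in its semicolon-separated list.
strip_x11_from_vtk_targets() {
    targets_file=$1
    # -i.bak edits in place and keeps a backup (works with GNU and BSD sed).
    sed -i.bak \
        -e 's|;/usr/lib64/libX11\.so||g' \
        -e 's|;/usr/lib64/libXext\.so||g' \
        "$targets_file"
}
```

The installed location of VTKTargets.cmake depends on the VTK version and install prefix, so locating it with `find <vtk install dir> -name VTKTargets.cmake` before calling the function is probably safer than hard-coding a path.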

Linker Flags

There is still a problem with some of the linker flags generated by cmake. On mira, some programs linking against VTK get a -rdynamic linker flag on the bgxlC_r command line, which is not even a valid flag for the bgxlC_r compiler. The filter_link_commands_BGQ.sh script developed on vulcan at LLNL already filters out similar linker flags: -Wl,-Bstatic and -Wl,-Bdynamic.

We still need to figure out how those flags get added to the cmake build so we can prevent them and have a better-behaved build that does not require any massaging.
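Until the source of those flags is found, the filtering can be done on the generated link commands. This is a hypothetical sketch and not the contents of filter_link_commands_BGQ.sh. It assumes CMake's Makefile generator, which writes each target's link command to a CMakeFiles/<target>.dir/link.txt file. It drops -rdynamic, -Wl,-Bstatic, -Wl,-Bdynamic, and the X11/Xext libraries token by token. As a side effect, runs of whitespace collapse to single spaces, so it would mishandle quoted arguments that contain spaces.

```shell
#!/bin/sh
# Hypothetical sketch: strip link flags that bgxlC_r rejects, or that pull
# in X11, from every CMake-generated link.txt under a build tree.
filter_bgq_link_flags() {
    build_dir=${1:-.}
    find "$build_dir" -type f -name link.txt -path '*/CMakeFiles/*' |
    while IFS= read -r f; do
        awk '{
            out = ""
            for (i = 1; i <= NF; i++) {
                t = $i
                # Tokens to drop from the link command line.
                if (t == "-rdynamic" || t == "-Wl,-Bstatic" || t == "-Wl,-Bdynamic" ||
                    t == "-lX11" || t == "-lXext" ||
                    t ~ /\/libX11\.so$/ || t ~ /\/libXext\.so$/)
                    continue
                if (out != "") out = out " "
                out = out t
            }
            print out
        }' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

This would run after cmake and before make, in the same place the procedure above runs filter_link_commands_BGQ.sh. Re-running cmake regenerates link.txt, so the filter has to be repeated afterwards.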

Future Work

This is a list of possible future enhancements that would improve ease of VisIt development and running VisIt on BG/Q:

  • Test the build on other machines (mira/Argonne)
  • For compute engine builds, cmake always wants to link the engine with -lX11 and -lXext even though we've turned off X11 support and X11 does not exist. This is a cmake problem. We address the issue by running a script after cmake, called filter_link_commands_BGQ.sh, that removes the X11 dependencies from the link line. Figure out how to do this better or transparently.
  • Enable VisIt's threaded build mode when it gets into the trunk. We'll need to run fewer processes per node and then divide each process's domain into smaller domains that additional threads can process. That way we can hopefully use all 64 hardware threads on the node (16 cores × 4 threads) without running out of memory.
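The threaded-mode trade-off can be made concrete with the node figures from the BG/Q section: 16 cores with 4 hardware threads each, and 16 GB of memory. The sketch below tabulates threads per rank and memory per rank for several ranks-per-node choices. The function name `bgq_layouts` is hypothetical. The memory column divides the full 16 GB (16384 MB) evenly and ignores whatever the compute node kernel reserves, so real per-rank memory will be somewhat lower.

```shell
#!/bin/sh
# Hypothetical helper: for one BG/Q node, show how many hardware threads and
# how much memory each MPI rank gets for a given number of ranks per node.
CORES_PER_NODE=16
HW_THREADS_PER_CORE=4
NODE_MEMORY_MB=16384   # 16 GB; ignores memory reserved by the node kernel

bgq_layouts() {
    hw_threads=$((CORES_PER_NODE * HW_THREADS_PER_CORE))
    printf '%-14s %-16s %s\n' "ranks/node" "threads/rank" "MB/rank"
    for ranks in 1 2 4 8 16 32 64; do
        printf '%-14s %-16s %s\n' "$ranks" \
            "$((hw_threads / ranks))" "$((NODE_MEMORY_MB / ranks))"
    done
}
```

The 16-ranks-per-node row matches the "about 1 GB per process" figure above and leaves 4 threads per rank. Dropping to 4 ranks per node gives each rank 16 threads and 4096 MB.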