BlueGene Q Porting Notes

From VisItusers.org

I'm building VisIt for sequoia on the rzuseq system, which is a smaller BG/Q development system. As with BG/P dawn, rzuseq consists of front end nodes and compute nodes, which run CNK. The hardware on the login nodes and the compute nodes is different so cross-compilation is still required to make binaries that run on the compute nodes.

When VisIt was ported to dawn, there was an issue with connecting the engine and the viewer. The problem was that the compute nodes had multiple ethernet interfaces and it was a matter of using the IP address for the proper interface to get the engine connecting to the viewer. See my dawn porting notes.

I had considered only building static engine-only builds for sequoia but then I remembered that I'll still need an mdserver and vcl in order to run client/server properly. This means that I need a full build of VisIt for the login nodes as well as a compute node version. For the compute node version, I still do plan to do a static engine-only build. I will try using xlC, though I was not successful using xlC on dawn for a shared build. Hopefully a static build will make things easier. I also expect that a static build would drastically reduce load times which, on dawn, were on the order of 5 minutes using shared libraries.


Compilers

There seem to be a lot of options for using the xlC compiler. The recommended xlC is the one behind the MPI compiler wrapper, which prints the underlying command when run with -show:

rzuseqlac2{whitlocb}75: /bgsys/drivers/ppcfloor/comm/xl/bin/mpixlcxx_r -show
/opt/ibmcmp/vacpp/bg/12.1/bin/bgxlC_r -I/bgsys/drivers/V1R1M2/ppc64/comm/sys/include -I/bgsys/drivers/V1R1M2/ppc64 -I/bgsys/drivers/V1R1M2/ppc64/spi/include -I/bgsys/drivers/V1R1M2/ppc64/spi/include/kernel/cnk -I/bgsys/drivers/V1R1M2/ppc64/comm/xl/include -L/bgsys/drivers/V1R1M2/ppc64/comm/xl/lib -lcxxmpich -lmpich -lopa -lmpl -L/bgsys/drivers/V1R1M2/ppc64/comm/sys/lib -lpami -L/bgsys/drivers/V1R1M2/ppc64/spi/lib -lSPI -lSPI_cnk -lpthread -lrt -lstdc++

It looks like I can give this as the compiler name for the compute node:

/opt/ibmcmp/vacpp/bg/12.1/bin/bgxlC_r

There are a lot of options in this directory too:

/usr/local/tools/compilers/ibm/


Oct 15, 2012

cmake

I need to have a working cmake for the login nodes.

build_visit --console --static --no-visit --no-thirdparty --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_static --cmake --no-python --cc xlc --cxx xlC


Building login node 3rd party

I'm completely blanking on what I need to do for compute node builds, so let's first get started by building a login node version that will give me a client. For this step, I'm going to just build using g++ so I don't run into issues. This g++ client will communicate via sockets with the xlC back end.

./build_visit --console --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_shared --no-visit --mesa --hdf5 --szip --silo --netcdf --cgns

Oct 16, 2012

After invoking the build_visit call, VTK fails to build due to this error:

[ 57%] Building CXX object IO/CMakeFiles/vtkIO.dir/vtkIOInstantiator.cxx.o
Linking CXX shared library ../bin/libvtkIO.so
/usr/bin/ld: CMakeFiles/vtkIO.dir/vtkXMLParser.cxx.o(.text+0x1cf4): sibling call optimization to `vtkXMLParser::SetEncoding(char const*)' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `vtkXMLParser::SetEncoding(char const*)' extern
/usr/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status
make[2]: *** [bin/libvtkIO.so.5.8.0] Error 1
make[1]: *** [IO/CMakeFiles/vtkIO.dir/all] Error 2
make: *** [all] Error 2

I reran build_visit, adding the recommended argument.

./build_visit --console --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_shared --no-visit --mesa --hdf5 --szip --silo --netcdf --cgns --cxxflag -fno-optimize-sibling-calls

This did not work because the -fno-optimize-sibling-calls argument was never passed to g++. This looks like a build_visit bug with --cxxflag.
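One way to confirm that a flag never reached the compiler is to scan a verbose build log for compile lines that lack it. The following is only an illustrative Python sketch, not part of build_visit. It assumes the log contains full command lines (for CMake-generated Makefiles that means building with make VERBOSE=1), and the function name is made up for these notes.

```python
import os

def compile_lines_missing_flag(log_text, flag, compilers=("g++", "c++")):
    """Return the compile lines (those containing -c) that do not carry `flag`.

    Only lines whose first word is one of `compilers` are considered, so
    progress messages and link lines are ignored.
    """
    missing = []
    for line in log_text.splitlines():
        words = line.split()
        if not words:
            continue
        is_compiler = os.path.basename(words[0]) in compilers
        if is_compiler and "-c" in words and flag not in words:
            missing.append(line)
    return missing
```

Calling it on the text of build_visit_log with flag set to -fno-optimize-sibling-calls would list every g++ compile command that was run without the flag. An empty list means the flag was passed everywhere the check could see.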

I ended up running the cmake command line from build_visit_log directly in the visit-vtk-5.8-build directory so I could pass the -fno-optimize-sibling-calls argument. This did end up letting VTK build, after which I ran "make install".

/g/g19/whitlocb/Development/seq/thirdparty_shared/cmake/2.8.8/linux-ppc64_gcc-4.4/bin/cmake -DCMAKE_BUILD_TYPE:STRING=Release -DVTK_DEBUG_LEAKS:BOOL=OFF -DBUILD_SHARED_LIBS:BOOL=ON -DCMAKE_INSTALL_PREFIX:PATH=/g/g19/whitlocb/Development/seq/thirdparty_shared/vtk/5.8.0/linux-ppc64_gcc-4.4 -DVTK_INSTALL_INCLUDE_DIR:PATH=/include/ -DVTK_INSTALL_LIB_DIR:PATH=/lib/ -DBUILD_TESTING:BOOL=OFF -DBUILD_DOCUMENTATION:BOOL=OFF -DVTK_USE_NETCDF:BOOL=OFF -DVTK_USE_EXODUS:BOOL=OFF -DVTK_USE_TK:BOOL=OFF -DVTK_USE_64BIT_IDS:BOOL=ON -DVTK_USE_INFOVIS:BOOL=ON -DVTK_USE_CHARTS:BOOL=OFF -DVTK_USE_METAIO:BOOL=OFF -DVTK_USE_PARALLEL:BOOL=OFF -DVTK_LEGACY_REMOVE:BOOL=ON -DVTK_USE_SYSTEM_JPEG:BOOL=OFF -DVTK_USE_SYSTEM_PNG:BOOL=OFF -DVTK_USE_SYSTEM_TIFF:BOOL=OFF -DVTK_USE_SYSTEM_ZLIB:BOOL=OFF -DVTK_USE_SYSTEM_HDF5:BOOL=ON -DHDF5_C_INCLUDE_DIR:PATH=/include -DHDF5_INCLUDE_DIR:PATH=/include -DHDF5_DIFF_EXECUTABLE:FILEPATH=/bin/hdf5diff -DHDF5_LIBRARY=/lib/libhdf5 -DHDF5_hdf5_LIBRARY:FILEPATH=/lib/libhdf5 -DHDF5_hdf5_LIBRARY_RELEASE:FILEPATH=/lib/libhdf5 -DHDF5_hdf5_hl_LIBRARY:FILEPATH=/lib/libhdf5_hl -DHDF5_hdf5_hl_LIBRARY_RELEASE:FILEPATH=/lib/libhdf5_hl -DVTK_USE_CARBON:BOOL=OFF -DVTK_USE_ANSI_STD_LIB:BOOL=ON -DCMAKE_C_COMPILER:STRING=gcc -DCMAKE_CXX_COMPILER:STRING=g++ -DCMAKE_C_FLAGS:STRING= -fPIC -DCMAKE_CXX_FLAGS:STRING="-fno-optimize-sibling-calls -fPIC" -DCMAKE_EXE_LINKER_FLAGS:STRING= -DCMAKE_MODULE_LINKER_FLAGS:STRING= -DCMAKE_SHARED_LINKER_FLAGS:STRING= -DVTK_USE_MANGLED_MESA:BOOL=OFF -DVTK_OPENGL_HAS_OSMESA:BOOL=ON -DOSMESA_INCLUDE_DIR:PATH=/g/g19/whitlocb/Development/seq/thirdparty_shared/mesa/7.8.2/linux-ppc64_gcc-4.4/include -DOSMESA_LIBRARY:FILEPATH=/g/g19/whitlocb/Development/seq/thirdparty_shared/mesa/7.8.2/linux-ppc64_gcc-4.4/lib/libOSMesa.so -DVTK_WRAP_PYTHON:BOOL=ON -DPYTHON_EXECUTABLE:FILEPATH=/g/g19/whitlocb/Development/seq/thirdparty_shared/python/2.6.4/linux-ppc64_gcc-4.4/bin/python 
-DPYTHON_INCLUDE_DIR:PATH=/g/g19/whitlocb/Development/seq/thirdparty_shared/python/2.6.4/linux-ppc64_gcc-4.4/include/python2.6 -DPYTHON_LIBRARY:FILEPATH=/g/g19/whitlocb/Development/seq/thirdparty_shared/python/2.6.4/linux-ppc64_gcc-4.4/lib/libpython2.6.so -DPYTHON_EXTRA_LIBS:STRING=-lpthread -ldl -lutil -lm ../visit-vtk-5.8

Feb 13, 2013

Restarting this project after a long break.

./build_visit --console --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_shared --no-visit --mesa --hdf5 --szip --silo --netcdf --cgns

I again run into this problem when building Qt:

/usr/bin/ld: .obj/release-shared/qximinputcontext_x11.o(.text+0x3470): sibling call optimization to `QList<QInputMethodEvent::Attribute>::free(QListData::Data*)' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `QList<QInputMethodEvent::Attribute>::free(QListData::Data*)' extern
/usr/bin/ld: .obj/release-shared/qximinputcontext_x11.o(.text+0x3488): sibling call optimization to `QString::~QString()' does not allow automatic multiple TOCs; recompile with -mminimal-toc or -fno-optimize-sibling-calls, or make `QString::~QString()' extern
/usr/bin/ld: final link failed: Bad value
collect2: ld returned 1 exit status
make[1]: *** [../../lib/libQtGui.so.4.8.3] Error 1

Trying to see if this works:

./build_visit --console --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_shared --no-visit --mesa --hdf5 --szip --silo --netcdf --cgns --cxxflag -fno-optimize-sibling-calls

It did not work!!!

I changed bv_qt.sh to put "env CXXFLAGS=-mminimal-toc" before Qt's configure and that let Qt build.

I ran into the same problem halfway through the VTK build so I added -mminimal-toc to the CXXFLAGS in bv_main.sh for linux-ppc64.

I ran into the same problem in PySide so I fixed it.

CCMIO failed due to a patch. I'm not fixing it.

I eventually built through most of the I/O libraries with the following command line:

./build_visit --console --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_shared --no-visit --mesa --hdf5 --szip --silo --netcdf --cgns --mili --gdal --xdmf --boxlib --cfitsio --exodus --hdf4 --advio --makeflags -j4

I can probably reduce the libraries that I ultimately build for the compute engine version.

I was able to build through VisIt, but I get a runtime error:

(gdb) where
#0  0x0000008053e35568 in .XQueryExtension () from /usr/lib64/libX11.so.6
#1  0x0000008053e25e30 in .XInitExtension () from /usr/lib64/libX11.so.6
#2  0x000000805737c0f4 in ?? () from /usr/lib64/libGL.so.1
#3  0x0000008057377790 in .glXQueryVersion () from /usr/lib64/libGL.so.1
#4  0x00000400014873ec in .glxewContextInit ()
   from /g/g19/whitlocb/Development/seq/do_visit/lib/libvisitGLEW.so
#5  0x000004000149d014 in .glewInitLibrary ()
   from /g/g19/whitlocb/Development/seq/do_visit/lib/libvisitGLEW.so
#6  0x00000400008a7258 in ._ZN3avt4glew10initializeEb ()
   from /g/g19/whitlocb/Development/seq/do_visit/lib/libavtplotter_ser.so
#7  0x00000400006d0a58 in ._ZN15VisWinRendering7RealizeEv ()
   from /g/g19/whitlocb/Development/seq/do_visit/lib/libavtviswindow_ser.so
#8  0x00000400007205b4 in ._ZN9VisWindow7RealizeEv ()
   from /g/g19/whitlocb/Development/seq/do_visit/lib/libavtviswindow_ser.so
#9  0x00000400001ef4f4 in ._ZN12ViewerWindow7RealizeEv ()

I tried Qt's hellogl and 2dpainting example programs and they were both unable to create GL contexts.

I ran a VTK Python example and it also is unable to create an OpenGL window.

rzuseqlac2{whitlocb}260: env LD_LIBRARY_PATH=/g/g19/whitlocb/Development/seq/thirdparty_shared/vtk/5.8.0.a/linux-ppc64_gcc-4.4/lib:/g/g19/whitlocb/Development/seq/thirdparty_shared/mesa/7.8.2/linux-ppc64_gcc-4.4/lib  /g/g19/whitlocb/Development/seq/thirdparty_shared/python/2.6.4/linux-ppc64_gcc-4.4/bin/python
Python 2.6.4 (r264:75706, Feb 13 2013, 11:28:55) 
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append("/g/g19/whitlocb/Development/seq/thirdparty_shared/vtk/5.8.0.a/linux-ppc64_gcc-4.4/lib/python2.6/site-packages")
>>> import vtk
>>> import Cylinder
ERROR: In /g/g19/whitlocb/Development/seq/builds/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx, line 405
vtkXOpenGLRenderWindow (0x1051f1e0): Could not find a decent visual


ERROR: In /g/g19/whitlocb/Development/seq/builds/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx, line 405
vtkXOpenGLRenderWindow (0x1051f1e0): Could not find a decent visual


ERROR: In /g/g19/whitlocb/Development/seq/builds/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx, line 405
vtkXOpenGLRenderWindow (0x1051f1e0): Could not find a decent visual


X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  149 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x22
  Serial number of failed request:  36
  Current serial number in output stream:  36

Feb 14, 2013

Talked with Becky & Rich. They said that since the arch is ppc64, they can't do working nvidia drivers. They'd be willing to try mesa but that's probably what is already there. If compilation was faster, I'd try myself.

I'm going to forge ahead with a server-components-only build and a compute engine build. I guess I can make them both be static.

Mar 25, 2013

Okay, another long break. I'm going to run build_visit to make a static server-components-only build.

./build_visit --console --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_static --no-visit --mesa --hdf5 --szip --silo --netcdf --mili --xdmf --static --server-components-only

I found that the generated config-site file had references to Qt and PySide even though I did not build those packages. I removed those libraries from the config-site.

mkdir do_visit_static 
cd  do_visit_static
../thirdparty_static/cmake/2.8.8/linux-ppc64_gcc-4.4/bin/cmake -DCMAKE_BUILD_TYPE:STRING=Debug ../seq26/src

CMake fails.

-- Looking for MESA
CMake Error at CMake/SetUpThirdParty.cmake:190 (MESSAGE):
  Library OSMesa not found in
  /g/g19/whitlocb/Development/seq/thirdparty_static/mesa/7.8.2/linux-ppc64_gcc-4.4/lib
Call Stack (most recent call first):
  CMake/FindVisItMesa.cmake:60 (SET_UP_THIRD_PARTY)
  CMakeLists.txt:906 (INCLUDE)

-- Configuring incomplete, errors occurred!
rzuseqlac2{whitlocb}172: pwd
/g/g19/whitlocb/Development/seq/do_visit_static
rzuseqlac2{whitlocb}173: ls /g/g19/whitlocb/Development/seq/thirdparty_static/mesa/7.8.2/linux-ppc64_gcc-4.4/lib
libGL.a  libGLU.a  pkgconfig

Hm. Apparently I had edited bv_mesa.sh a lot so that we can have different Mesa build modes. I added --mesa-mangled and --mesa-osmesa. I'll try and remember what they mean.

  • --mesa Build Mesa with glX support. The resulting library is libGL.a and can be used as OpenGL.
  • --mesa-mangled Turn on mangling for Mesa
  • --mesa-osmesa Turn on OSMesa instead of glX for Mesa


BAH! Since I'm building --server-components-only right now, I'll rebuild with --mesa-mangled. I would build with no mesa at all but I'm planning on having the serial engine run on the login node so offscreen rendering would be nice. Later when I build the compute node version, I'll probably build with --mesa or --mesa-osmesa and use that for VTK's OpenGL.

 ./build_visit --console --thirdparty-path /g/g19/whitlocb/Development/seq/thirdparty_static --no-visit --mesa-mangled --hdf5 --szip --silo --netcdf --mili --xdmf --static --server-components-only

That failed too!

Building VTK . . . (~20 minutes)
VTK did not build correctly.  Giving up.
Error in build process.  See build_visit_log for more information. If the error is unclear, please include build_visit_log in a  message to the visit-users@ornl.gov list.  You will probably  need to compress the build_visit_log using a program like gzip  so it will fit within the size limits for email attachments.
rzuseqlac2{whitlocb}189: tail build_visit_log 
/g/g19/whitlocb/Development/seq/builds_static/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx:1332: error: 'class vtkXOpenGLRenderWindowInternal' has no member named 'OffScreenContextId'
/g/g19/whitlocb/Development/seq/builds_static/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx:1334: error: 'class vtkXOpenGLRenderWindowInternal' has no member named 'OffScreenContextId'
/g/g19/whitlocb/Development/seq/builds_static/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx:1336: error: 'OSMesaMakeCurrent' was not declared in this scope
/g/g19/whitlocb/Development/seq/builds_static/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx: In member function 'virtual bool vtkXOpenGLRenderWindow::IsCurrent()':
/g/g19/whitlocb/Development/seq/builds_static/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx:1387: error: 'class vtkXOpenGLRenderWindowInternal' has no member named 'OffScreenContextId'
/g/g19/whitlocb/Development/seq/builds_static/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx:1389: error: 'class vtkXOpenGLRenderWindowInternal' has no member named 'OffScreenContextId'
/g/g19/whitlocb/Development/seq/builds_static/visit-vtk-5.8.0.a/Rendering/vtkXOpenGLRenderWindow.cxx:1389: error: 'OSMesaGetCurrentContext' was not declared in this scope
make[2]: *** [Rendering/CMakeFiles/vtkRendering.dir/vtkXOpenGLRenderWindow.cxx.o] Error 1
make[1]: *** [Rendering/CMakeFiles/vtkRendering.dir/all] Error 2

Mar 26, 2013

I'm going to just try building Mesa and VTK by hand for now. I will use Mesa 8.0.5 since that's the last version to have easy "make target" support. The newest versions have configure but also require Python+libxml2, which may be another wrinkle.

Mesa

I untarred MesaLib-8.0.5.tar.gz and edited configs/bluegene-xlc-osmesa so its compilers were set to:

CC=bgxlc
CXX=bgxlC

Then I ran

make bluegene-xlc-osmesa

The build went on for a while and failed with an error:

bgxlc -c -o main/querymatrix.o main/querymatrix.c   -I../../include -I../../src/glsl -I../../src/mesa -I../../src/mapi -I../../src/gallium/include -I../../src/gallium/auxiliary  -O3 -pedantic -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE -D_BSD_SOURCE
"main/querymatrix.c", line 83.1: 1506-275 (S) Unexpected text 'sizeof' encountered.
"main/querymatrix.c", line 83.12: 1506-275 (S) Unexpected text 'double' encountered.
"main/querymatrix.c", line 83.1: 1506-276 (S) Syntax error: possible missing ')'?
"main/querymatrix.c", line 82.7: 1506-334 (S) Identifier FP_NAN has already been defined on line 200 of "/bgsys/drivers/toolchain/V1R2M0/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.6/../../../../powerpc64-bgq-linux/sys-include/math.h".
"main/querymatrix.c", line 82.15: 1506-334 (S) Identifier FP_INFINITE has already been defined on line 202 of "/bgsys/drivers/toolchain/V1R2M0/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.6/../../../../powerpc64-bgq-linux/sys-include/math.h".
"main/querymatrix.c", line 82.28: 1506-334 (S) Identifier FP_ZERO has already been defined on line 204 of "/bgsys/drivers/toolchain/V1R2M0/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.6/../../../../powerpc64-bgq-linux/sys-include/math.h".
"main/querymatrix.c", line 82.37: 1506-334 (S) Identifier FP_SUBNORMAL has already been defined on line 206 of "/bgsys/drivers/toolchain/V1R2M0/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.6/../../../../powerpc64-bgq-linux/sys-include/math.h".
"main/querymatrix.c", line 82.51: 1506-334 (S) Identifier FP_NORMAL has already been defined on line 208 of "/bgsys/drivers/toolchain/V1R2M0/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.6/../../../../powerpc64-bgq-linux/sys-include/math.h".
"main/querymatrix.c", line 83.19: 1506-166 (S) Definition of function x requires parentheses.
make[3]: *** [main/querymatrix.o] Error 1
make[3]: Leaving directory `/g/g19/whitlocb/Development/seq/builds_static/Mesa-8.0.5/src/mesa'
make[2]: *** [subdirs] Error 1
make[2]: Leaving directory `/g/g19/whitlocb/Development/seq/builds_static/Mesa-8.0.5/src'
make[1]: *** [default] Error 1
make[1]: Leaving directory `/g/g19/whitlocb/Development/seq/builds_static/Mesa-8.0.5'
make: *** [bluegene-xlc-osmesa] Error 2

I edited src/mesa/main/querymatrix.c and "if 0'd" out the definition of the fpclassify function at line 82.

#if 0
enum {FP_NAN, FP_INFINITE, FP_ZERO, FP_SUBNORMAL, FP_NORMAL}
fpclassify(double x)
{
   /* XXX do something better someday */
   return bgFP_NORMAL;
}
#endif

Apparently the bluegene build does not work.

make[2]: *** No rule to make target `../../src/glsl/libglsl.a', needed by `libmesa.a'.  Stop.
make[2]: Leaving directory `/g/g19/whitlocb/Development/seq/builds_static/Mesa-8.0.5/src/mesa'
make[1]: *** [subdirs] Error 1
make[1]: Leaving directory `/g/g19/whitlocb/Development/seq/builds_static/Mesa-8.0.5/src'
make: *** [default] Error 1

I looked at the config file for linux-osmesa and compared against the bluegene-xlc-osmesa config. I added "glsl" to the SRC_DIRS variable.

SRC_DIRS = glsl mesa glu

After that change, I run into:

python -t -O -O builtins/tools/generate_builtins.py ./builtin_compiler > builtin_function.cpp || rm -f builtin_function.cpp
bgxlC -c -I. -I../mesa -I../mapi -I../../include  -O3 -pedantic -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE -D_BSD_SOURCE   builtin_function.cpp -o builtin_function.o
"builtin_function.cpp", line 6935.2: 1540-0859 (S) #error directive: builtins profile 100_frag failed to compile.
"builtin_function.cpp", line 6936.2: 1540-0859 (S) #error directive: builtins profile 100_vert failed to compile.
"builtin_function.cpp", line 6937.2: 1540-0859 (S) #error directive: builtins profile 110_frag failed to compile.
"builtin_function.cpp", line 6938.2: 1540-0859 (S) #error directive: builtins profile 110_vert failed to compile.
"builtin_function.cpp", line 6939.2: 1540-0859 (S) #error directive: builtins profile 120_frag failed to compile.
"builtin_function.cpp", line 6940.2: 1540-0859 (S) #error directive: builtins profile 120_vert failed to compile.
"builtin_function.cpp", line 6941.2: 1540-0859 (S) #error directive: builtins profile 130_frag failed to compile.
"builtin_function.cpp", line 6942.2: 1540-0859 (S) #error directive: builtins profile 130_vert failed to compile.
"builtin_function.cpp", line 6943.2: 1540-0859 (S) #error directive: builtins profile ARB_shader_texture_lod_frag failed to compile.
"builtin_function.cpp", line 6944.2: 1540-0859 (S) #error directive: builtins profile ARB_shader_texture_lod_vert failed to compile.
"builtin_function.cpp", line 6945.2: 1540-0859 (S) #error directive: builtins profile ARB_texture_rectangle_frag failed to compile.
"builtin_function.cpp", line 6946.2: 1540-0859 (S) #error directive: builtins profile ARB_texture_rectangle_vert failed to compile.
"builtin_function.cpp", line 6947.2: 1540-0859 (S) #error directive: builtins profile EXT_texture_array_frag failed to compile.
"builtin_function.cpp", line 6948.2: 1540-0859 (S) #error directive: builtins profile EXT_texture_array_vert failed to compile.
"builtin_function.cpp", line 6949.2: 1540-0859 (S) #error directive: builtins profile OES_EGL_image_external_frag failed to compile.
"builtin_function.cpp", line 6950.2: 1540-0859 (S) #error directive: builtins profile OES_EGL_image_external_vert failed to compile.
"builtin_function.cpp", line 6951.2: 1540-0859 (S) #error directive: builtins profile OES_texture_3D_frag failed to compile.
"builtin_function.cpp", line 6952.2: 1540-0859 (S) #error directive: builtins profile OES_texture_3D_vert failed to compile.
make[2]: *** [builtin_function.o] Error 1
make[2]: Leaving directory `/g/g19/whitlocb/Development/seq/builds_static/Mesa-8.0.5/src/glsl'
make[1]: *** [subdirs] Error 1
make[1]: Leaving directory `/g/g19/whitlocb/Development/seq/builds_static/Mesa-8.0.5/src'
make: *** [default] Error 1

I've come to the conclusion that using xlC-based compilers with Mesa is not well-supported and that I should instead use the gcc compiler wrappers.

Mesa 7.8.2

Here is a config file that works:

# Configuration for building only libOSMesa on BlueGene using the GNU cross compilers
# This doesn't really have a lot of dependencies, so it should be usable
# on similar systems too.
# It uses static linking and disables multithreading.

include $(TOP)/configs/default

CONFIG_NAME = bluegene-osmesa

# Compiler and flags
APP_CC = gcc
APP_CXX = g++

CC=powerpc64-bgq-linux-gcc
CXX=powerpc64-bgq-linux-g++

CFLAGS = -O3 -pedantic -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE -D_BSD_SOURCE
CXXFLAGS = -O3 -pedantic -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE -D_BSD_SOURCE

MKLIB_OPTIONS = -static

GL_LIB_NAME = libGL.a
GLU_LIB_NAME = libGLU.a
GLUT_LIB_NAME = libglut.a
GLW_LIB_NAME = libGLw.a
OSMESA_LIB_NAME = libOSMesa.a

# Directories
SRC_DIRS = glsl mesa glu
DRIVER_DIRS = osmesa
PROGRAM_DIRS = osdemos

# Dependencies
GL_LIB_DEPS =
OSMESA_LIB_DEPS = -lm
GLU_LIB_DEPS = -L$(TOP)/$(LIB_DIR) -l$(OSMESA_LIB)
GLUT_LIB_DEPS =
GLW_LIB_DEPS =
APP_LIB_DEPS = -lOSMesa -lGLU -lm

Mesa 8.0.5

Here is a config file that works:

include $(TOP)/configs/default

CONFIG_NAME = bluegene-osmesa

# Compiler and flags
APP_CC = gcc
APP_CXX = g++

CC=powerpc64-bgq-linux-gcc
CXX=powerpc64-bgq-linux-g++

CFLAGS = -O3 -std=c99 -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE -D_BSD_SOURCE
CXXFLAGS = -O3 -D_POSIX_SOURCE -D_POSIX_C_SOURCE=199309L -D_SVID_SOURCE -D_BSD_SOURCE

MKLIB_OPTIONS = -static

GL_LIB_NAME = libGL.a
GLU_LIB_NAME = libGLU.a
GLUT_LIB_NAME = libglut.a
GLW_LIB_NAME = libGLw.a
OSMESA_LIB_NAME = libOSMesa.a

# Directories
SRC_DIRS = glsl mapi/glapi mesa glu
DRIVER_DIRS = osmesa
PROGRAM_DIRS = osdemos

# Dependencies
GL_LIB_DEPS =
OSMESA_LIB_DEPS = -lm
GLU_LIB_DEPS = -L$(TOP)/$(LIB_DIR) -l$(OSMESA_LIB)
GLUT_LIB_DEPS =
GLW_LIB_DEPS =
APP_LIB_DEPS = -lOSMesa -lGLU -lm

Mar 29, 2013

I realized that what I want is an overarching build_visit_BGQ that will hide the complexities of multiple build_visit invocations with different compilers, etc. In the end, build_visit_BGQ will create a visit2_7_0.linux-ppc64.tar.gz tarball that will contain statically linked binaries for the server components. The script will be very similar in nature to build_visit and will handle the extra level of BGQ build complexity that arises from creating 2 builds.

I've started build_visit_BGQ on my seq26 branch, though I'm using the build_visit and VisIt sources from the VTK-6 branch. The thought here is that the VTK port is well underway and it makes sense to solve any VTK porting issues once rather than porting to 5.8 and then again to 6.0.

Apr 11, 2013

I've been back at it this week. I've made good progress on the build_visit_BGQ script. It can now build login node and compute node versions of the desired packages. Currently, it builds:

  • cmake (login node)
  • Mesa (compute node, unmangled)
  • zlib (compute node, bgxlc)
  • szip (compute node, login node)
  • hdf5 (compute node, login node)
  • silo (compute node, login node)
  • VTK (compute node bgxlC, login node)

I ended up having to treat more and more of the compute node builds specially since their autoconf-based build systems have difficulties building with the bgxlc/bgxlC compilers. For those, I build them with the GNU wrapper compilers.

Since I can now build through this minimal set of libraries, I decided to tackle some VisIt build issues.

  • VisIt can now be built without Python
  • Static build issues are fixed

Now, I'm running into legitimate VisIt build issues:

  • I'm trying to use OpenGL instead of GLEW
  • I'm turning off any X11 code
  • I made it use the built-in boost when an external boost is not used.

I've added some cmake options:

  • VISIT_USE_X
  • VISIT_USE_BOOST
  • VISIT_USE_GLEW
  • VISIT_OPENGL_DIR

I'm building VisIt manually until I can get through a build. Then I'll improve build_visit_BGQ and build_visit to pass the right cmake arguments for the compute node build. I'll use an environment variable as I did for VTK and Silo. Here is my current cmake command line:

cmake -DVISIT_ENGINE_ONLY:BOOL=ON -DVISIT_USE_X:BOOL=OFF -DVISIT_USE_GLEW:BOOL=OFF -DVISIT_SLIVR:BOOL=OFF \
      -DVISIT_USE_BOOST:BOOL=OFF \
      -DVISIT_OPENGL_DIR=/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ \
      -DVISIT_DISABLE_SELECT:BOOL=ON -DVISIT_USE_NOSPIN_BCAST:BOOL=OFF  ../../../VTK-6/src

Latest problem:

  • GLSL code in the Streamline plot. GLEW must be needed to use glUseProgram, etc.

Apr 12, 2013

I changed avtGLEWInitializer.h so it defines GL_GLEXT_PROTOTYPES before including GL.h and the problems with glUseProgram went away. I then reenabled the Streamline plot. I made a wrapper function for glewIsSupported() called avt::glew::supported to prevent us from calling GLEW functions directly.

I made some minor fixes:

  • BOV reader
  • VisItXdmf reader

When I got through a build of the .o's, the link for the engine failed. The first failure was due to -lvtkzlib and -lX11 being used. I think that the vtklibpng library is introducing the vtkzlib dependency even though I told VTK to build against my "system" zlib that I built myself. LibGLU or VTK might be introducing the -lX11 dependency.

I hand-removed those libraries from the link.txt file and then I got HDF5 link errors and a problem with XDisplay that I'll need to fix in the engine.

Linking CXX executable ../../exe/engine_ser
../../lib/libengine_ser.a(Engine.C.o):(.text+0xbb74): undefined reference to `XDisplay::Launch(bool)'
../../lib/libengine_ser.a(VisItDisplay.C.o):(.text+0xc08): undefined reference to `XDisplay::XDisplay()'
/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/silo/4.9.1/linux-ppc64_gcc-4.4_BGQ/lib/libsiloh5.a(silo_hdf5.o):(.text+0x95a0): undefined reference to `H5Topen1'
/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/silo/4.9.1/linux-ppc64_gcc-4.4_BGQ/lib/libsiloh5.a(silo_hdf5.o):(.text+0x98fc): undefined reference to `H5Topen1'
/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/silo/4.9.1/linux-ppc64_gcc-4.4_BGQ/lib/libsiloh5.a(silo_hdf5.o):(.text+0x9c04): undefined reference to `H5Topen1'
/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/silo/4.9.1/linux-ppc64_gcc-4.4_BGQ/lib/libsiloh5.a(silo_hdf5.o):(.text+0x9d10): undefined re

I added a HAVE_LIBX11 conditional variable in visit-config.h so I could turn off XDisplay coding in engine/main/VisItDisplay.C. I also made a couple more changes to Engine.C.

I edited the order of silo and hdf5 in the link.txt file so that libsiloh5 comes before libhdf5.

With the changes and manual edits to the link.txt file, I can produce a static engine_ser executable.

Still to figure out:

  • How to get the silo and hdf5 libraries in the right order
  • How does X11 make it in?
  • How does vtkzlib make it in?

After that, ice-t and parallel.

Apr 18, 2013

I located the mechanism by which X11 and Xext were being added to the engine_ser link line. The VTKTargets-debug.cmake file that is produced by VTK's build was including the X11 and Xext libraries. This part was incorporating X11 and Xext even though I built VTK without X support. I think that since this is a debug configuration, if I had been setting -DCMAKE_BUILD_TYPE:STRING=Release then I might not have run into this problem.

SET_TARGET_PROPERTIES(vtkRenderingOpenGL PROPERTIES
  IMPORTED_LINK_INTERFACE_LANGUAGES_DEBUG "CXX"
  IMPORTED_LINK_INTERFACE_LIBRARIES_DEBUG "vtkImagingHybrid;vtkRenderingCore;/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libGLU.a;/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libGL.a;/g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/mesa/7.8.2/linux-ppc64_gcc-4.4_BGQ/lib/libGL.a"
  IMPORTED_LOCATION_DEBUG "${_IMPORT_PREFIX}/lib/libvtkRenderingOpenGL-6.0.a"
  )
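To check a generated VTKTargets-*.cmake file for stray X11 entries without reading it by eye, something like the following Python sketch could be used. It is an assumption-laden illustration: it only understands the SET_TARGET_PROPERTIES layout shown above (the property name and its quoted value on one line), and the helper names are invented for these notes.

```python
import re

# Matches the start of an imported-target block and captures the target name.
TARGET = re.compile(r'SET_TARGET_PROPERTIES\(\s*(\S+)\s+PROPERTIES', re.IGNORECASE)
# Matches the link-interface property for any configuration (DEBUG, RELEASE, ...).
PROP = re.compile(r'IMPORTED_LINK_INTERFACE_LIBRARIES_\w+\s+"([^"]*)"')

def lib_stem(entry):
    """Reduce '/usr/lib64/libX11.so', '-lX11' or 'X11' to the bare name 'X11'."""
    base = entry.rsplit("/", 1)[-1]
    for prefix in ("-l", "lib"):
        if base.startswith(prefix):
            base = base[len(prefix):]
            break
    return base.split(".", 1)[0]

def x11_link_entries(cmake_text, names=("X11", "Xext")):
    """Map each imported target to the link-interface entries naming X11 libraries."""
    hits = {}
    target = None
    for line in cmake_text.splitlines():
        found_target = TARGET.search(line)
        if found_target:
            target = found_target.group(1)
        found_prop = PROP.search(line)
        if found_prop and target:
            bad = [e for e in found_prop.group(1).split(";") if lib_stem(e) in names]
            if bad:
                hits.setdefault(target, []).extend(bad)
    return hits
```

Run on the text of VTKTargets-debug.cmake, an empty dictionary would mean no target lists X11 or Xext in its link interface. Any other result names the targets to look at.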

The references to vtkzlib were all coming from VisIt's build, which assumed that VTK's zlib would be used. I've cleaned things up so that ${VTKZLIB_LIB} is used instead. This variable is normally set to vtkzlib. When we select our own zlib, we probably built VTK against it too, so VTKZLIB_LIB is set to nothing in that case and ZLIB_LIB is used instead. In a lot of these cases, the executables linked with both vtkzlib and zlib, probably because of function name mangling in vtkzlib.

Apr 19, 2013

I reordered some 3rd party libraries in databases/CMakeLists.txt and that fixed the Silo/HDF5 linker problem. By making Silo readers come before HDF5, the static link line puts HDF5 last, after Silo and any other library that depends on HDF5. With this change, the serial build completes.
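The ordering matters because a traditional static link scans archives left to right, so a library has to appear before the libraries it depends on. A minimal Python sketch of that rule follows. It is not how VisIt's CMake files compute the order, and the dependency table in the usage note is illustrative only.

```python
def static_link_order(deps):
    """Order libraries so each one appears before everything it depends on.

    `deps` maps a library name to the list of libraries it needs.
    Raises ValueError if the dependencies form a cycle.
    """
    order = []   # filled in post-order: dependencies land before dependents
    state = {}   # library -> "active" while being visited, "done" afterwards

    def visit(lib):
        if state.get(lib) == "done":
            return
        if state.get(lib) == "active":
            raise ValueError("dependency cycle at " + lib)
        state[lib] = "active"
        for needed in deps.get(lib, ()):
            visit(needed)
        state[lib] = "done"
        order.append(lib)

    for lib in deps:
        visit(lib)
    # Reverse so that dependents come first, which is what the link line needs.
    return order[::-1]
```

For example, with deps = {"engine_ser": ["siloh5", "hdf5"], "siloh5": ["hdf5"], "hdf5": []}, the result places siloh5 ahead of hdf5, matching the hand edit made to link.txt earlier.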

I made some changes to the bv_icet.sh script:

  • Try mpicc -show if mpicc --showme:compile fails
  • Do not use mangled namespace for Mesa
  • Look through all -I directories returned from the -show command and use the one that contains mpi.h as the MPI directory
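The detection logic in the list above can be sketched roughly as follows. This is Python rather than the shell used in bv_icet.sh, and the function names are hypothetical. The sketch assumes the wrapper prints a single command line, and relies on --showme:compile being the Open MPI spelling and -show the MPICH-style spelling.

```python
import os
import shlex
import subprocess

def _run(cmd):
    """Run a command; return (succeeded, stdout)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except OSError:
        return False, ""
    return proc.returncode == 0, proc.stdout

def wrapper_flags(wrapper="mpicc", run=_run):
    """Ask the wrapper for its flags: try --showme:compile, then fall back to -show."""
    for option in ("--showme:compile", "-show"):
        ok, out = run([wrapper, option])
        if ok:
            return out
    return None

def include_dirs(show_output):
    """Return the -I directories from the wrapper output, in order."""
    dirs = []
    tokens = shlex.split(show_output)
    for i, tok in enumerate(tokens):
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])      # "-I dir" form
        elif tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])            # "-Idir" form
    return dirs

def find_mpi_dir(show_output):
    """Return the first -I directory that contains mpi.h, or None."""
    for d in include_dirs(show_output):
        if os.path.isfile(os.path.join(d, "mpi.h")):
            return d
    return None
```

The run argument exists only so the fallback can be exercised without a real MPI installation. In normal use, find_mpi_dir(wrapper_flags("mpicc")) would be called after checking that wrapper_flags did not return None.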

I made some changes to build_visit_BGQ so Ice-T builds separately so I can give it the GNU compilers but use the MPI/XL compilers for determining where to locate MPI. We might be able to fold this into the pass that builds Silo, etc.

I edited my existing rzuseq config-site file to include some changes for parallel and it's looking like the MPI detection code does not work. I may have to manually specify the MPI settings.

-- Setting up MPI using compiler wrapper
CMake Warning at /g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/cmake/2.8.10.2/linux-ppc64_gcc-4.4/share/cmake-2.8/Modules/FindMPI.cmake:383 (message):
  Unable to find MPI library stdc++
Call Stack (most recent call first):
  /g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/cmake/2.8.10.2/linux-ppc64_gcc-4.4/share/cmake-2.8/Modules/FindMPI.cmake:569 (interrogate_mpi_compiler)
  CMakeLists.txt:1316 (INCLUDE)
  CMakeLists.txt:1356 (DETECT_MPI_SETTINGS)


-- Could NOT find MPI_C (missing:  MPI_C_LIBRARIES) 
CMake Warning at /g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/cmake/2.8.10.2/linux-ppc64_gcc-4.4/share/cmake-2.8/Modules/FindMPI.cmake:383 (message):
  Unable to find MPI library stdc++
Call Stack (most recent call first):
  /g/g19/whitlocb/Development/seq/test_build_visit/thirdparty_static/cmake/2.8.10.2/linux-ppc64_gcc-4.4/share/cmake-2.8/Modules/FindMPI.cmake:569 (interrogate_mpi_compiler)
  CMakeLists.txt:1316 (INCLUDE)
  CMakeLists.txt:1356 (DETECT_MPI_SETTINGS)


-- Could NOT find MPI_CXX (missing:  MPI_CXX_LIBRARIES) 
CMake Error at CMakeLists.txt:1320 (MESSAGE):
  Failed to setup MPI using compiler wrapper: mpicxx
Call Stack (most recent call first):
  CMakeLists.txt:1356 (DETECT_MPI_SETTINGS)

I added the following to my rzuseq config-site file and then VisIt's cmake build was able to get past the configure/generate step. The build is proceeding. Now I have to figure out how to stick this into the build_visit_BGQ file, hopefully based on the arguments from mpixlC -show.

##
## Add parallel arguments.
##
SET(BLUEGENEQ /bgsys/drivers/V1R2M0/ppc64)
VISIT_OPTION_DEFAULT(VISIT_PARALLEL ON TYPE BOOL)
VISIT_OPTION_DEFAULT(VISIT_MPI_CXX_FLAGS "-I${BLUEGENEQ} -I${BLUEGENEQ}/comm/sys/include -I${BLUEGENEQ}/spi/include -I${BLUEGENEQ}/spi/include/kernel/cnk -I${BLUEGENEQ}/comm/xl/include" TYPE STRING)
VISIT_OPTION_DEFAULT(VISIT_MPI_C_FLAGS   "-I${BLUEGENEQ} -I${BLUEGENEQ}/comm/sys/include -I${BLUEGENEQ}/spi/include -I${BLUEGENEQ}/spi/include/kernel/cnk -I${BLUEGENEQ}/comm/xl/include" TYPE STRING)
VISIT_OPTION_DEFAULT(VISIT_MPI_LD_FLAGS  "-L${BLUEGENEQ}/spi/lib -L${BLUEGENEQ}/comm/sys/lib -L${BLUEGENEQ}/spi/lib -L${BLUEGENEQ}/comm/sys/lib -L${BLUEGENEQ}/spi/lib -L${BLUEGENEQ}/comm/xl/lib" TYPE STRING)
VISIT_OPTION_DEFAULT(VISIT_MPI_LIBS     mpich opa mpl pami SPI SPI_cnk rt pthread stdc++ pthread TYPE STRING)
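One way the settings above might be derived from the wrapper output is to split the -show line by flag prefix. This is a sketch only: extract_flags is a hypothetical helper, not something in build_visit_BGQ, and it assumes the wrapper output has no paths containing spaces.

```shell
# Print every token of a compile/link line that starts with the given
# prefix (-I, -L or -l), joined by single spaces.
# extract_flags is a hypothetical helper name.
extract_flags() {
    prefix="$1"
    line="$2"
    result=""
    for token in $line; do
        case "$token" in
            "$prefix"*) result="${result:+$result }$token" ;;
        esac
    done
    printf '%s\n' "$result"
}
```

With the wrapper output saved in a variable such as show, extract_flags -I "$show" would give a candidate value for the C/C++ flags, extract_flags -L "$show" for the linker flags, and extract_flags -l "$show" for the library list (with the -l prefixes still attached).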

I'm thinking ahead to visit-build-open, which builds VisIt at LC and I'll have to rig it up so that it does 2 builds -- possibly from separate rzuseq config-site files.

Apr 22, 2013

Okay, I can run a binary now. The engine_par binary crashes pretty much immediately but that's okay as I'm not giving it any command line arguments.

rzuseqlac2{whitlocb}196: srun -N1 -n4 -A science ./engine_par -debug 5
2013-04-22 11:52:29.640 (WARN ) [0x40001fda1e0] 1177460:ibm.runjob.client.Job: terminated by signal 6
2013-04-22 11:52:29.640 (WARN ) [0x40001fda1e0] 1177460:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 0

When I run like that, I get output from 4 tasks, which is what I'm expecting.


Job output

This crash on rank 0 probably happens as a result of the engine not being connected to any sockets, as we've not given any command line arguments telling it to connect. Rank 0:

Successfully loaded info about 90 database plugins.
Don't know how to broadcast empty vector!  Bailing out early.
Shared information about 90 database plugins.
Resetting idle timeout to 480 minutes.
Resetting execution timeout to 30 minutes.
signalhandler_core: SIGSEGV!

Rank 3:

ENGINE started. pid: 289
ParentProcess::Connect: Called with (numRead=1, numWrite=2, argc=1, argv={/g/g19/whitlocb/Development/seq/test_build_visit/builds_static_BGQ/visit-build/exe/./engine_par})
ParentProcess::Connect: done
Shared information about 22 plot plugins.
Shared information about 56 operator plugins.
Don't know how to broadcast empty vector!  Bailing out early.
Shared information about 90 database plugins.
Resetting idle timeout to 480 minutes.
Using MPI's Bcast; not VisIt_MPI_Bcast

One observation I had is that the plugin broadcasting step appears to be happening even though we're statically linked. That's probably a slowdown since it means collective communication. I can skip any coordination among ranks with respect to plugins if the engine is built statically.

May 23, 2013

I spent the day making various changes to build_visit_BGQ and the build_visit scripts. I put in a mechanism in build_visit that keys off the BUILD_VISIT_BGQ environment variable to follow special logic for BG/Q. I use the mechanism to add extra settings to the generated config-site file (turn off X11, GLEW, set special BG/Q parallel compiler flags). I also use the mechanism to filter out X11 and Xext from the engine link line.
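The X11/Xext filtering could look something like the sketch below. The contents of the real filter script are not reproduced in these notes, so the function name filter_x11_libs and the exact patterns are assumptions; the idea is only to drop X11 and Xext entries from a link line passed on stdin.

```shell
# Read link lines on stdin and write them back without X11/Xext entries.
# Whitespace between tokens is normalized to single spaces.
# filter_x11_libs is a hypothetical name, not the real filter script.
filter_x11_libs() {
    while IFS= read -r line || [ -n "$line" ]; do
        out=""
        for token in $line; do
            case "$token" in
                # Drop -l flags and full library paths for X11 and Xext.
                -lX11|-lXext|*/libX11.*|*/libXext.*) ;;
                *) out="${out:+$out }$token" ;;
            esac
        done
        printf '%s\n' "$out"
    done
}
```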

I was having various troubles building all day (quota issues). I find also that sometimes cmake runs fine from the command line but when I run it from build_visit_BGQ, there are problems where VisIt won't configure. I can't explain it.

May 30, 2013

I've gotten to the point where build_visit_BGQ can run to completion, provided I do a serial make. If I pass --makeflags -j4 then random packages start failing to build.

I made config-site files for rzuseqlac2's compute and login nodes and then saw that I ended up with NETCDF support in the login node build but not in the compute node build. This was because I had taken to using individual I/O library flags on the build_visit invocations inside build_visit_BGQ. I changed the code for the compute node build so it uses $IO_PACKAGES and then I started getting build failures in NETCDF. NETCDF failed to build because I was not specifying the location of zlib. I fixed up bv_netcdf so it uses the zlib that you've built if you pass --zlib. This got NETCDF building for the compute node. I ultimately had to abandon NETCDF for the statically linked engine on this platform though, because I was not able to get the library ordering right in src/databases/CMakeLists.txt.

I edited src/svn_bin/visit-build-open so that it uses a two-step build process for rzuseqlac2. First, we build the compute node version and install it into a directory. This is an engine-only build. Then we do a server-components-only build for the login node and install it over the engine-only build, adding the serial engine, mdserver, and vcl.

The login node build is screwing up. It's selecting the bgxlc compiler no matter what I do.

[  0%] Building C object third_party_builtin/glew/CMakeFiles/visitGLEW.dir/glew/src/glew.c.o
cd /g/g19/whitlocb/Development/seq/trunk/login_build/third_party_builtin/glew && /usr/local/bin/bgxlc  -DGLEW_STATIC -DHAVE_LIBGLEW -DHAVE_LIBSLIVR -DVISIT_STATIC -D_LARGEFILE64_SOURCE -qthreaded -qalias=noansi -qhalt=e  -O -DNDEBUG -I/g/g19/whitlocb/Development/seq/trunk/login_build/include -I/g/g19/whitlocb/Development/seq/trunk/src/include -I/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include    -o CMakeFiles/visitGLEW.dir/glew/src/glew.c.o   -c /g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/src/glew.c
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 94.10: 1506-296 (S) #include file <X11/Xlib.h> not found.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 95.10: 1506-296 (S) #include file <X11/Xutil.h> not found.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 96.10: 1506-296 (S) #include file <X11/Xmd.h> not found.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 144.9: 1506-166 (S) Definition of function XID requires parentheses.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 144.13: 1506-276 (S) Syntax error: possible missing '{'?
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 293.8: 1506-275 (S) Unexpected text send_event encountered.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 293.3: 1506-045 (S) Undeclared identifier Bool.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 294.3: 1506-045 (S) Undeclared identifier Display.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 294.12: 1506-045 (S) Undeclared identifier display.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 295.15: 1506-275 (S) Unexpected text drawable encountered.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 295.3: 1506-045 (S) Undeclared identifier GLXDrawable.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 301.3: 1506-273 (E) Missing type in declaration of GLXPbufferClobberEvent.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 303.3: 1506-046 (S) Syntax error.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 307.9: 1506-166 (S) Definition of function GLXFBConfig requires parentheses.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 307.20: 1506-276 (S) Syntax error: possible missing '{'?
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 980.3: 1506-273 (E) Missing type in declaration of GLXHyperpipeNetworkSGIX.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1006.55: 1506-277 (S) Syntax error: possible missing ')' or ','?
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1006.47: 1506-279 (S) A function declarator cannot have a parameter identifier list if it is not a function definition.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1007.64: 1506-277 (S) Syntax error: possible missing ')' or ','?
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1007.56: 1506-279 (S) A function declarator cannot have a parameter identifier list if it is not a function definition.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1008.57: 1506-277 (S) Syntax error: possible missing ')' or ','?
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1008.49: 1506-279 (S) A function declarator cannot have a parameter identifier list if it is not a function definition.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1009.57: 1506-277 (S) Syntax error: possible missing ')' or ','?
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1009.49: 1506-279 (S) A function declarator cannot have a parameter identifier list if it is not a function definition.
"/g/g19/whitlocb/Development/seq/trunk/src/third_party_builtin/glew/glew/include/GL/glxew.h", line 1010.62: 1506-277 (S) Syntax error: possible missing ')' or ','?
make[2]: *** [third_party_builtin/glew/CMakeFiles/visitGLEW.dir/glew/src/glew.c.o] Error 1

If I add -DCMAKE_C_COMPILER:FILEPATH=/usr/bin/gcc on the command line then cmake goes into an infinite loop of configures. WTF?

June 4, 2013

I kept running into the weird issue above where the compiler would reset to bgxlC for the login node builds. I determined that the bug seemed limited to out-of-source builds (see http://visitbugs.ornl.gov/issues/1484). I ran 2 in-source builds instead and was able to get things compiling, so I changed visit-build-open to do 2 in-source builds to work around the problem.

I'm testing the visit-build-open script from my Mac:

cd trunk
src/svn_bin/visit-dist visit-test-build-bgq
visit-build-open -none +rzuseq visit-test-build-bgq

The above visit-build-open command failed because the script can't directly ssh to rzuseq since I'm not in the RZ where I ran the script. Fine. I copied the visit-test-build-bgq.tar.gz distribution and the created rzuseq file to my rzuseq account. I removed the commands from the front of the script that copy the files into /usr/tmp/$user and just started at the gunzip command.

The compute node build failed because I had not included the right path to filter_x11.sh. More importantly, my tarball did not contain the rzuseqlac2-compute.cmake config-site file. The login node build seems to be proceeding though.

In the meantime, I made a rzuseq host profile and I was able to connect to the previous co-install that I built. My Mac was able to connect to rzuseq, launch vcl and mdserver, and it was able to queue up a job using srun. Now, that job did not run because the machine was not running jobs but this looks promising.

I tried running client/server to rzuseq and using the serial compute engine. The engine runs and connects back. When I try to plot something, it dies. Here is the end of the engine log file:

Creating new VisWindow for id=0
Forcing GL context initialization...
ERROR: In /g/g19/whitlocb/Development/seq/trunk/test_build_visit/builds_static/VTK-5e3c539/Rendering/OpenGL/vtkXOpenGLRenderWindow.cxx, line 383
vtkXOpenGLRenderWindow (0x13ae3590): Could not find a decent visual

signalhandler_core: SIGSEGV!

Crap! This is on the login node, which has X11 (but not a good GL). I hope we don't have to install some Mesa render window for the serial engine too. Back on BlueGene/P, I had to override VTK's vtkXOpenGLRenderWindow class with a version that did OSMesa. That code can be found here.

June 5, 2013

My visit-build-open test was successful. Both builds installed to a common directory and a tarball was produced.

runtime error

I decided to see what the parallel engine will do since jobs are running today and it is linked with Mesa as GL. Here's what I get when the engine starts on 1 node:

Running: srun -n 16 -N 1 /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par -dir /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64 -forcestatic -idle-timeout 480 -debug 5 -clobber_vlogs -dv -noloopback -sshtunneling -host rzuseqlac2 -port 17634
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par: error while loading shared libraries: libibmc++.so.1: cannot open shared object file: No such file or directory
2013-06-05 10:14:58.055 (WARN ) [0x40001fda1e0] 1368280:ibm.runjob.client.Job: normal termination with status 127 from rank 0

I edited the link.txt file for engine_par to add -R/opt/ibmcmp/lib64/bg, which adds an rpath entry. On the next run, the IBM C++ library was located. I'll add that to my rzuseq-compute.cmake file.
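For repeating the link.txt edit by hand, a small helper like the following would do. It is only a sketch: add_rpath_flag is a hypothetical name, and it assumes the link command sits on the first line of link.txt. The edit is lost whenever cmake regenerates the file, which is why the config-site file is the better long-term home for the flag.

```shell
# Append an rpath flag to a CMake-generated link.txt unless it is already
# present. Uses the -R spelling from the notes above.
# add_rpath_flag is a hypothetical helper name.
add_rpath_flag() {
    link_txt="$1"
    rpath_dir="$2"
    flag="-R$rpath_dir"
    # Nothing to do if the flag is already on the link line.
    if grep -q -F -e "$flag" "$link_txt"; then
        return 0
    fi
    # Append the flag to the end of the first line.
    sed "1s|\$| $flag|" "$link_txt" > "$link_txt.tmp" &&
        mv "$link_txt.tmp" "$link_txt"
}
```

For example: add_rpath_flag link.txt /opt/ibmcmp/lib64/bg, run from the directory that holds the engine_par link.txt.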

networking error

When I tried running again, I got this:

Running: srun -n 16 -N 1 /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par -dir /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64 -forcestatic -idle-timeout 480 -debug 5 -clobber_vlogs -dv -noloopback -sshtunneling -host rzuseqlac2 -port 25557
2013-06-05 10:37:43.050 (WARN ) [0x40001fda1e0] 1368290:ibm.runjob.client.Job: terminated by signal 6
2013-06-05 10:37:43.050 (WARN ) [0x40001fda1e0] 1368290:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 0

At least I was able to get debug logs for each of the processors. The rank 0 log indicates that it was not able to connect back to vcl.

rzuseqlac2{whitlocb}297: cat ~/A.engine_par.000.5.vlog
/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par -dir /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64 -forcestatic -idle-timeout 480 -clobber_vlogs -dv -noloopback -sshtunneling -host rzuseqlac2 -port 25557 -key 4fc488004aae3f964ca7 
ENGINE started. pid: 1
ParentProcess::Connect: Called with (numRead=1, numWrite=2, argc=16, argv={/g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par, -dir, /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64, -forcestatic, -idle-timeout, 480, -clobber_vlogs, -dv, -noloopback, -sshtunneling, -host, rzuseqlac2, -port, 25557, -key, 4fc488004aae3f964ca7})
ParentProcess::Connect: hostName = rzuseqlac2
ParentProcess::GetHostInfo: Calling gethostbyname("rzuseqlac2")
ParentProcess::Connect: port = 25557
ParentProcess::Connect: securityKey = 4fc488004aae3f964ca7
ParentProcess::Connect: Creating sockets
ParentProcess::Connect: Creating read sockets
ParentProcess::GetClientSocketDescriptor: Set up using port 25557
ParentProcess::GetClientSocketDescriptor: Creating a socket
ParentProcess::GetClientSocketDescriptor: Setting socket options
ParentProcess::GetClientSocketDescriptor: Calling connect
(If you see no messages after this one, VisIt was not
able to connect to the client machine.  Nine times out
of ten, this is a firewall issue on the client machine.
It could also mean that VisIt was unable to resolve the
IP address for the client machine.  You may need to verify the contents of /etc/hosts.)
ParentProcess::GetClientSocketDescriptor: Could not connect! (error=101: Network is unreachable)
ParentProcess::Connect: Creating write sockets
ParentProcess::GetClientSocketDescriptor: Set up using port 25557
ParentProcess::GetClientSocketDescriptor: Creating a socket
ParentProcess::GetClientSocketDescriptor: Setting socket options
ParentProcess::GetClientSocketDescriptor: Calling connect
(If you see no messages after this one, VisIt was not
able to connect to the client machine.  Nine times out
of ten, this is a firewall issue on the client machine.
It could also mean that VisIt was unable to resolve the
IP address for the client machine.  You may need to verify the contents of /etc/hosts.)
ParentProcess::GetClientSocketDescriptor: Could not connect! (error=101: Network is unreachable)
ParentProcess::GetClientSocketDescriptor: Set up using port 25557
ParentProcess::GetClientSocketDescriptor: Creating a socket
ParentProcess::GetClientSocketDescriptor: Setting socket options
ParentProcess::GetClientSocketDescriptor: Calling connect
(If you see no messages after this one, VisIt was not
able to connect to the client machine.  Nine times out
of ten, this is a firewall issue on the client machine.
It could also mean that VisIt was unable to resolve the
IP address for the client machine.  You may need to verify the contents of /etc/hosts.)
ParentProcess::GetClientSocketDescriptor: Could not connect! (error=101: Network is unreachable)
Exception: (CouldNotConnectException) /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/compute_build/common/comm/ParentProcess.C, line 411: <The reason for the exception was not described>
signalhandler_core: SIGBRT!

I added some code to customlauncher for LLNL that uses /sbin/ifconfig to determine the IP address for eth0 on the login node and then the compute node connects back to that. That change was enough to get the engine to connect back to VCL and then to the viewer through VCL's ssh tunnel. Yay! I get a crash but the networking works -- and better than it did for dawn. It takes a moment to connect fully once the connection progress dialog goes away.
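The address lookup amounts to pulling the IPv4 address out of the ifconfig output for eth0. The customlauncher code itself is not shown in these notes, so the following is only a shell sketch of the parsing step with a hypothetical function name, ipv4_from_ifconfig. It assumes the output uses either the older "inet addr:1.2.3.4" form or the newer "inet 1.2.3.4" form.

```shell
# Read ifconfig output for a single interface on stdin and print the first
# IPv4 address found. Lines starting with "inet6" are ignored because the
# pattern requires a space right after "inet".
# ipv4_from_ifconfig is a hypothetical helper name.
ipv4_from_ifconfig() {
    sed -n 's/^[[:space:]]*inet \(addr:\)\{0,1\}\([0-9][0-9.]*\).*/\2/p' |
        head -n 1
}
```

On the login node this would be run as /sbin/ifconfig eth0 | ipv4_from_ifconfig, and the result passed to the engine as its -host argument in place of the host name.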

Crash

When I plot 'd' from /usr/gapps/visit/data/multi_ucd3d.silo, the parallel engine crashes. Here is the output from the rank 0 debug log:

Executing OpenDatabaseRPC: db=/usr/gapps/visit/data/multi_ucd3d.silo, time=0
Not opening /libESiloDatabase_par.a because this is a static build.
Asked for Silo_GetEngineInfo
Loaded full database plugin Silo version 1.0
Loading new database
avtDatabaseFactory: specifically told to use Silo_1.0
Skipping already loaded database plugin Silo version 1.0
Trying to open the file with the Silo file format, strict mode is off
Opening silo file /usr/gapps/visit/data/multi_ucd3d.silo
Succeeding in opening Silo file with DB_PDB driver
The following Silo error occurred: db_pdb_getvarinfo: Low-level function call failed: PJ_inquire_entry
The following Silo error occurred: db_pdb_getvarinfo: Low-level function call failed: PJ_inquire_entry
Guessing this Silo file was produced by code "Unknown"
Guessing material "mat1" is defined on mesh "mesh1"
Guessing variable "d" is defined on mesh "mesh1"
Guessing variable "d_dup" is defined on mesh "mesh1_dup"
Guessing variable "d_on_mats_1_3" is defined on mesh "mesh1"
For mat-restricted variable "d_on_mats_1_3"...
    Looking for materials matching region_pname[0] = "1"...
        Comparing using regno=1...
            for "1" got matno=1 matched
    Looking for materials matching region_pname[1] = "3"...
        Comparing using regno=3...
            for "1" got matno=1 DID NOT match
            for "2" got matno=2 DID NOT match
            for "3" got matno=3 matched
Variable "d_on_mats_1_3" is restricted to the following regions...
"1" (mat-index = 0)
"3" (mat-index = 2)
Using fuzzy logic to match multivar "d_split" to a multimesh
Exception: (SiloException) /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/compute_build/databases/Silo/avtSiloFileFormat.C, line 12995: A Silo error occurred.
The error is "Was not able to match multivar "d_split" and its first 
non-empty submesh "/block1/d_back" in file /block1/mesh1_back to a multi-mesh.
This typically leads to the variable being invalidated
(grayed out) in the GUI."
signalhandler_core: SIGBRT!

Other ranks make it to:

Opening silo file /usr/gapps/visit/data/multi_ucd3d.silo
Succeeding in opening Silo file with DB_PDB driver
The following Silo error occurred: db_pdb_getvarinfo: Low-level function call failed: PJ_inquire_entry
The following Silo error occurred: db_pdb_getvarinfo: Low-level function call failed: PJ_inquire_entry

Rendering Crash

I decided to try opening a simple file and plotting it instead of Silo where things might go wrong. I opened the curve example files in /usr/gapps/visit/data/CURVES. When I plot the curve, I get:

Running: srun -n 16 -N 1 /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64/2.7.0b/linux-ppc64/bin/engine_par -dir /g/g19/whitlocb/Development/seq/test/visit-test-bgq-build/visit2_7_0b.linux-ppc64 -forcestatic -idle-timeout 480 -debug 5 -clobber_vlogs -dv -noloopback -host 134.9.11.131 -port 28148
engine_par: main/renderbuffer.c:1924: _mesa_add_renderbuffer: Assertion `bufferName == BUFFER_DEPTH || bufferName == BUFFER_STENCIL || fb->Attachment[bufferName].Renderbuffer == ((void *)0)' failed.
2013-06-05 16:55:57.612 (WARN ) [0x40001fda1e0] 1369251:ibm.runjob.client.Job: terminated by signal 6
2013-06-05 16:55:57.612 (WARN ) [0x40001fda1e0] 1369251:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 0

Here is the end of the engine log:

Resetting execution timeout to 30 minutes.
Executing SetWinAnnotAttsRPC 723x704
Creating new VisWindow for id=0
Forcing GL context initialization...
signalhandler_core: SIGBRT!

If we were not using a render window in the engine until it was actually needed, I'd be up and running already.

June 6, 2013

I went into VisWinRendering.C and commented out the code in Realize() and RenderRenderWindow() so no rendering would occur, since I'm not doing rendering on the engine yet. That let me avoid VTK window problems. I put printf's in there too and I only observed the Realize() method being called. This happens in the NetworkManager::NewViswin() method.

  • I think we should not call Realize() until we're actually doing an SR render. That seems safe and will avoid VTK window problems until SR is needed.

With the rendering change, I was able to open a simple curve file and make a plot. I ran with timings and found that the plugin broadcaster is still active for static builds. That's not needed and it appears to add about a minute to the startup time. I have not yet found the right magic to skip the plugin broadcaster though.

June 7, 2013

I was looking into how software rendering with Mesa is handled now in a post VTK-6.0 world. There is a visit_vtk/osmesa library that installs overrides of the render window. The render window override is an OSMesa-based version similar to what I did in the BG/P port. If VisIt was built with Mesa support (HAVE_OSMESA gets defined) then the factory is compiled in and the engine forces the use of the OSMesa factory. In the shared library case, there is some LD_LIBRARY_PATH monkey-business that forces the libOSMesa.so library to be loaded, probably to satisfy GL symbols.

I'm not building BG/Q with Mesa in the same way right now. I'm building Mesa but I'm calling it libGL.a instead of libOSMesa.a. In order to use the mechanism in VisIt now that turns on the Mesa rendering, it looks like I'll have to change the following things:

  1. Change my Mesa build in build_visit_BGQ so it gets called libOSMesa.a
  2. Tell VTK's build in build_visit_BGQ to look for GL in libOSMesa.a
  3. Turn on --mesa on the final build_visit that makes the config-site file so VISIT_MESA_DIR will be set. The FindVisItMesa logic should succeed since Mesa will be called OSMesa.
  4. Tell VisIt to find GL in libOSMesa.a
  5. Ensure that I'm only linking libOSMesa.a as my GL library.

I think that with those changes, I may be able to get OSMesa rendering to work.

June 10, 2013

I made some changes to build_visit_BGQ so Mesa will be called libOSMesa.a when it is built. I then changed the OpenGL and Mesa paths for VTK's build so they look for libOSMesa.a for their GL library. I added --mesa to the build_visit invocation that should generate the config-site file. I still need to change VisIt's CMakeLists.txt so that it will accept libOSMesa.a as its GL library instead of defaulting to "GL".

Okay, so I made the changes and ran into a linking problem with vtkOSMesaCreateWindow and vtkOSMesaDestroyWindow being defined once in VTK and once in our sources. There also seem to be a bunch of unresolved C++ library symbols.

Linking CXX executable ../../exe/engine_par
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkRenderingOpenGL-6.0.a(vtkOSOpenGLRenderWindow.cxx.o): In function `vtkOSMesaDestroyWindow(void*)':
/g/g19/whitlocb/Development/seq/trunk/test_build_visit/builds_static_BGQ/VTK-5e3c539/Rendering/OpenGL/vtkOSOpenGLRenderWindow.cxx:(.opd+0x138): multiple definition of `vtkOSMesaDestroyWindow(void*)'
../../lib/libvisit_vtk_osmesa.a(vtkOSMesaGLRenderWindow.C.o):vtkOSMesaGLRenderWindow.C:(.opd+0x9a0): first defined here
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkRenderingOpenGL-6.0.a(vtkOSOpenGLRenderWindow.cxx.o): In function `vtkOSMesaCreateWindow(int, int)':
/g/g19/whitlocb/Development/seq/trunk/test_build_visit/builds_static_BGQ/VTK-5e3c539/Rendering/OpenGL/vtkOSOpenGLRenderWindow.cxx:(.opd+0x150): multiple definition of `vtkOSMesaCreateWindow(int, int)'
../../lib/libvisit_vtk_osmesa.a(vtkOSMesaGLRenderWindow.C.o):vtkOSMesaGLRenderWindow.C:(.opd+0x960): first defined here
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkIOPLY-6.0.a(vtkPLYReader.cxx.o):(.eh_frame+0x12): undefined reference to `__IBMCPlusPlusExceptionV3'
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkIOPLY-6.0.a(vtkPLYWriter.cxx.o):(.eh_frame+0x12): undefined reference to `__IBMCPlusPlusExceptionV3'
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkIOPLY-6.0.a(vtkPLY.cxx.o):(.text+0x17b0): undefined reference to `__uitrunc'
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkIOPLY-6.0.a(vtkPLY.cxx.o):(.text+0x1800): undefined reference to `__uitrunc'
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkIOPLY-6.0.a(vtkPLY.cxx.o):(.text+0x3ee8): undefined reference to `__uitrunc'
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkIOPLY-6.0.a(vtkPLY.cxx.o):(.text+0x4824): undefined reference to `__uitrunc'
/usr/gapps/visit/thirdparty_static/2.7.0/vtk/5e3c539/linux-ppc64_gcc-4.4_BGQ/lib/libvtkIOPLY-6.0.a(vtkPLY.cxx.o):(.text+0x4970): undefined reference to `__uitrunc'

I edited our vtkOSMesaGLRenderWindow.C file and made 2 functions static since they are already defined in VTK. Making them static gives them internal linkage, so they no longer collide with VTK's definitions at link time.

static void vtkOSMesaDestroyWindow(void *Window)
{
  free(Window);
}
 
static void *vtkOSMesaCreateWindow(int width, int height)
{
  return malloc(width*height*4);
}

After that, I tacked on /opt/ibmcmp/lib64/bg/libibmc++.so.1 to the end of my link.txt file for the parallel engine. That got rid of the __IBMCPlusPlusExceptionV3 missing symbols.
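The link.txt tweak above can be scripted so it is easy to redo after a rebuild. This is only a sketch: the helper name append_to_link_line is made up for illustration, the location of link.txt depends on the build tree, and CMake will overwrite the edit whenever it regenerates the file.

```shell
# Hypothetical helper: append one extra library to the last line of a
# CMake-generated link.txt. Assumes the path contains no '|', '&' or
# backslash characters, since it is spliced into a sed expression.
append_to_link_line() {
    link_file=$1
    extra=$2
    # Do nothing if the library is already on the link line.
    if grep -qF -- "$extra" "$link_file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Add " <extra>" to the end of the last line, then copy the result
    # back so link.txt keeps its original permissions.
    sed "\$ s|\$| $extra|" "$link_file" > "$tmp" &&
        cat "$tmp" > "$link_file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For the case in these notes, the call would look like `append_to_link_line path/to/link.txt /opt/ibmcmp/lib64/bg/libibmc++.so.1`, where path/to/link.txt stands for wherever the parallel engine's link.txt lives in the build tree.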

I don't know why the linker is now complaining about VTK libraries missing __uitrunc. I rebuilt VTK today but nothing much is different, and I don't know why I would not have run into this problem before. __uitrunc sounds like a math function.

June 11, 2013

Okay, I found that /usr/bin/g++ was being used to compile the sources. BAD! bgxlC is supposed to be used. No wonder I was getting linker errors. Apparently, I had left a CMakeFiles directory around from a previous build and its stale contents were interfering with the new configuration. I'm restarting in a new src directory.
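A quick way to catch this kind of mix-up earlier is to look at the compiler CMake recorded in CMakeCache.txt before building. The sketch below assumes the build directory has a CMakeCache.txt with the usual CMAKE_CXX_COMPILER entry. The function names are made up for illustration.

```shell
# Print the C++ compiler recorded in a CMake build tree's cache.
# Usage: cached_cxx_compiler <build-dir>
cached_cxx_compiler() {
    sed -n 's/^CMAKE_CXX_COMPILER:[A-Z]*=//p' "$1/CMakeCache.txt"
}

# Succeed only if the cached compiler's file name starts with the
# expected name (so "bgxlC" also accepts "bgxlC_r").
# Usage: check_cxx_compiler <build-dir> <expected-name>
check_cxx_compiler() {
    actual=$(cached_cxx_compiler "$1")
    if [ -z "$actual" ]; then
        echo "no CMAKE_CXX_COMPILER entry in $1/CMakeCache.txt" >&2
        return 1
    fi
    # Strip the directory part and compare against the expected prefix.
    case "${actual##*/}" in
        "$2"*) return 0 ;;
        *) echo "unexpected C++ compiler: $actual" >&2; return 1 ;;
    esac
}
```

Running `check_cxx_compiler . bgxlC` in the build directory would have flagged /usr/bin/g++ here. If the check fails, starting over in a clean directory (as I did) avoids picking up leftover CMakeFiles state.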

Okay, no more linking problems.

There are still plenty of ways to screw up.

  • I forgot to put the src/resources/hosts/llnl/customlauncher into my installation's 2.7.0b/bin directory. Without the customlauncher, the engine cannot connect back to vcl on the login node.

I put some VTK files in /usr/gapps/visit/data/VTK on RZ. When I open the .visit file and plot it, the engine still bails with:

engine_par: main/renderbuffer.c:1924: _mesa_add_renderbuffer: Assertion `bufferName == BUFFER_DEPTH || bufferName == BUFFER_STENCIL || fb->Attachment[bufferName].Renderbuffer == ((void *)0)' failed.
engine_par: main/renderbuffer.c:1924: _mesa_add_renderbuffer: Assertion `bufferName == BUFFER_DEPTH || bufferName == BUFFER_STENCIL || fb->Attachment[bufferName].Renderbuffer == ((void *)0)' failed.
engine_par: main/renderbuffer.c:1924: _mesa_add_renderbuffer: Assertion `bufferName == BUFFER_DEPTH || bufferName == BUFFER_STENCIL || fb->Attachment[bufferName].Renderbuffer == ((void *)0)' failed.

Shortly thereafter, the following occurred:

2013-06-11 09:38:56.801 (WARN ) [0x40001fda1e0] 1381704:ibm.runjob.client.Job: terminated by signal 6
2013-06-11 09:38:56.802 (WARN ) [0x40001fda1e0] 1381704:ibm.runjob.client.Job: abnormal termination by signal 6 from rank 0

I guess I need to trace through the engine to see whether the Mesa buffers, etc. are getting created.
