Profiling

Normally, you should use the timing infrastructure to figure out how long a chunk of code takes to run. Sometimes, however, that is not enough. Profiling is particularly useful when you don't already know which chunk of code is causing poor execution times.

Currently, this page only covers profiling VisIt on Linux.

oprofile

If you have sudo access on your development system, oprofile is probably your best bet. It can profile shared libraries or executables, create annotated source, and use specific system-level counters to track hardware events. Ubuntu has a package for it, and your favorite Linux distribution probably does too. Otherwise, you can build it from source. Find out more at the oprofile SourceForge page.

gcov

gcov is a great tool for code coverage analysis. For a given program run, it gives a line-by-line breakdown of the number of times execution reached each line. It is also useful for determining branching factors -- how often an 'if' statement is true, for example.

Unfortunately, gcov has no way to time portions of code, so it is not very useful for identifying slow chunks of code.

gprof

gprof is the 'standard' tool for profiling applications on Linux. Unfortunately, gprof simply does not support shared libraries. This is a showstopper for VisIt, as we use shared libraries internally for almost everything; most of the VisIt components are compiled from very small source files (containing just main) which call into a shared library almost immediately.

sprof

sprof comes with glibc. It was specifically written to profile shared libraries. If you can't use oprofile, sprof is probably your next best option for profiling code in a shared library.

Unfortunately, trying to use sprof with VisIt can sometimes result in error messages such as:

/path/to/engine_par: no PLTREL found in object /usr/lib64/tls/libnvidia-tls.so.1

Here, the NVIDIA driver was linked in a way that disables the mechanism by which sprof acquires profiling data. This effectively means that any program which links to the NVIDIA driver (or to a library which in turn links to the NVIDIA driver, and so on recursively) cannot use sprof.

libtprof

libtprof is a library for profiling applications, whether or not the code lives in a shared library. You can find more details about libtprof (and its source) at its website.

Using libtprof with VisIt

You will want an existing build of VisIt.

Using libtprof is fairly simple. It relies on gcc's ability to instrument source files. Note that no source modifications are necessary.

  1. Install libtprof.
  2. Hack one of VisIt's makefiles to change how the source files you care about are built.
  3. Recompile that portion of VisIt.
  4. Run VisIt as normal on the test case you are interested in.
  5. Examine the (copious) extra output.

Hacking a Makefile

You will need to add compiler options for the source files you are interested in; these tell gcc / libtprof that you'd like to profile the functions in those files. Restricting the options to particular files isn't strictly necessary: technically, one could recompile all of VisIt with CXXFLAGS set to include -finstrument-functions, but that would profile all of VisIt, and the runtime cost of doing so is prohibitive.

So the solution is to profile only what you care about. You can mix and match profiled and non-profiled objects at will.

To do this, open up the Makefile you are interested in and append -finstrument-functions to your CXXFLAGS. This tells the build system to instrument the files built by that Makefile for profiling. You will also need to make sure -ltprof is added to the lines which link; for the most part, appending it to LDFLAGS should do the trick. Unfortunately, the build system does not seem to use LDFLAGS consistently when linking; try SHLIB_FORCED if LDFLAGS doesn't work.

Recompile That Portion of VisIt

Just type make!

Run VisIt As Normal

Execute src/bin/visit as you normally would, on the test cases you care about. It is recommended that you devise a command-line-only test (i.e. something that can be run in batch mode via -nowin -cli -s <script.py>), as the instrumented VisIt may run aggravatingly slowly.

Furthermore, even if you have only instrumented a small portion of VisIt, there will be a lot of output. Once you're sure things are set up correctly, you will probably want to redirect the output to a file.

Examine the Output

libtprof outputs a list of functions and their call sites, along with the number of 'units' spent in each function. Here 'units' is a machine-and-processor-specific quantity. Units from a single run can be meaningfully compared with each other, but units from different runs are unlikely to be comparable. This is especially true across different machines.

Here is an example line in the output:

365889 units: ll_print from ll_print (/path/to/binary)

As you can see, the format is normally:

<'time'> units: <function> from <call site> (<object>)

Note that <function> and <call site> will be mangled, since VisIt is a C++ application. You can filter these results through c++filt to get more meaningful names. The author's workflow is normally:

$ /path/to/instrumented/visit -nowin -cli -s mytest.py &> prof_log
$ c++filt < prof_log | grep -v 'std::' | sort -n | less

Unlike in early versions of libtprof, the time units are always given as the first 'column' of each line. This is useful for awk scripts, or e.g. the 'sort' program.
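
As an alternative to awk, the log can be post-processed with a short script. The following is a minimal Python sketch, not part of libtprof or VisIt. It assumes every profile line has exactly the shape shown above and that summing units over all call sites of a function is a sensible aggregate; the names parse_line and total_units_by_function are hypothetical helpers introduced only for this illustration.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the example line above:
#   <units> units: <function> from <call site> (<object>)
# The function name is matched lazily (up to the first " from ") and the
# object is taken from the last parenthesized group, so demangled names
# that contain parentheses are still split correctly.
LINE_RE = re.compile(r"^\s*(\d+) units: (.+?) from (.+) \((.+)\)\s*$")


def parse_line(line):
    """Return (units, function, call_site, obj), or None if the line
    does not look like a profile record."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), m.group(3), m.group(4)


def total_units_by_function(lines):
    """Sum units per function over all call sites, largest total first.

    Lines that are not profile records (ordinary program output) are
    skipped. Totals are only meaningful within a single run.
    """
    totals = defaultdict(int)
    for line in lines:
        record = parse_line(line)
        if record is not None:
            units, function, _call_site, _obj = record
            totals[function] += units
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

One way to use it would be to pass in the lines of prof_log after they have been run through c++filt, then print the first few entries of the returned list. Because the units are machine-specific, the totals should only be compared within one run.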