Load Balancing

Overview

VisIt sets up identical data flow networks on every processor. The only thing that distinguishes the networks is which portion of the larger data set they operate. It is the job of the LoadBalancer to tell each processor what this portion is.

Domain Overloading

Typically, VisIt leverages the domain decomposition of the physics code for its own parallelization model. The number of processors that VisIt's parallel server is run on is typically much less than the number of domains the simulation code produced. So VisIt must support domain overloading, where multiple domains are processed on each processor. Note that it is not sufficient to simply combine unrelated domains into one larger domain. This is not even possible for some grid types, like rectilinear grids where two grids are likely not neighboring and cannot be combined. And for situations where grids can be combined, like with unstructured grids, additional overhead would be incurred to distinguish which domains the elements in the combined grid originated from, which is important for operations where users want to refer to their elements in their original form (for example, picking on elements).

Techniques for domains overloading

VisIt has two techniques to do domain overloading. One approach, called streaming, will process domains one at a time. In this approach, there is one pipeline execution for each domain. Another approach, called grouping, is to execute the pipeline only once and to have all the domains travel down the pipeline together.

Load Balancing Strategies

VisIt can employ either streaming or grouping when doing its load balancing.

Static Load Balancing

With static load balancing, domains are assigned to the processors at the beginning of the pipeline execution and a grouping strategy is applied. Because all of the data is available for any stage of the pipeline, collective communication can take place, enabling algorithms like streamline generation that cannot be efficiently implemented in an out-of-core setting.

Dynamic Load Balancing

With dynamic load balancing, domains are assigned dynamically and a streaming strategy is applied. Not all domains take the same amount of time to process; dynamic load balancing efficiently (and dynamically) schedules these domains, creating an evenly distributed load. In addition, this strategy will process one domain in entirety before moving on to the next one, increasing cache coherency. However, because the data streams through the pipeline, it is not all available at one time and collective communication cannot take place with dynamic load balancing.

For VisIt's dynamic load balancing implementation, one processor is designated as a master and the rest as slaves. The master does no work and instead assigns work to slaves as they request it.

What VisIt actually does

So how should VisIt decide which load balancing method to use? Dynamic load balancing is more efficient when the amount of work per domain varies greatly, but the technique does not support all algorithms. Static load balancing is usually less efficient, but does support all algorithms. The best solution is to use dynamic load balancing when possible, but fall back on static load balancing when an algorithm can not be implemented in a streaming setting. However, in a practical setting, dynamic load balancing is frequently not useful. Because dynamic load balancing throws out intermediate results, there is no query-able data. You cannot do picks or other queries. For this reason, dynamic load balancing has been turned off by default and static load balancing is the only method used.

You can turn dynamic load balancing on as a command line feature: -allowdynamic. Note that most of the regression tests pass when run with -allowdynamic, but some do fail and this mode is considered "under development". Also, note that algorithms that require collective communication will still revert to static load balancing.

Command line options for load balancing

       -lb-block            Assign the first D/P domains to processor 0, the
                            next D/P domains to processor 1, etc.
       -lb-stride           Assign every Pth domain starting from the first
                            to processor 0, every Pth domain starting from the
                            second to processor 1, etc.
       -lb-absolute         Assign domains by absolute domain number % P. This
                            guarentees a given domain is always processed
                            by the same processor but can also lead to poor
                            balance when only a subset of domains is selected.
       -lb-random           Randomly assign domains to processors.
       -allowdynamic        Dedicate one processor to spreading the work
                            dynamically among the other processors.  This mode
                            has limitations in the types of queries it can
                            perform.  Under development.