- 1 Introduction
- 2 Activating SR mode
- 3 Issues with SR mode
- 4 Refactoring rendering
- 5 Hardware Accelerated SR Mode
- 6 ViewerWindow::CanSkipExternalRenderRequest Logic
- 7 The Waiting for parallel rendering... cue
- 8 Multiple Engines in SR Mode
VisIt's scalable rendering (SR) mode is an important feature for handling large (extreme scale) datasets. Ordinarily, in non-SR mode, the engine generates geometry (polygons, lines, and/or points) that is shipped to the viewer to be rendered. In SR mode, all rendering is done in parallel in VisIt's engine and pixels are shipped from the engine to the viewer.
In the sections that follow, we describe some relevant background information and the most elementary scenario for how SR mode works. In particular, we detail how automatic transitions into and out of SR mode work. Following a description of the background and elementary scenario, we then describe a number of complicating factors and issues related to SR mode.
Why Scalable Rendering is Essential
With a large enough input dataset and a large number of processors, the single-processor viewer can be overwhelmed with geometry from the engine. In fact, there are a few different problems we might encounter as the input problem size is scaled up. The first is that although the viewer has sufficient memory to accept and store all the geometry from the engine, the time to render it on a single CPU/GPU would be prohibitively long. The second is that no single processor on the engine has enough memory to gather geometry from all the other processors and combine it to be shipped to the viewer. The third is that the viewer simply does not have enough memory to accept all the geometry the engine produces. All of these issues are avoided by switching into SR mode.
avtPlot and avtActor objects
To understand how SR mode works, a few key features of the avtPlot object are important to know first.
From the point of view of VisIt's GUI, an avtPlot object represents an entry in the plot list in the main GUI panel. Internally, an avtPlot object lives partly on the viewer and partly on the engine. Typically (for small scale data), the part of an avtPlot object that lives in the viewer is everything that gets rendered by the Graphics Processing Unit (GPU): polygons, line segments, and points, as well as all the information, such as color maps, transparency, and camera, necessary to render the plot the way the user wants it. The part of the plot object that lives on the engine is everything else, such as the VTK pipeline necessary to generate the data and whatever operators the user may have added to that pipeline through VisIt's operator menus.
In ordinary (non-SR) operation, when VisIt executes a plot, the engine does work to generate geometry to be sent to the viewer. When the viewer gets geometry from the engine, it does more execution on it, applying mappers, for example, before sending it to the GPU for rendering. So in this sense, there are two kinds of execution an avtPlot object does: the engine part to generate geometry and the viewer part to set up or apply visual attributes and finally render it.
In SR mode, both kinds of avtPlot's execution (the engine's portion and the viewer's portion) are in fact performed on the engine. Each processor on the engine not only generates geometry but also renders that geometry into pixels. Typically, software rendering is used on each processor. However, on platforms that support graphics hardware at each processor, hardware rendering can be used. The latter is an advanced topic described in a later section.
An avtPlot object is designed to support multiple time states of the same plot. Each time state is stored with the avtPlot object as an avtActor. So, a single avtPlot object is actually a sequence of avtActor objects, one for each time state. By default, only one of the avtActor objects for an avtPlot is populated at any given moment. However, when VisIt's animation caching feature is turned on, as a user displays different time states, the avtActor object associated with each time state winds up getting populated with geometry from the engine. In other words, VisIt caches the geometry for each time state of the plot. Once all avtActor objects have been populated with geometry, the viewer can render the sequence quickly without having to request more geometry from the engine. But this happens only when animation caching is turned on. By default, animation caching is NOT turned on, and only one avtActor object in an avtPlot object in the viewer is populated with geometry at any given moment.
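The relationship between a plot, its per-time-state actors, and animation caching can be sketched roughly as follows. This is an illustrative Python model, not VisIt's C++ classes; the names Plot, Actor, fetch_geometry, caching, and engine_requests are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Actor:
    """Geometry for one time state (stands in for an avtActor)."""
    geometry: str


@dataclass
class Plot:
    """One plot-list entry holding at most one actor per time state."""
    fetch_geometry: Callable[[int], str]   # hypothetical request to the engine
    caching: bool = False                  # animation caching is off by default
    actors: Dict[int, Actor] = field(default_factory=dict)
    engine_requests: int = 0

    def show(self, time_state: int) -> Actor:
        actor = self.actors.get(time_state)
        if actor is None:
            # No populated actor for this time state: ask the engine.
            self.engine_requests += 1
            actor = Actor(self.fetch_geometry(time_state))
            if not self.caching:
                # Without caching, only the current actor stays populated.
                self.actors.clear()
            self.actors[time_state] = actor
        return actor
```

With caching enabled, revisiting a time state costs no further engine request; with caching disabled, every change of time state goes back to the engine.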
The avtExternallyRenderedImagesActor (ERIA) object
VisIt's ViewerWindow object is built on top of a more fundamental abstraction in AVT called VisWindow. In turn, the AVT VisWindow object is built on top of VTK. Likewise, avtActor objects are built on top of vtkActor objects. Each VisWindow has a few special-purpose actors. One is the avtTransparencyActor, which serves as a sort of collector for all transparent geometry in any VisWindow instance. Another is the avtExternallyRenderedImagesActor (ERIA), which manages all images that are rendered external to the VisWindow, such as those rendered in SR mode by the engine. Deep down in VTK, there is a rendering loop that iterates over all actors and renders them. The ERIA sits quietly in a VisWindow and responds to VTK's PrepareForRender message issued from inside that rendering loop. In avtExternallyRenderedImagesActor::PrepareForRender(), the ERIA issues a callback that was registered with it to retrieve the externally rendered image. A ViewerWindow object sets this callback to be ViewerWindow::ExternalRenderCallback. In turn, ViewerWindow::ExternalRenderCallback issues RPCs to the engine to initialize rendering there and then to render and return an image. In this way, when SR mode is turned on, the ERIA is active, and each time VTK executes its rendering loop it emits external rendering requests that tunnel from deep inside VTK's rendering loop all the way through to the engine. The engine responds with a rendered image, and the ERIA object is populated with the new pixel image, which VTK then renders.
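The callback chain described above can be sketched in a few lines. This is a simplified Python model under stated assumptions, not the actual C++ code: the class and method names mirror the ones in the text, but engine_render is a hypothetical stand-in for the engine RPCs, and the render loop is reduced to a plain iteration over actors.

```python
from typing import Callable, List, Optional


class ExternallyRenderedImagesActor:
    """Stands in for the ERIA: obtains its image through a registered callback."""

    def __init__(self) -> None:
        self.callback: Optional[Callable[[], bytes]] = None
        self.image: Optional[bytes] = None

    def register_callback(self, callback: Optional[Callable[[], bytes]]) -> None:
        self.callback = callback

    def prepare_for_render(self) -> None:
        # Invoked from inside the render loop; inactive when no callback is set.
        if self.callback is not None:
            self.image = self.callback()


class ViewerWindow:
    def __init__(self, engine_render: Callable[[], bytes]) -> None:
        self.engine_render = engine_render   # hypothetical stand-in for engine RPCs
        self.eria = ExternallyRenderedImagesActor()
        self.actors: List[ExternallyRenderedImagesActor] = [self.eria]

    def set_scalable_rendering(self, enabled: bool) -> None:
        self.eria.register_callback(
            self.external_render_callback if enabled else None)

    def external_render_callback(self) -> bytes:
        # Ask the engine to render and return the resulting pixels.
        return self.engine_render()

    def render(self) -> None:
        # Simplified render loop: every actor gets a chance to prepare itself.
        for actor in self.actors:
            actor.prepare_for_render()
```

The point of the design is that the render loop itself knows nothing about the engine; it only sends the same prepare message to every actor.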
Typical Engine <-> Viewer Communication
When the engine is operating in parallel, only one processor on the engine ever communicates via a socket with the viewer. This is the processor of MPI rank zero and is known as the UI process. The UI process is special in that it does all the same work that the other processors do but also does some additional work to manage communication with the viewer. At the beginning of a request, the UI process receives the request (an RPC) from the viewer and broadcasts it to all other processors. At the end of a request, the UI process gathers up results from all other processors, if there are any (not all RPCs from the viewer require a response from the engine), combines them, and ships them back to the viewer. This happens in Engine::WriteData().
One important request the viewer makes of the engine is to return the geometry for a plot (ViewerEngineManager::GetDataObjectReader()). When the UI process on the engine receives this request, it tells all processors, including itself, to execute their pipelines. All processors, including the UI process, do pipeline execution work resulting in geometry. As processors finish, each one enters Engine::WriteData() and sends its geometry data to the UI process. When the UI process enters Engine::WriteData(), it goes into a loop to receive polygon data from each processor. On each iteration of the loop, it receives data from any processor that has sent some (i.e., MPI_ANY_SOURCE) and merges this data into a combined result. The UI process continues to iterate until it has received data (which can sometimes be empty) from all processors. It then ships the combined dataset to the viewer.
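The gather loop on the UI process can be sketched as below. To keep the example self-contained and runnable without MPI, a standard-library queue stands in for the message channel; taking the next message from the queue plays the role of a receive from any source. The function name gather_on_ui_process and the Message type are hypothetical.

```python
import queue
from typing import List, Tuple

Message = Tuple[int, List[str]]   # (sender rank, polygons from that rank)


def gather_on_ui_process(inbox: "queue.Queue[Message]",
                         own_polygons: List[str],
                         num_ranks: int) -> List[str]:
    """Merge geometry from every rank into one dataset on rank 0."""
    combined = list(own_polygons)
    received_from = set()
    # One message is expected from each non-UI rank, in whatever order
    # the ranks happen to finish their pipelines.
    while len(received_from) < num_ranks - 1:
        rank, polygons = inbox.get()      # analogous to receiving from any source
        received_from.add(rank)
        combined.extend(polygons)         # an empty contribution is legal
    return combined
```

The loop terminates on the number of distinct senders heard from, not on the amount of data, which is why empty contributions still have to be sent.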
This describes a typical viewer/engine communication when the viewer is requesting geometry from the engine and SR mode is not being automatically activated.
Activating SR mode
SR mode can be activated either explicitly by the user or VisIt can decide to enter SR mode automatically. This is specified by the user in the GUI's Options->Rendering pane and the Advanced tab: Always explicitly tells VisIt to use SR mode at all times, Never explicitly tells VisIt to never use SR mode, and Auto tells VisIt to decide whether to use SR mode automatically. By default, VisIt is set to use the Auto mode. To control Auto SR mode, VisIt uses a simple polygon count heuristic. With Auto selected, the user is presented with controls to set a polygon count threshold which, when exceeded, causes VisIt to enter SR mode. By default, this threshold is set at 2 million polygons ('2,000 Kpolys'). The transition into and out of SR mode is a complex process, particularly when it occurs automatically.
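The three settings and the polygon count heuristic amount to a small decision function, sketched here. This is an illustration only: SRMode, use_scalable_rendering, and the parallel_engine parameter are hypothetical names, and the only values taken from the text are the three modes, the default threshold of 2,000 Kpolys, and the rule that Auto applies only to parallel engines.

```python
from enum import Enum


class SRMode(Enum):
    ALWAYS = "Always"
    NEVER = "Never"
    AUTO = "Auto"


DEFAULT_THRESHOLD_KPOLYS = 2000   # 2 million polygons, the default quoted in the text


def use_scalable_rendering(mode: SRMode,
                           polygon_count: int,
                           threshold_kpolys: int = DEFAULT_THRESHOLD_KPOLYS,
                           parallel_engine: bool = True) -> bool:
    if mode is SRMode.ALWAYS:
        return True
    if mode is SRMode.NEVER:
        return False
    # Auto: only parallel engines switch automatically, and only when the
    # polygon count exceeds the threshold.
    return parallel_engine and polygon_count > threshold_kpolys * 1000
```

For example, use_scalable_rendering(SRMode.AUTO, 2_000_001) is true, while the same call with exactly 2,000,000 polygons is false because the threshold must be exceeded.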
Auto SR mode and Serial Engines
VisIt will automatically go into SR mode only with parallel engines. The rationale is that with a serial engine, the time it takes to render (typically in software) on the engine and ship the pixels to the viewer is unlikely to be shorter than the time to ship polygons to the viewer and render them (typically in hardware) there. This is certainly true in non-client/server scenarios because both the viewer and engine processes are running on the same machine. In client/server scenarios, whether it is faster to render on the engine and ship pixels to the viewer or to ship polygons to the viewer and render there depends on many factors, such as the performance of the network between engine and viewer, the relative size of polygon data versus pixel data, and the relative speed of the single CPU the engine is using and the single CPU the viewer is using. The "remote serial engine" mode is a use case that doesn't come up often enough to merit a smart algorithm for automatically switching into SR mode. So for a serial engine, VisIt enters SR mode only when it is explicitly told to, either by a) doing an off-screen (non-screen-capture) save via File->Save Window or b) turning it on via the Options->Rendering pane, Advanced tab.
Automatic Transitions into and out of SR mode
The algorithm for automated switching into SR mode in parallel is complicated by the fact that when the viewer asks the engine for data, the viewer doesn't know a priori what the polygon count will be. In fact, a parallel engine doesn't know that it will need to switch into SR mode until after all processors have finished generating their polygons and have begun shipping them to the UI process to be combined and shipped off to the viewer. It is during this last step, where the engine is collecting results from each processor and combining them into one über-dataset, that it can finally discover whether the polygon count is 'too high.' If so, it has to throw away all the polygons it has gathered so far and tell all other processors to stop sending polygons to be gathered. Note that this is all happening while the viewer is still waiting to receive what it expects to be polygons. Once the engine has discovered that the SR threshold has been exceeded, stopped gathering polygons, and freed all that it did gather, it returns to the viewer a 'dummied up' object, a 'NULL Data' object with a special 'message' embedded in it. When the viewer sees this message, it finally knows that the SR threshold has been exceeded. At this point, the viewer has to take care of some bookkeeping of its own. For example, if it is caching geometry, all the geometry it has collected needs to be thrown away. Typically, only the most recent 'actor' for the last execution of the plot(s) needs to be thrown away on the viewer. This operation is TransmutePlots in ViewerWindow.
After this bookkeeping, the viewer can turn around and ask the engine for data again. Only this time it asks the engine to render the polygons (i.e., go into SR mode) and expects pixels (not polygons) back. The engine does not do any additional pipeline execution because all the data is still there at the end of the pipelines. It just renders that data on each processor into an image that is the same size on every processor and equal to the image size of the viewer (sort-last rendering). Then, an MPI_Reduce operation is used to z-buffer composite the whole image (avtWholeImageCompositor), producing the final result on the UI process. The UI process then sends the image back to the viewer, and the switch into SR mode is complete.
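The z-buffer compositing step can be illustrated with a small sketch. This is not the avtWholeImageCompositor implementation; it only shows the per-pixel rule (keep the fragment nearest the camera) applied as a reduction over full-size images, one per processor. The Pixel and Image types, the BACKGROUND value, and the function names are assumptions made for the example.

```python
from functools import reduce
from typing import List, Tuple

Pixel = Tuple[float, str]          # (depth, color); smaller depth is nearer
Image = List[Pixel]                # flattened image, same size on every rank

BACKGROUND: Pixel = (float("inf"), "bg")   # nothing was rendered at this pixel


def composite_pair(a: Image, b: Image) -> Image:
    """Keep, per pixel, whichever fragment is nearer to the camera."""
    return [pa if pa[0] <= pb[0] else pb for pa, pb in zip(a, b)]


def composite_all(images: List[Image]) -> Image:
    # A reduction with composite_pair as the operator. Choosing the minimum
    # depth is associative, so a tree reduction across ranks would produce
    # the same depths as this left-to-right one.
    return reduce(composite_pair, images)
```

Because every processor renders an image of the full viewer size, the cost of this step grows with the image size and the number of processors rather than with the polygon count.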
The reverse, switching out of SR mode, is similarly complicated. Also, to avoid situations where we are constantly switching into and out of SR mode because the user is performing operations that constantly cross back and forth across the polygon threshold, there is some hysteresis added to the threshold each time it is crossed making it a bit harder to cross it in the other direction later.
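One way to picture the hysteresis is a switch whose exit condition is stricter than its entry condition. The sketch below is an assumption about the general shape of such logic, not VisIt's actual rule: the AutoSRSwitch class and the 10% margin are hypothetical, since the text does not say how much hysteresis is applied.

```python
class AutoSRSwitch:
    """Threshold switch with hysteresis; the 10% margin is an assumed value."""

    def __init__(self, threshold: int, margin: float = 0.10) -> None:
        self.threshold = threshold
        self.margin = margin
        self.in_sr_mode = False

    def update(self, polygon_count: int) -> bool:
        if self.in_sr_mode:
            # Leaving SR mode requires dropping well below the threshold.
            if polygon_count < self.threshold * (1.0 - self.margin):
                self.in_sr_mode = False
        elif polygon_count > self.threshold:
            # Entering SR mode requires exceeding the threshold.
            self.in_sr_mode = True
        return self.in_sr_mode
```

With a threshold of 2,000,000, a count that hovers between 1,900,000 and 2,100,000 enters SR mode once and then stays there, instead of switching on every change.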
Issues with SR mode
One issue with automated SR mode is that the rendering threshold is applied on a per-window basis when it should really apply over all windows. So, if you have four windows, each a few polygons below the threshold, you can wind up running the viewer at about 4x the polygon count you wanted. But it's more difficult to force all windows into SR mode due to activity in any one of them. The threshold is also a 'polygon count' when it should really be total memory usage or some other metric. Glyphed plots, like the Vector or Tensor plot, wind up rendering many polygons per glyph. This needs to be accounted for by the 'cellCountMultiplier' in the plot. I think some glyphed plots may fail to account for this. For example, I am not sure how the Label plot deals with this. And some plots have special renderers for which the polygon count heuristic simply does not make sense at all.
Another issue with SR is that it should really be implemented using some of the pieces that are currently in the viewer. In other words, the viewer needs to be refactored a bit so that low-level plot management issues, such as the Mesh plot's opaqueness, are handled in a class that both the engine and viewer share. Otherwise, logic gets added to the viewer to handle some special plot management issue but is not also added to the engine's SR logic, and the plots then fail to behave the same in SR mode. This has come up on occasion.
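The arithmetic behind both points (the per-window threshold and the glyph multiplier) can be made concrete with a short sketch. The function names are hypothetical, and the numbers used in the usage note, including a multiplier of 16 polygons per glyph, are illustrative assumptions rather than values taken from VisIt.

```python
from typing import List, Tuple

PlotSize = Tuple[int, float]   # (cell count, cell count multiplier)

THRESHOLD = 2_000_000          # default Auto threshold quoted earlier


def estimated_polygons(plots: List[PlotSize]) -> int:
    """Sum cell count times multiplier over the plots in one window."""
    return sum(int(cells * multiplier) for cells, multiplier in plots)


def window_exceeds(plots: List[PlotSize], threshold: int = THRESHOLD) -> bool:
    # The check is made per window, never across windows.
    return estimated_polygons(plots) > threshold
```

Under these assumptions, a vector plot with 100,000 glyphs and a multiplier of 16 counts as 1,600,000 polygons, and four windows holding 1,999,999 polygons each all stay out of SR mode even though the viewer is holding 7,999,996 polygons in total.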
Another issue with SR is that after a few re-renders due to changes in view, it should really re-balance itself so that rendering work is about equal on all processors. It does not do this now. And I don't think it should in cases where re-executions are the dominant reason the plot is changing. However, after about 3-5 re-renders of the same plot data without a re-execute and if the balance is way out of kilter, the engine should decide to re-balance the polygons to improve render performance. Since that is likely to be costly, it should do so only if the performance to be realized in doing so is worthwhile. When things like transparency or shadows are in play, this gets substantially more complicated. Another thing to consider in re-balancing is the relative cost of rendering versus communication to composite. In certain circumstances (such as keeping view constant but changing visual attributes such as transparency, colors, etc.), it may be the case that shuffling polygons around to achieve an image-space partition is going to be best because the composite can be done much more efficiently.
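The re-balancing rule suggested above is not implemented, but its decision logic might look something like the following sketch. Everything here is hypothetical: the imbalance metric (busiest processor's load over the mean load), the cutoff of 2.0, and the function names are assumptions; only the idea of waiting for about 3 re-renders without a re-execute comes from the text. The sketch also ignores the cost of moving polygons, which a real implementation would have to weigh.

```python
from typing import List


def imbalance(polygons_per_rank: List[int]) -> float:
    """Ratio of the busiest rank's load to the mean load (1.0 is perfect)."""
    mean = sum(polygons_per_rank) / len(polygons_per_rank)
    return max(polygons_per_rank) / mean if mean > 0 else 1.0


def should_rebalance(polygons_per_rank: List[int],
                     rerenders_since_execute: int,
                     min_rerenders: int = 3,
                     max_imbalance: float = 2.0) -> bool:
    # Re-balance only when the same data keeps being re-rendered and the
    # render load is badly skewed across ranks.
    return (rerenders_since_execute >= min_rerenders
            and imbalance(polygons_per_rank) > max_imbalance)
```

For instance, a distribution of [400, 0, 0, 0] has an imbalance of 4.0 and would trigger a re-balance on the third re-render, while [110, 90, 100, 100] never would.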
To some extent, the rendering architecture may need to be reconsidered. VisIt's parallel engine architecture was designed with parallel data processing in mind: the memory, I/O bandwidth, and processing power necessary to do data manipulation of extremely large datasets. But that workload is very different from the rendering workload. Other architectures have reaped significant benefits by decoupling data processing from rendering. Similarly, it might be worth considering separating these two portions of VisIt's job. Having a set of "rendering engines" that are distinct from the "data processing engines" would allow us to allocate resources in a way that is efficient for both types of processing. In addition, it could conceivably allow us to target both sort-last (SR mode) rendering and sort-first (powerwall) rendering.
Hardware Accelerated SR Mode
ViewerWindow::CanSkipExternalRenderRequest Logic
Many things can happen to cause re-renders. However, not all of them require the engine to re-render the image. A good example is a simple expose event. In other cases, some VisWindow parameters that have no effect on visual appearance may have changed (such as the center of rotation). There is logic in ViewerWindow::CanSkipExternalRenderRequest() to filter out cases where a re-render was issued but the viewer's existing image is the correct one to display. This logic must stay consistent with the objects whose changes it inspects, such as the VisWindow, WindowAttributes, and AnnotationAttributes. If they get out of sync, VisIt may request an image when none is required, or fail to request a new image when one is needed. If you encounter this situation after making changes to classes that affect this logic (e.g. WindowAttributes, AnnotationAttributes), then this is a good place to look for the cause.
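The general idea of such a filter can be sketched as a comparison between the state used for the last external render and the state of the current request, looking only at fields that change pixels. This is not the actual C++ logic; RenderState, its fields, and ExternalRenderFilter are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RenderState:
    view: Tuple[float, ...]
    image_size: Tuple[int, int]
    annotations: str
    center_of_rotation: Tuple[float, float, float]   # does not change the image

    def appearance_key(self) -> tuple:
        # Only fields that alter pixels take part in the comparison.
        return (self.view, self.image_size, self.annotations)


class ExternalRenderFilter:
    def __init__(self) -> None:
        self.last_key: Optional[tuple] = None

    def can_skip(self, state: RenderState) -> bool:
        key = state.appearance_key()
        if key == self.last_key:
            return True           # the existing image is still correct
        self.last_key = key       # remember the state being rendered now
        return False
```

The failure modes mentioned above correspond to the key being wrong: a field that affects pixels but is missing from the key causes stale images, and a field in the key that does not affect pixels causes needless requests.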
The Waiting for parallel rendering... cue
When SR mode is activated and a render takes on the order of several seconds or more, it is not always obvious to the user when a re-render triggered by a change has completed. For this reason, the ERIA will automatically decide to display a Waiting for parallel rendering... message in the viewer window to cue the user. To implement this behavior, VisIt maintains a running average of the time required to complete the last several render requests. When this average time exceeds some threshold (3 seconds, I think), the Waiting for parallel rendering... cue is displayed while the viewer is waiting for the rendered image from the engine. Once the viewer has received the rendered image, it is displayed and the Waiting for parallel rendering... message goes away.
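A running average over the last several render times can be kept with a fixed-length queue, as in the sketch below. The RenderTimer class and the window of 5 samples are assumptions for illustration; the 3 second threshold is the value the text tentatively gives.

```python
from collections import deque


class RenderTimer:
    """Tracks recent render times and decides whether to show the cue."""

    def __init__(self, window: int = 5, threshold_seconds: float = 3.0) -> None:
        self.times = deque(maxlen=window)   # oldest samples fall off automatically
        self.threshold_seconds = threshold_seconds

    def record(self, seconds: float) -> None:
        self.times.append(seconds)

    def show_waiting_cue(self) -> bool:
        if not self.times:
            return False
        average = sum(self.times) / len(self.times)
        return average > self.threshold_seconds
```

Averaging over several requests keeps a single slow render from turning the cue on, and a single fast one from turning it off.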
Multiple Engines in SR Mode
VisIt supports multiple engines simultaneously. If SR mode is on in any window in the viewer, then all engines producing data for display in that window must be in SR mode. VisIt does nothing to guarantee this condition. In fact, it probably won't even warn the user if it is violated. But if it is, then in a single viewer window, one engine could be producing geometry to be rendered by the viewer and another could be rendering an image. In theory, it should be possible to render a geometry-based actor in the same window with a pre-rendered (image-based) actor so long as the image also carries z-buffer information with it. However, I don't think we have much experience with this in VTK in practice, and it probably won't work, at least not without some more coding attention. VisIt might even crash in this condition. However, VisIt will correctly display the contents of a window in which data from multiple engines is displayed if all engines are in SR mode or all are NOT in SR mode. Ordinarily, the rendered image shipped by the engine back to the viewer does NOT contain z-buffer data. However, when multiple engines are in SR mode, each sends z-buffer data along with the pixel data, and all the images are composited together by the viewer. This has been demonstrated to work correctly.
However, VisIt is not necessarily smart about coordinating the rendering requests of multiple engines. The viewer winds up serializing them. That is, it issues a request to one engine and waits for its response before moving on to request an image from the next engine.
Highly specialized plots, like the Spreadsheet Plot, may include their own mapper and/or renderer. But, most plot objects use the same mapper and renderer.
For small scale datasets, VisIt's engine computes geometry and ships it to the viewer to be rendered there. In this case, changes to the appearance of the plot, such as colors, transparency, and the data minimum and maximum used to map data values to color, are all handled in the viewer with no involvement from the engine. For large datasets, VisIt's engine may be running on hundreds to thousands of processors.