Parallel on Windows

This page covers using MPI on Windows, leading up to building parallel VisIt on Windows and running it on a Windows HPC cluster.

MPI

MPI (Message Passing Interface) is a standard programming interface, implemented as a library, that lets cooperating processes communicate with one another while running on distributed-memory computers. Microsoft provides an implementation of MPI for Windows, called MS-MPI, that lets Windows programs use MPI for large-scale parallelism. Parallel VisIt uses MPI to coordinate its processes.

Getting MPI

You can get the MS-MPI runtime libraries from the following link:

Testing MPI

To test MPI on Windows, you'll need a simple MPI program that you can build in Visual Studio.

Example Program

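/* Let MSVC accept sprintf, strcpy, and fopen without deprecation errors. */
#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>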
#include <string.h>
#include <mpi.h>
 
int
main(int argc, char *argv[])
{
    const char *s = "HELLO FROM THE MASTER PROCESS!";
    int par_rank, par_size;
    FILE *fp = NULL;
    char msgbuf[100], filename[1024]; /* argv[0] may be a full path */
 
    /* Init MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &par_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &par_size);

    char pname[MPI_MAX_PROCESSOR_NAME];
    int plen = 0;
    memset(pname, 0, sizeof(char)*MPI_MAX_PROCESSOR_NAME);
    MPI_Get_processor_name(pname, &plen);

    printf("%s: Rank %d is part of a %d processor job\n", pname, par_rank, par_size);

    // The following code can trip up MPI if the Windows firewall is
    // too restrictive and you are running on multiple nodes.
    // If you get MPI timeout errors, change the "#if 1" below to "#if 0"
    // and try again.
#if 1
    msgbuf[0] = '\0';
 
    /* Broadcast message from master to all other processors. */
    if(par_rank == 0)
    {
        MPI_Bcast((void*)s, strlen(s)+1, MPI_CHAR, 0, MPI_COMM_WORLD);
        strcpy(msgbuf, s);
    }
    else
        MPI_Bcast((void*)msgbuf, strlen(s)+1, MPI_CHAR, 0, MPI_COMM_WORLD);
 
    /* Write the message from the master to a file. */
    sprintf(filename, "%s.%04d.log", argv[0], par_rank);
    if((fp = fopen(filename, "wt")) != NULL)
    {
        fprintf(fp, "Running %s with %d processors.\n", argv[0], par_size);
        fprintf(fp, "This is the log for processor %d.\n", par_rank);
        fprintf(fp, "Message: \"%s\"\n", msgbuf);
        fclose(fp);
    }
#endif

    /* Finalize MPI */
    MPI_Finalize();
 
    return 0;
}

Building the Example

Open Visual Studio and follow these basic instructions to build the above source code as a program called testmpi.exe that we'll use to test whether MPI works on your system.

  1. Create a new, empty Console Application project called testmpi.
  2. Save the example program as testmpi.cpp and add it to the project.
  3. Open the project properties and go to the C/C++/General section. Add C:\Program Files\Microsoft HPC Pack 2008 R2\Inc to the Additional Include Directories setting so the compiler can find the mpi.h header file.
  4. Go to the Linker/General section and add C:\Program Files\Microsoft HPC Pack 2008 R2\Lib\i386 to the Additional Library Directories setting so the linker knows where to find the MPI library. If you are building a 64-bit program, use the Lib\amd64 directory instead of Lib\i386.
  5. Go to the Linker/Input section and add msmpi.lib to the Additional Dependencies setting so that your program is linked with MPI.
  6. Apply the changes and build your program.
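The library directory in step 4 has to match the pointer size of the program you build, and step 5 has a source-level alternative in the Microsoft compiler's #pragma comment(lib, ...) directive. The sketch below shows both. The helper functions are hypothetical illustrations, not part of MS-MPI, and the directory layout assumed is the HPC Pack 2008 R2 SDK layout described in the steps above.

```c
#include <assert.h>
#include <stdio.h>

/* With the Microsoft compiler, this pragma has the same effect as adding
   msmpi.lib to Linker/Input/Additional Dependencies (step 5). Other
   compilers skip it because _MSC_VER is not defined. */
#ifdef _MSC_VER
#pragma comment(lib, "msmpi.lib")
#endif

/* Hypothetical helper: return the MS-MPI SDK library subdirectory that
   matches this build, "amd64" for 64-bit and "i386" for 32-bit (step 4). */
static const char *msmpi_lib_subdir(void)
{
    /* Only 32-bit and 64-bit targets are covered by this sketch. */
    assert(sizeof(void *) == 4 || sizeof(void *) == 8);
    return sizeof(void *) == 8 ? "amd64" : "i386";
}

/* Hypothetical helper: print the library directory this build should use,
   given the SDK root directory. */
static void print_msmpi_lib_dir(const char *sdk_root)
{
    assert(sdk_root != NULL);
    printf("%s\\Lib\\%s\n", sdk_root, msmpi_lib_subdir());
}
```

If you use the pragma, place it in testmpi.cpp and skip step 5; the library directory from step 4 is still required so the linker can find msmpi.lib.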

Running the Example

On Windows, MPI programs are launched with mpiexec. To run an MPI program, pass its name to mpiexec on the command line, along with the -n argument that tells mpiexec how many processes to start.

Open a command prompt and cd into the directory that contains testmpi.exe. For example, if you created your Visual Studio solution in the Documents\testmpi directory, a Debug build of the program is in Documents\testmpi\Debug.

cd Documents\testmpi\Debug
mpiexec -n 4 testmpi.exe
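Rank 0 broadcasts its message, and every rank writes it to a log file named from argv[0] and the rank number. A quick way to confirm that a run worked is to check those files. The sketch below is a hypothetical checker, not part of MPI or VisIt; it assumes argv[0] was the bare program name testmpi.exe. If argv[0] is a full path, the logs are written next to that path instead.

```c
/* Let MSVC accept snprintf/fopen without deprecation errors. */
#define _CRT_SECURE_NO_WARNINGS
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the log file name testmpi.exe writes for one rank, using the same
   "%s.%04d.log" pattern as the example program. Returns 0 on success or
   -1 if the name does not fit in out. */
static int testmpi_log_name(const char *prog, int rank, char *out, size_t outlen)
{
    int n;
    assert(prog != NULL && out != NULL && rank >= 0);
    n = snprintf(out, outlen, "%s.%04d.log", prog, rank);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Check the logs of an nprocs-rank run: each one should exist and contain
   the message broadcast by rank 0. Returns the number of logs that pass. */
static int check_testmpi_logs(const char *prog, int nprocs)
{
    char name[1024], line[256];
    int rank, passed = 0;
    for (rank = 0; rank < nprocs; ++rank) {
        FILE *fp;
        int ok = 0;
        if (testmpi_log_name(prog, rank, name, sizeof name) != 0)
            continue;
        if ((fp = fopen(name, "r")) != NULL) {
            while (fgets(line, sizeof line, fp) != NULL)
                if (strstr(line, "HELLO FROM THE MASTER PROCESS!") != NULL)
                    ok = 1;
            fclose(fp);
        }
        printf("%s: %s\n", name, ok ? "ok" : "missing or no message");
        passed += ok;
    }
    return passed;
}
```

For the mpiexec -n 4 run above, check_testmpi_logs("testmpi.exe", 4) looks for testmpi.exe.0000.log through testmpi.exe.0003.log and should return 4. The logs are only written when the "#if 1" block in the example is enabled.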

Parallel VisIt

Parallel VisIt allows you to run VisIt's compute engine using more than one processor so that larger datasets can be processed.

Using a binary version

As of VisIt 2.4.0, VisIt for Windows comes bundled with support for a parallel compute engine. The VisIt installer tries to detect whether MS-MPI is installed on your system; if it is not, the installer can optionally install it for you so that you can run a parallel version of VisIt. If you install the parallel support, the installer also creates an additional "VisIt parallel" shortcut in the VisIt program group. When you start VisIt from that shortcut, nothing further is required to run in parallel; you do not need a host profile or any other configuration. VisIt passes the number of cores on your system to mpiexec when it launches the parallel compute engine.

ParallelVisItInStartMenu.png
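Starting one MPI process per core is what the "VisIt parallel" shortcut does for you. The sketch below shows one way a program could find the core count and print the matching mpiexec command. It is not VisIt's code, the program name is a placeholder supplied by the caller, and VisIt's real engine launch command carries additional arguments. It uses GetSystemInfo on Windows and sysconf elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Return the number of logical processors the operating system reports,
   or 1 if the count cannot be determined. On Windows, GetSystemInfo only
   counts processors in the calling process's processor group (at most 64). */
static int logical_core_count(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return si.dwNumberOfProcessors > 0 ? (int)si.dwNumberOfProcessors : 1;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (int)n : 1;
#endif
}

/* Print an mpiexec command line that starts one process per logical core.
   The program name is a placeholder chosen by the caller. */
static void print_mpiexec_command(const char *program)
{
    assert(program != NULL);
    printf("mpiexec -n %d %s\n", logical_core_count(), program);
}
```

Calling print_mpiexec_command("testmpi.exe") prints the command that would run the earlier test program with one rank per core on your workstation.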

Building

If you choose to build VisIt from source, see the Using CMake with VisIt on Windows page to become familiar with VisIt's build process on Windows. Building a parallel version of VisIt requires two additional steps: you must first install the MS-MPI SDK, and you must enable the VISIT_PARALLEL option when you run the CMake GUI. Otherwise, the procedure is the same as for the serial version, except that the parallel build compiles more projects.

Windows HPC

By building a parallel version of VisIt, you can easily run in parallel using more cores on your workstation. It is also possible to connect several computers together over a network and have them work as a compute cluster. Microsoft provides its Windows HPC software to make this possible. When you have a Windows HPC cluster, your computers are managed via a job scheduler that can coordinate the launch and execution of jobs on different nodes of the cluster. When VisIt is installed on each of the compute nodes in the cluster, you can even tell the Windows HPC job scheduler to run VisIt as an MPI job on multiple nodes of the cluster so you can process even larger datasets than you could on a single compute node.

You will need to:

  • Configure your Windows HPC cluster
  • Install VisIt on all of the cluster nodes
  • Create a host profile that tells VisIt how to submit jobs to your cluster

Support for Windows HPC clusters has been added to the VisIt svn trunk and will appear in VisIt 2.5.0. In the meantime, a 64-bit Windows beta version with this support is available.

Getting Windows HPC

The Windows HPC client utilities let you interact with a Windows HPC cluster so that you can manage it and submit compute jobs to it. Because VisIt can submit jobs to a Windows HPC cluster, these client utilities are also required to build parallel VisIt. The client utilities are available from the following link:

Setting up Windows HPC

Setting up the Windows HPC cluster is beyond the scope of this document, and there are good resources that show how to create a Windows HPC cluster at:

  • Make sure that the firewall settings on the nodes do not interfere with MPI connections between nodes. Every node in the cluster must be able to reach every other node.
  • Cluster nodes need outbound connectivity to your enterprise network, because you will likely run the VisIt client on another Windows computer and connect to the cluster in client/server mode.

Testing the Windows HPC cluster

It is best to verify the Windows HPC cluster with a simple program before trying VisIt. We already took this approach to make sure that MPI worked on the desktop computer. Here we make sure that the test MPI program works across multiple compute nodes, which confirms that job submission and networking work on the cluster.

First, copy the testmpi.exe program from before, along with the Visual C++ runtime DLLs it depends on (e.g. msvcp100.dll, msvcr100.dll, msvcr100d.dll), to each node of the cluster, using the same path on every node so that a single task command line works everywhere. A convenient place is a utilities folder on the desktop of each compute node. One way to do this is to put the programs on a CD or USB stick and take it to each machine. You can then use Remote Desktop to log into each machine and drag the utilities folder to the desktop.

Once you have installed all of the software on the cluster nodes, you can run the HPC Job Manager program on your desktop computer. You can use the HPC Job Manager program to connect to your cluster's head node and see the jobs that are running there.

WindowsHPC1.png
Windows HPC Job Manager

You can also use the HPC Job Manager to submit new jobs on your cluster. Simply click the New Job... action in the upper right part of the application. This will open a new job window that you can populate to tell the cluster how to run your new job. There are 5 pages of information that you can fill out. For a simple job such as this test program, you only need to fill out the Job Details and Edit Tasks pages.

The Job Details page contains basic information about your job and lets you specify the resources it needs. Fill in a name for the job and specify some job resources. One way to allocate resources to your job is to request a certain number of cores. For example, if your compute nodes have 8 cores each, you could request 16 cores to make sure that you get 2 nodes. The job scheduler offers many other ways to request resources.

WindowsHPC2.png
Windows HPC Job Manager - Job Details
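The arithmetic behind the 16-core example is a ceiling division. The sketch below assumes identical nodes that are completely free; when other jobs already occupy some cores, the scheduler may spread the same request across more nodes, so treat the result as a lower bound. The function names are hypothetical. Requesting no more cores than one node has is also the single-node workaround mentioned under Pitfalls below.

```c
#include <assert.h>

/* Minimum number of nodes needed to supply `cores` cores when every node
   has `cores_per_node` cores (ceiling division). */
static int min_nodes_needed(int cores, int cores_per_node)
{
    assert(cores > 0 && cores_per_node > 0);
    return (cores + cores_per_node - 1) / cores_per_node;
}

/* Nonzero if a core request can be satisfied by a single node. */
static int fits_on_one_node(int cores, int cores_per_node)
{
    return min_nodes_needed(cores, cores_per_node) == 1;
}
```

For 8-core nodes, min_nodes_needed(16, 8) is 2 and min_nodes_needed(17, 8) is 3, while any request of 8 cores or fewer fits on one node.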

Once you have filled in the job details, you need to tell the job manager what you want it to execute. You do this by adding tasks to your job. In this case, add a single task whose command line runs mpiexec, which in turn starts multiple instances of the testmpi.exe program.

WindowsHPC3.png
Windows HPC Job Manager - Tasks

Once you are satisfied with your task and job information, you can click the Submit button to submit the job to the Job Manager so the job will be run.

WindowsHPC4.png
Windows HPC Job Manager - Submitted job

Setting up VisIt for Windows HPC

Once your Windows HPC cluster is operational and you have installed VisIt in the same location on each of its nodes, you are nearly ready to run VisIt on the cluster. The last piece of the puzzle is a host profile for your cluster, so that your desktop computer knows how to submit jobs to it. A host profile tells VisIt how to access a remote computer and how to launch its parallel compute engine there. Normally, VisIt runs an external secure shell program and passes it the command line of the remote program to execute. For a Windows HPC cluster, VisIt instead talks directly to the job scheduler to submit the job.

Run VisIt and open the Options->Host Profiles window.

WindowsHPCHostProfile1.png
Adding a host profile for the Windows cluster
  1. Click the New Host button to create a new host for your Windows HPC cluster.
  2. Set the Host nickname to a name that describes your cluster.
  3. Set the Remote host name to the hostname of the cluster's head node.
  4. Set the Host name aliases to any other names that might be used to access the cluster's head node.
  5. Provide the Path to VisIt installation, which is the path to VisIt on the nodes of the cluster.
  6. Set the Username to your login name on the cluster.
  7. In the Connection settings, click the Share batch job with Metadata Server check box. This is mandatory for submitting on a Windows HPC cluster.

Now that you have configured the basic machine settings for the cluster, you'll need to create a launch profile for the cluster, which tells VisIt how to launch its compute engine component on the cluster.

WindowsHPCHostProfile2.png
Adding a parallel launch profile for the host profile.
  1. Click the New Profile button to create a new launch profile.
  2. On the Parallel settings tab, indicate that the compute engine will be parallel by turning on the Launch parallel engine check box.
  3. Turn on the Parallel launch method check box and specify WindowsHPC as the launch method.
  4. Provide a default number of processors. Choose a value that is most appropriate for your cluster usage.
  5. Click the Apply button and dismiss the window.

Using the Windows HPC cluster

Now that your cluster is running and you have created a VisIt host profile for it, you can try processing data with VisIt on the cluster. As with all client/server visualization in VisIt, accessing remote data begins with the File Open dialog. Select the name of your cluster from the Host combo box in the File Open dialog to initiate a connection to the cluster. Because you created a parallel launch profile for the cluster, VisIt opens the Select options dialog, which lets you choose among launch profiles or change the number of processors to use. Once you make your selection and click the OK button, VisIt connects to the cluster to submit the job. At this point, you are prompted for your cluster login and password. After successful authentication, the job is submitted to the cluster, and you can inspect its properties using the HPC Job Manager program. Once VisIt starts running remotely, it connects back to the local copy of VisIt over the network and you can browse the remote file system. Open a file and start visualizing data. You're running in parallel!

WindowsHPCConnectingCluster.png

Pitfalls:

  • Your local computer must be accessible on the network by cluster compute nodes.
  • VisIt on your local computer listens on ports 5600 through 5604, and the remote compute engine on the cluster tries to connect back to those ports. If your local computer's firewall blocks these connections, the remote compute engine never connects back and you will have to click the Cancel button in the VisIt Connection Progress dialog. To fix this, open those ports in your firewall.
  • If your compute node firewalls are too restrictive, they can interfere with MPI socket connections between nodes. In that case the testmpi.exe program fails during the MPI_Bcast. A workaround is to request no more cores than a single node has, so that the job runs on one node.

Making sure that parallel works

You can test whether the compute engine is really running in parallel by opening a multi-domain dataset and plotting the processor rank on each of the domains. When VisIt runs in parallel and processes a multi-domain dataset, each processor gets some number of domains to process. The results are assembled at the end.

  1. Open multi_ucd3d.silo, a multi-domain example file that is bundled with VisIt.
  2. In the Expression window, create a new expression: pid = procid(mesh1)
  3. Create a Pseudocolor plot of pid and click the Draw button.

If you get a picture that looks like a patchwork of different colors, your compute engine is running in parallel. If you get a single, solid color, either something is wrong or you opened a dataset that has only a single domain. Another good way to tell whether the compute engine is parallel is to open the File->Compute Engines window.

WindowsHPCProcIdExpression.png
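Counting the distinct procid values shows why both a serial engine and a single-domain file produce a solid color. The sketch below assumes a simple round-robin assignment of domains to ranks; this is an illustration only, and VisIt may distribute domains differently. Under any assignment, though, the plot shows one color per rank that receives at least one domain, so it can never show more colors than there are domains or ranks.

```c
#include <assert.h>

#define MAX_RANKS 1024

/* Illustrative round-robin assignment: domain d goes to rank d % nprocs.
   This is an assumption for the sketch, not VisIt's documented policy. */
static int owner_rank(int domain, int nprocs)
{
    assert(domain >= 0 && nprocs > 0);
    return domain % nprocs;
}

/* Count the distinct procid values (colors) a plot of ndomains domains
   would show when nprocs ranks share the work. */
static int distinct_procids(int ndomains, int nprocs)
{
    unsigned char seen[MAX_RANKS] = {0};
    int d, count = 0;
    assert(nprocs > 0 && nprocs <= MAX_RANKS);
    for (d = 0; d < ndomains; ++d) {
        int r = owner_rank(d, nprocs);
        if (!seen[r]) {
            seen[r] = 1;
            ++count;
        }
    }
    return count;
}
```

For example, distinct_procids(36, 4) is 4, while distinct_procids(36, 1) (a serial engine) and distinct_procids(1, 4) (a single-domain file) are both 1.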