Managing open file descriptors among readers and writers

For large-scale datasets, the input data is often decomposed into a possibly large number of files. This is especially true for data stored in formats that were not designed from top to bottom for large-scale (rich man's) parallel I/O. Generally, the operating system imposes an upper limit on the total number of open file descriptors a process can have. On many Linux/Unix systems, this limit can be queried at the Bourne shell prompt with the commands

   $ ulimit -Hn
   $ ulimit -Sn

...to obtain the hard and soft limits, respectively, for the number of open file descriptors. On most modern Linux/Unix systems the number of supported open file descriptors is on the order of 1000.

These two situations combine to create a need for VisIt to do some work to manage the number of file descriptors it keeps open at any one time.
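The same limits can also be read from inside a process with the POSIX getrlimit call. The following is a minimal sketch only; FdLimits and QueryFdLimits are hypothetical names chosen for illustration and are not part of VisIt.

```cpp
#include <sys/resource.h>
#include <cassert>

// Soft and hard limits on open file descriptors for the calling process.
struct FdLimits
{
    bool   ok;    // false if the query failed
    rlim_t soft;  // corresponds to "ulimit -Sn"
    rlim_t hard;  // corresponds to "ulimit -Hn"
};

// Hypothetical helper (not a VisIt function): wraps getrlimit(RLIMIT_NOFILE).
static FdLimits QueryFdLimits()
{
    struct rlimit rlim;
    FdLimits result = {false, 0, 0};
    if (getrlimit(RLIMIT_NOFILE, &rlim) == 0)
    {
        result.ok = true;
        result.soft = rlim.rlim_cur;  // may be RLIM_INFINITY
        result.hard = rlim.rlim_max;
    }
    return result;
}
```

A manager that wants to size itself from the system limit, rather than from a fixed constant, could start from the soft value returned here.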

The current implementation of this feature in VisIt appears to have a variety of issues. They are enumerated here, though somewhat lacking in detail.

  • There is a class, avtFileDescriptorManager, that appears to have been designed with this in mind.
    • As an aside, this class is currently in src/avt/Database/Database but should probably really be in src/avt/Database/Formats
    • There is a method, SetMaximumNumberOfOpenFiles(), for this class but it is never called anywhere. So the class's default of 20 open files applies, which is significantly lower than most systems' capabilities. This may imply that VisIt is closing (and re-opening) files far more frequently than it really needs to. Closing files isn't so much the problem as possibly having to re-open them.
    • This class implements RegisterFile(), UnregisterFile() and UsedFile() methods for a plugin to tell the manager, respectively, about a file it has started using, a file it is no longer using, and a file it is currently using.
      • When the manager reaches its limit (default of 20), a new call to RegisterFile() will first close the least recently used file (more on that later), and this is where unnecessary closing of files due to an artificially low open descriptor count becomes an issue.
    • That said, avtFileDescriptorManager just assumes that only one file descriptor is used by the underlying database format plugin per filename. It is possible for an external library to use multiple file descriptors for a single filename (consider, for example, using the HDF5 library with the split file driver -- which we don't currently support but could easily come to need). However, avtFileDescriptorManager cannot possibly know this for sure.
  • avtFileDescriptorManager is a singleton (as it should be because the file descriptor limit is per process -- so over 'all' currently opened plugin's files) and is used in avtFileFormat.
  • avtFileFormat defines wrapper methods RegisterFile(), UnregisterFile(), UsedFile() that turn around and forward requests to avtFileDescriptorManager. But only a few database plugins even use these methods (Silo, SAMRAI, Vista, and VisItXdmf). That means few of VisIt's plugins are really even playing in the right game to properly manage file descriptors, and there is nothing about the avtFileFormat classes that enforces that they do (more on that later).
    • Part of the process of registering a file with avtFileFormat is that avtFileFormat defines a callback function that is passed to avtFileDescriptorManager on behalf of any plugin. When avtFileDescriptorManager decides too many descriptors are open and it needs to close one, it calls this callback, which in turn calls avtFileFormat's CloseFileDescriptor method, and it is this method that calls the actual plugin's CloseFile method.
  • But wait, down in the avtSTMDFileFormat and avtMTSDFileFormat classes, there is an AddFile method. avtSTMDFileFormat defines a MAX_FILES constant of 20 and avtMTSDFileFormat defines a MAX_FILES constant of 1000. Both classes implement AddFile, but they behave differently at the limit: avtMTSDFileFormat's implementation throws an exception if the max is reached, while avtSTMDFileFormat's finds the least recently used file and calls CloseFile.
  • If an STMD plugin uses AddFile but fails to implement CloseFile, then reaching the maximum number of open descriptors (20 for STMD formats) results in an exception, because the call winds up in the default implementation of CloseFile in avtFileFormat, which simply throws an exception.
  • So there appears to be competing functionality for managing open file descriptors: one implementation in avtFileDescriptorManager, which few plugins use; one implementation (maxed at 20) with AddFile in avtSTMDFileFormat.C, which attempts to close open files when the max is reached; and one implementation with AddFile in avtMTSDFileFormat.C, which throws an exception when the max is reached.
    • In some cases, developers have taken to managing this issue within a given database plugin itself (e.g. Gadget reader fixes due to file open in avtFileFormat constructor). While this may work to address the use case where VisIt is only ever using a single database plugin, it doesn't work in general because file descriptor resources are global to the executable.
  • Note that neither STSD nor MTMD implements similar AddFile functionality. One would think STSD would be in most need of it, as that is the kind of plugin most likely to have to consume a lot of file descriptors at scale (maybe this kind of plugin is so unsuitable for large scale data anyway that we never encounter the need in practice).
  • Finally, no writers do anything to manage open file descriptors. Writers should not be as problematic because writing files usually does not require keeping any one file open for future, unknown writes solely to avoid the possible overhead of closing and then re-opening it.
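The register / used / unregister protocol and the close-least-recently-used behavior described in the list above can be sketched as follows. This is an illustrative stand-in written from the description only; it is not the actual avtFileDescriptorManager source, and the class name LruFileManager, its method signatures and the OpenCount() accessor are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <unordered_map>

// Hypothetical stand-in for the behavior described above.
// NOT the real avtFileDescriptorManager.
class LruFileManager
{
public:
    using CloseCallback = std::function<void(int)>;

    explicit LruFileManager(std::size_t maxOpen) : maxOpen_(maxOpen) {}

    // Register a newly opened file. If the limit has been reached, the
    // least recently used file is closed (via its callback) first.
    // Returns an index used in later calls.
    int RegisterFile(CloseCallback cb)
    {
        if (!lru_.empty() && lru_.size() >= maxOpen_)
        {
            int victim = lru_.back();
            CloseCallback victimCb = entries_[victim].cb;  // copy before erase
            UnregisterFile(victim);
            victimCb(victim);
        }
        int idx = nextIndex_++;
        lru_.push_front(idx);
        entries_[idx] = Entry{cb, lru_.begin()};
        return idx;
    }

    // The plugin closed the file itself; forget about it.
    void UnregisterFile(int idx)
    {
        auto it = entries_.find(idx);
        if (it == entries_.end()) return;
        lru_.erase(it->second.pos);
        entries_.erase(it);
    }

    // The plugin just used the file; move it to the front of the LRU list.
    void UsedFile(int idx)
    {
        auto it = entries_.find(idx);
        if (it == entries_.end()) return;
        lru_.splice(lru_.begin(), lru_, it->second.pos);
    }

    std::size_t OpenCount() const { return lru_.size(); }

private:
    struct Entry { CloseCallback cb; std::list<int>::iterator pos; };
    std::size_t maxOpen_;
    int nextIndex_ = 0;
    std::list<int> lru_;                      // front = most recently used
    std::unordered_map<int, Entry> entries_;
};
```

Note that the sketch counts registered files, not descriptors, so it shares the one-descriptor-per-file assumption criticized above.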

Note there are some similar issues when using an I/O library that is common to many plugins, like HDF5. Some HDF5 resources are specific to the process itself and not to any particular currently opened file. If a plugin happens to terminate the HDF5 library with a call to H5close, it can wind up pulling the library out from underneath another HDF5 plugin. This problem was recognized and fixed several years ago, but it nonetheless raises the issue that, apart from system-level resources like open file descriptors, we can have library-specific resources that require management in a similar way. HDF5 is just a very good example of that. At a minimum, HDF5-based plugins should garbage collect when they are done using the HDF5 library (e.g. when the count of plugin objects reduces to zero). This is implemented in the SAMRAI plugin but nowhere else.
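The "clean up when the count of plugin objects reduces to zero" idea can be sketched with a small counting helper. This is a hedged illustration only: SharedLibraryUser is a hypothetical class, it is not how the SAMRAI plugin is actually written, and the cleanup hook is injected so the sketch does not depend on HDF5 (a real plugin might call an HDF5 cleanup routine there). The counter is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: counts live users of a shared library and runs a
// cleanup hook when the last user goes away.
class SharedLibraryUser
{
public:
    explicit SharedLibraryUser(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup))
    {
        ++Counter();
    }

    ~SharedLibraryUser()
    {
        // Only the last surviving user triggers library-wide cleanup.
        if (--Counter() == 0 && cleanup_)
            cleanup_();
    }

    SharedLibraryUser(const SharedLibraryUser &) = delete;
    SharedLibraryUser &operator=(const SharedLibraryUser &) = delete;

    static int LiveCount() { return Counter(); }

private:
    // Process-wide count shared by all instances.
    static int &Counter()
    {
        static int n = 0;
        return n;
    }

    std::function<void()> cleanup_;
};
```

A plugin object would hold one such member, so library-wide cleanup runs only after every plugin object using the library has been destroyed.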


Re-design Idea #1

VisIt can use system calls to periodically (perhaps upon every engine Execute operation) count and keep track of all open fds as well as the most recent access (e.g. read) of each fd. It is not clear how portable the two functionalities (counting open file descriptors, obtaining last access time) would be.

When that count exceeds 90% of system defined limit, VisIt can find the least recently accessed file (st_atime member from fstat) and tell a plugin to close it.
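A minimal sketch of these two steps, assuming POSIX fstat is available: ExceedsThreshold and LeastRecentlyAccessedFd are hypothetical helper names, and the sketch takes st_atime at face value, which may not hold on every filesystem.

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <cassert>
#include <ctime>
#include <vector>

// True when openCount is more than 90% of limit (integer arithmetic).
static bool ExceedsThreshold(long openCount, long limit)
{
    return openCount * 10 > limit * 9;
}

// Returns the fd with the oldest st_atime among the candidates, or -1 if
// none of them can be examined with fstat.
static int LeastRecentlyAccessedFd(const std::vector<int> &candidates)
{
    int oldestFd = -1;
    time_t oldestTime = 0;
    for (int fd : candidates)
    {
        struct stat sb;
        if (fstat(fd, &sb) != 0)
            continue;  // closed or invalid fd
        if (oldestFd == -1 || sb.st_atime < oldestTime)
        {
            oldestFd = fd;
            oldestTime = sb.st_atime;
        }
    }
    return oldestFd;
}
```

The fd returned here would still have to be mapped back to the plugin that owns it before anything can be closed.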

In general, VisIt would know only about fds and the plugin would know only about file handles so we need a way to map from fds to file handles (and maybe the reverse too) keeping in mind that one plugin's file handle could really be a vector of fds.

The plugin would have to give VisIt handles to its files and/or register callback methods. A non-plugin-specific fd manager could then close a file by invoking the callback and passing it a void* to the file handle, which the plugin can then use to close that file.
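One possible shape for that fd-to-handle mapping is sketched below. FdHandleRegistry and the CloseCallback signature are assumptions made for illustration; nothing like this exists in VisIt today. Each handle owns a vector of fds, so a single close request releases all of them.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical signature: the plugin closes the file behind 'handle'.
typedef void (*CloseCallback)(void *handle);

class FdHandleRegistry
{
public:
    // Record that 'handle' owns every descriptor in 'fds'.
    void Add(void *handle, CloseCallback cb, const std::vector<int> &fds)
    {
        files_[handle] = File{cb, fds};
        for (int fd : fds)
            fdToHandle_[fd] = handle;
    }

    // Reverse lookup from fd to handle; nullptr for an unknown fd.
    void *HandleForFd(int fd) const
    {
        auto it = fdToHandle_.find(fd);
        return it == fdToHandle_.end() ? nullptr : it->second;
    }

    // Ask the owning plugin to close the file that uses 'fd'.
    // Returns false if no registered handle owns that fd.
    bool CloseFileUsingFd(int fd)
    {
        void *handle = HandleForFd(fd);
        if (handle == nullptr)
            return false;
        File file = files_[handle];  // copy, then drop all bookkeeping
        for (int owned : file.fds)
            fdToHandle_.erase(owned);
        files_.erase(handle);
        file.cb(handle);             // the plugin performs the actual close
        return true;
    }

private:
    struct File { CloseCallback cb; std::vector<int> fds; };
    std::map<void *, File> files_;
    std::map<int, void *> fdToHandle_;
};
```

The plugin still has to cope with its handle being closed behind its back, which is the cost discussed next.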

But, that means VisIt is potentially closing file handles out from underneath plugins and plugins have to be coded to deal with that possibility (I bet few are presently). This could involve re-visiting 120 some odd database plugins to adjust their logic! Not very attractive.

One big problem with this idea is that there are a lot of situations in which st_atime cannot be relied upon. Mounting a filesystem with noatime option is just one example. Client side caching of NFS reads is another. However, that is not too bad as it really only means that a plugin has to inform the fd manager whenever it uses (e.g. UsedFile()) a given fd (or file handle).

I have tentatively found a way of getting decent relative file access information using the following code....

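#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <map>

// Last file offset observed for each open, seekable fd.
static std::map<int,off_t> fdmap;

// Returns the number of open, seekable fds, or -1 if the limit cannot be queried.
static int CountOpenFileDescriptors(void)
{
    struct rlimit rlim;
    if (getrlimit(RLIMIT_NOFILE, &rlim) != 0)
        return -1;
    int maxfd = (int) rlim.rlim_cur;
    // fds 0-3 are skipped (0-2 are stdin, stdout and stderr).
    for (int i = 4; i < maxfd; i++)
    {
        struct stat statbuf; // only needed by the fstat alternative below
        errno = 0;
        // lseek with a zero offset reports the current position without moving it.
        off_t off = lseek(i, 0, SEEK_CUR);
        if (off == (off_t)-1 || errno == EBADF)
        //if (fstat(i, &statbuf) == -1 || errno == EBADF)
        //if (fcntl(i, F_GETFL) == -1 || errno == EBADF)
        {
            // Closed or unseekable fd: drop any stale entry so the count stays accurate.
            fdmap.erase(i);
            continue;
        }
        // A changed offset means the fd was read or written since the last scan.
        if (fdmap.find(i) != fdmap.end() && fdmap[i] != off)
            printf("fd=%d,old_off=%lld,new_off=%lld\n", i, (long long) fdmap[i], (long long) off);
        fdmap[i] = off;
    }
    return (int) fdmap.size();
}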

This code loops over all possible fds, calling lseek on each without actually changing the file pointer offset; lseek is used only to obtain the current file offset. By remembering the last file offset it saw for a given fd, it can determine which fds have been used recently. The idea is that each time a plugin needs to open a file, the fd manager would do this work to figure out the total open fd count as well as which fds have recently been used, in order to help identify candidate LRU fds for closure should the maximum limit be reached.

Pseudocode Solution

  1. Every plugin (all existing ones will have to be adjusted) will define a static function to be used as the Close callback
  2. Every plugin will #define CBACK to be the name of the Close callback
  3. For all existing plugins, add a new fdManagerOpenMappingMacros.h header file designed like so...
    • Better yet, stick all this stuff in avtFileFormat.h. As long as it gets included after any external lib's header file (e.g. hdf5.h), it should work to enforce the desired open semantics
// Pseudocode: OpenFileStart/OpenFileComplete are proposed methods that do not exist yet.
// Each wrapper is defined before its macro, so the wrapper body still calls the real library function.
#include <stdlib.h>
#ifdef H5_VERS_MAJOR
static hid_t VisIt_H5FOpen(const char *FNAME, unsigned MODE, hid_t PROPS, closeCallback_t cback)
{
    avtFileDescriptorManager::OpenFileStart(cback);
    // The manager keeps this heap copy of the handle to pass to cback later.
    hid_t *retvalp = (hid_t*) malloc(sizeof(hid_t));
    *retvalp = H5Fopen(FNAME, MODE, PROPS);
    hid_t retval = *retvalp;
    avtFileDescriptorManager::OpenFileComplete(retvalp);
    return retval;
}
#define H5Fopen(FNAME,MODE,PROPS) VisIt_H5FOpen(FNAME,MODE,PROPS,CBACK)
#endif
#ifdef NC_MAX_FILES
static int VisIt_nc_open(const char *NAME, int MODE, int *RETID, closeCallback_t cback)
{
    avtFileDescriptorManager::OpenFileStart(cback);
    int *retidp = (int*) malloc(sizeof(int));
    int retval = nc_open(NAME, MODE, retidp);
    *RETID = *retidp; // hand the id back to the caller as nc_open would
    avtFileDescriptorManager::OpenFileComplete(retidp);
    return retval;
}
#define nc_open(NAME,MODE,RETID) VisIt_nc_open(NAME,MODE,RETID,CBACK)
#endif
static int VisIt_open(const char *name, int flags, mode_t mode, closeCallback_t cback)
{
    avtFileDescriptorManager::OpenFileStart(cback);
    int *retvalp = new int;
    *retvalp = open(name, flags, mode);
    int retval = *retvalp;
    avtFileDescriptorManager::OpenFileComplete(retvalp);
    return retval;
}
// Note: this macro only matches three-argument calls to open.
#define open(NAME, FLAGS, MODE) VisIt_open(NAME, FLAGS, MODE, CBACK)

and include this header file, after all other header files, in every existing plugin file that opens files. For libs like HDF5 and netCDF, this has the effect of defining the macros only if those libs' header files are included.

The fd manager can determine which new fds were created between the OpenFileStart/OpenFileComplete calls, and those fds are attributed to the over-arching open call. However, the plugin also needs to register a callback to close the file.
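One way the manager might attribute new fds to an open call is to snapshot the set of open fds before and after it and take the difference. The sketch below assumes POSIX fcntl and sysconf; SnapshotOpenFds, NewFdsSince and the scan cap of 65536 are assumptions made for illustration, not existing VisIt code.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <set>

// Returns the set of currently open fds. Hypothetical helper.
static std::set<int> SnapshotOpenFds()
{
    long limit = sysconf(_SC_OPEN_MAX);
    // Assumption: cap the scan so a very large (or unknown) limit stays cheap.
    if (limit < 0 || limit > 65536)
        limit = 65536;
    std::set<int> fds;
    for (int fd = 0; fd < (int) limit; fd++)
    {
        // F_GETFD fails with EBADF on a closed fd and has no side effects.
        if (fcntl(fd, F_GETFD) != -1)
            fds.insert(fd);
    }
    return fds;
}

// Returns the fds that are open now but were not in 'before'.
// Intended use: 'before' is taken at OpenFileStart, this runs at OpenFileComplete.
static std::set<int> NewFdsSince(const std::set<int> &before)
{
    std::set<int> added;
    for (int fd : SnapshotOpenFds())
    {
        if (before.count(fd) == 0)
            added.insert(fd);
    }
    return added;
}
```

This attribution is only reliable if no other thread opens or closes files between the two snapshots.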

Re-design Idea #2

Each plugin, when it wants to open a file, could check in with the fd manager to ask whether it is OK to open a file (keeping in mind that a call to libGorfo's open method may in fact involve opening more than one fd), and if the fd manager says no, the plugin can select a file to close. But what if the plugin itself has no open files but another plugin does? That won't work!