Zoltan Users's Guide  |  Next  |  Previous

Distributed Directory Utility

A distributed directory may be viewed as a distributed hash table pointing to the information stored in the directory. An application may use this directory utility to manage its objects' locations after data migrations or to make information globally accessable. A distributed directory balances the load (in terms of memory and processing time) and avoids the bottle neck of a centralized directory design.

This distributed directory module may be used alone or in conjunction with Zoltan's load balancing capability and memory and communication services. The user should note that external names (subroutines, etc.) prefaced by Zoltan_DD_ are reserved when using this module. Since the distributed directory uses collective communication, it is important that all processors call the same function at the same time in their processing.

The user initially creates an empty distributed directory using Zoltan_DD_Create. Then each global ID (GID), which are the directory keys, together with other optional information is added to the directory using Zoltan_DD_Update. The directory maintains the GID's basic information: local ID (optional), part (optional), arbitrary user data (optional), and the current data owner (optional). Zoltan_DD_Update is also called after data migration or whenever it is useful to update the information in the directory. Zoltan_DD_Find returns the directory information for a list of GIDs. A selected list of GIDs may be removed from the directory by Zoltan_DD_Remove. When the user has finished using the directory, its memory is returned to the system by Zoltan_DD_Destroy.

An object is known by its GID. Hashing provides very fast lookup for the information associated with a GID in a two step process. The first hash of the GID yields the processor number owning the directory entry for that GID. The directory entry owner remains constant even if the object associated with the GID migrates or changes over time. Second, a different hash algorithm on the GID looks up the associated information in directory processor's hash table. The user may optionally register their own (first) hash function to take advantage of their knowledge of their GID naming scheme and the GID's neighboring processors. See the documentation for Zoltan_DD_Set_Hash_Fn for more information. If no user hash function is registered, Zoltan's Zoltan_Hash will be used. This module's design was strongly influenced by the paper "Communication Support for Adaptive Computation" by Pinar and Hendrickson.

Some users number their GIDs by giving the first "n" GIDs to processor 0, the next "n" GIDs to processor 1, and so forth. The function Zoltan_DD_Set_Neighbor_Hash_Fn1 will provide efficient directory communication when these GIDs stay close to their origin. The function Zoltan_DD_Set_Neighbor_Hash_Fn2 allows the specification of ranges of GIDs to each processor for more flexibility. The source code for DD_Set_Neighbor_Hash_Fn1 and DD_Set_Neighbor_Hash_Fn2 provide examples of how a user can create their own "hash" functions taking advantage of their own GID naming convention.

The routine Zoltan_DD_Print will print the contents of the directory. The companion routine Zoltan_DD_Stats prints out a summary of the hash table size, number of linked lists, and the length of the longest linked list. This may be useful when the user creates their own hash functions.

All modules use the following response to the debug_level:
debug_level=0, Output is silent except for FATAL or MEMERR errors.
debug_level=1, Turns on checking for legal, but possibly wrong conditions such as updating the same directory multiple times in one update cycle.
debug_level=5, Adds tracing information for the routines defined below.
debug_level=6, Adds tracing information for all DD routines.
debug_level=7, Adds tracing within each routine,
debug_level>7, Adds information about each object when used.

Calling DD_Stats or DD_Print is automatically verbose independent of the debug_level.

The C++ interface to this utility is defined in the header file zoltan_dd_cpp.h as the class Zoltan_DD. A single Zoltan_DD object represents a distributed directory.

A Fortran90 interface is not yet available.

Source code location: Utilities/DDirectory
C Function prototypes file: Utilities/DDirectory/zoltan_dd.h
C++ class definition: Utilities/DDirectory/zoltan_dd_cpp.h
Library name: libzoltan_dd.a
Other libraries used by this library: libmpi.a, libzoltan_mem.a, libzoltan_comm.a
Zoltan_DD_Create:  Allocates memory and initializes the directory.
Zoltan_DD_Copy:  Allocates a new directory structure and copies an existing one to it.
Zoltan_DD_Copy_To:  Copies one directory structure to another.
Zoltan_DD_Destroy:  Terminate the directory and frees its memory.
Zoltan_DD_Update:  Adds or updates GIDs' directory information.
Zoltan_DD_Find:  Returns GIDs' information (owner, local ID, etc.)
Zoltan_DD_Remove:  Eliminates selected GIDs from the directory.
Zoltan_DD_Stats:  Provides statistics about hash table & linked lists.
Zoltan_DD_Print:  Displays the contents (GIDs, etc) of each directory.
Zoltan_DD_Set_Hash_Fn:  Registers a user's optional hash function.
Zoltan_DD_Set_Neighbor_Hash_Fn1:  Hash function with constant number of GIDs per processor.
Zoltan_DD_Set_Neighbor_Hash_Fn2:  Hash function with variable number of GID's per processor.
Data Stuctures:  
struct Zoltan_DD_Struct: State & storage used by all DD routines. Users should not modify any internal values in this structure. Users should only pass the address of this structure to the other routines in this package.

C: int Zoltan_DD_Create (struct Zoltan_DD_Struct **dd, MPI_Comm comm, int num_gid_entries, int num_lid_entries, int user_length, int table_length, int debug_level);
C++: Zoltan_DD( const MPI_Comm & comm, const int & num_gid_entries, const int & num_lid_entries, const int & user_length, const int & table_length, const int & debug_level);
Zoltan_DD::Create( const MPI_Comm & comm, const int & num_gid_entries, const int & num_lid_entries, const int & user_length, const int & table_length, const int & debug_level);

Zoltan_DD_Create allocates and initializes memory for the Zoltan_DD_Struct structure. It must be called before any other distributed directory routines. MPI must be initialized prior to calling this routine.

The Zoltan_DD_Struct must be passed to all other distributed directory routines. The MPI Comm argument designates the processors used for the distributed directory. The MPI Comm argument is duplicated and stored for later use. The length of the GID, length of the LID, and the length of the optional user data (user_length) must be consistent for all processors.

   dd Structure maintains directory state and hash table.
   comm MPI comm duplicated and stored specifying directory processors.
   num_gid_entries Length (number of ZOLTAN_ID_TYPE) of GID.
   num_lid_entries Length (number of ZOLTAN_ID_TYPE) of local ID or zero to ignore local IDs.
   user_length Length (number of char) of user defined data field (optional, may be zero).
   table_length Length of hash table (zero forces default value of 100,000 slots). For large problems, this value should be increased to approximately the number of global GIDs / number of processors (if you have enough memory) in order to improve performance.
   debug_level Legal values range in [0,9]. Sets the output response to various error conditions where 9 is the most verbose.
Returned Value:
   int Error code.

ZOLTAN_FATAL is returned for MPI problems or if num_gid_entries, num_lid_entries, or user_length do not match globally.
ZOLTAN_MEMERR is returned if sufficient memory can not be allocated.
ZOLTAN_OK is the normal return value.

In the C++ interface, the distributed directory is represented by a Zoltan_DD object. It is created when the Zoltan_DD constructor executes. There are two constructors. The first one listed above uses parameters to initialize the distributed directory. The second constructor does not, but it can subsequently be initialized with a call to Zoltan_DD::Create().

C: struct Zoltan_DD_Struct   *Zoltan_DD_Copy (struct Zoltan_DD_Struct *from);
C++: Zoltan_DD(const Zoltan_DD &dd);

This routine creates a new distributed directory structure and copies an existing one to it. The corresponding routine in the C++ library is the Zoltan_DD copy constructor.
   from The existing directory structure which will be copied to the new one.
Returned Value:
   struct Zoltan_DD_Struct * The newly created directory structure.

C: int Zoltan_DD_Copy_To (struct Zoltan_DD_Struct **to, struct Zoltan_DD_Struct *from);
C++: Zoltan_DD & operator=(const Zoltan_DD &dd);

This routine copies one distributed directory structure to another. The corresponding method in the C++ library is the Zoltan_DD class copy operator.
   to A pointer to a pointer to the target structure. The structure will be destroyed and the pointer set to NULL before proceeding with the copy.
   from A pointer to the source structure. The contents of this structure will be copied to the target structure.
Returned Value:
   int Error code.

C: void Zoltan_DD_Destroy (struct Zoltan_DD_Struct **dd);
C++: ~Zoltan_DD();

This routine frees all memory allocated for the distributed directory. No calls to any distributed directory functions using this Zoltan_DD_Struct are permitted after calling this routine. MPI is necessary for this routine only to free the previously saved MPI comm.
   dd Directory structure to be deallocated.
Returned Value:
   void NONE

There is no explicit Destroy method in the C++ Zoltan_DD class. The object is deallocated when its destructor is called.

int Zoltan_DD_Update (struct Zoltan_DD_Struct *dd, ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *user, int *part, int count);
C++: int Zoltan_DD::Update( ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *user, int *part, const int & count);

Zoltan_DD_Update takes a list of GIDs and corresponding lists of optional local IDs, optional user data, and optional parts. This routine updates the information for existing directory entries or creates a new entry (filled with given data) if a GID is not found. NULL lists should be passed for optional arguments not desired. This function should be called initially and whenever objects are migrated to keep the distributed directory current.

The user can set the debug level argument in Zoltan_DD_Create to determine the module's response to multiple updates for any GID within one update cycle.
   dd Distributed directory structure state information.
   gid List of GIDs to update (in).
   lid List of corresponding local IDs (optional) (in).
   user List of corresponding user data (optional) (in).
   part List of corresponding parts (optional) (in).
   count Number of GIDs in update list.
Returned Value:
   int Error code.

int Zoltan_DD_Find (struct Zoltan_DD_Struct *dd, ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *data, int *part, int count, int *owner);
C++: int Zoltan_DD::Find( ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *data, int *part, const int & count, int *owner) const;

Given a list of GIDs, Zoltan_DD_Find returns corresponding lists of the GIDs' owners, local IDs, parts, data owners, and optional user data. NULL lists must be provided for optional information not being used.
   dd Distributed directory structure state information.
   gid List of GIDs whose information is requested.
   lid Corresponding list of local IDs (optional) (out).
   data Corresponding list of user data (optional) (out).
   part Corresponding list of parts (optional) (out).
   count Count of GIDs in above list.
   owner Corresponding list of data owners (out).
Returned Value:
   int Error code.

ZOLTAN_OK is the normal return.
ZOLTAN_WARN is returned when at least one GID in the gid list is not found AND debug level > 0.
ZOLTAN_MEMERR is returned whenever memory can not be allocated.
ZOLTAN_FATAL is returned whenever there is a problem with the input arguments (such as dd being NULL) or communications error.

int Zoltan_DD_Remove (struct Zoltan_DD_Struct *dd, ZOLTAN_ID_PTR gid, int count);
C++: int Zoltan_DD::Remove( ZOLTAN_ID_PTR gid, const int & count);

Zoltan_DD_Remove takes a list of GIDs and removes all of their information from the distributed directory.
   dd Distributed directory structure state information.
   gid List of GIDs to eliminate from the directory.
   count Number of GIDs to be removed.
Returned Value:
   int Error code.

void Zoltan_DD_Set_Hash_Fn (struct Zoltan_DD_Struct *dd, unsigned int (*hash) (ZOLTAN_ID_PTR, int, unsigned int));
C++: void Zoltan_DD::Set_Hash_Fn( unsigned int (*hash) (ZOLTAN_ID_PTR, int, unsigned int));

Enables the user to register a new hash function for the distributed directory. (If this routine is not called, the default hash function Zoltan_Hash will be used automatically.) This hash function determines which processor maintains the distributed directory entry for a given GID. Inexperienced users do not need this routine.

Experienced users may elect to create their own hash function based on their knowledge of their GID naming scheme. The user's hash function must have calling arguments compatible with Zoltan_Hash. The final argument, nprocs, is the number of processors in the communicator passed to Zoltan_DD_Create. Consider that a user has defined a hash function, myhash, as

          extern int total_num_gid;
          unsigned int myhash(ZOLTAN_ID_PTR gid, int length, unsigned int nproc)
           /* Assuming a processor is more likely to query GIDs that are numerically close to the GIDs it owns, */
           /* this hash function tries to store the gid's directory information near the gid's owning processor's neighborhood. */
           /* GID length is one ; total_num_gid is a global variable with the total number of GIDs in the application. */

                return ((*gid * nproc) / total_num_gid);

Then the call to register this hash function is:
                Zoltan_DD_Set_Hash(dd, myhash);

   dd Distributed directory structure state information.
   hash Name of user's hash function.
Returned Value:
   void NONE

void Zoltan_DD_Stats (struct Zoltan_DD_Struct *dd);
C++: void Zoltan_DD::Stats() const;

This routine prints out summary information about the local distributed directory. It includes the hash table length, number of GIDs stored in the local directory, the number of linked lists, and the length of the longest linked list. The debug level (set by an argument to Zoltan_DD_Create controls this routine's verbosity.
   dd Distributed directory structure for state information
Returned Value:
   void NONE

int Zoltan_DD_Set_Neighbor_Hash_Fn1 (struct Zoltan_DD_Struct *dd, int size);
This routine associates the first size GIDs to proc 0, the next size to proc 1, etc. It assumes the GIDs are consecutive numbers. It assumes that GIDs primarily stay near their original owner. The GID length is assumed to be 1. GIDs outside of the range are evenly distributed among the processors via modulo(number of processors). This is a model for the user to develop their own similar routine.
   dd Distributed directory structure state information.
   size Number of consecutive GIDs associated with a processor.
Returned Value:
   int Error code.

int Zoltan_DD_Set_Neighbor_Hash_Fn2 (struct Zoltan_DD_Struct *dd, int *proc, int *low, int *high, int n);
This routine allows the user to specify a beginning and ending GID "numbers" per directory processor. It assumes that GIDs primarily stay near their original owner. It requires that the numbers of high, low, & proc entries are all n. It assumes the GID length is 1. It is a model for the user to develop their own similar routine. Users should note the registration of a cleanup routine to free local static memory when the distributed directory is destroyed. GIDs outside the range specified by high and low lists are evenly distributed among the processors via modulo (number of processors).
   dd Distributed directory structure state information.
   proc List of processor ids labeling for corresponding high, low value.
   low List of low GID limits corresponding to proc list.
   high List of high GID limits corresponding to proc list.
   n Number of elements in the above lists. Should be number of processors!
Returned Value:
   int Error code.

C: int Zoltan_DD_Print (struct Zoltan_DD_Struct *dd);
C++: int Zoltan_DD::Print () const;

This utility displays (to stdout) the entire contents of the distributed directory at one line per GID.
   dd Distributed directory structure state information.
Returned Value:
   int Error code.

User's Notes

Because Zoltan places no restrictions on the content or length of GIDs, hashing does not guarantee a balanced distribution of objects in the distributed directory. Note also, the worst case behavior of a hash table lookup is very bad (essentially becoming a linear search). Fortunately, the average behavior is very good! The user may specify their own hash function via Zoltan_DD_Set_Hash_Fn to improve performance.

This software module is built on top of the Zoltan Communications functions for efficiency. Improvements to the communications library will automatically benefit the distributed directory.

[Table of Contents  |  Next:  Examples of Zoltan Usage  |  Previous:  Unstructured Communication Utilities  |  Privacy and Security]