This distributed directory module may be used alone or in conjunction with Zoltan's load balancing capability and memory and communication services. The user should note that external names (subroutines, etc.) prefaced by Zoltan_DD_ are reserved when using this module. Since the distributed directory uses collective communication, it is important that all processors call the same function at the same time in their processing.
The user initially creates an empty distributed directory using Zoltan_DD_Create. Then each global ID (GID), which are the directory keys, together with other optional information is added to the directory using Zoltan_DD_Update. The directory maintains the GID's basic information: local ID (optional), part (optional), arbitrary user data (optional), and the current data owner (optional). Zoltan_DD_Update is also called after data migration or whenever it is useful to update the information in the directory. Zoltan_DD_Find returns the directory information for a list of GIDs. A selected list of GIDs may be removed from the directory by Zoltan_DD_Remove. When the user has finished using the directory, its memory is returned to the system by Zoltan_DD_Destroy.
An object is known by its GID. Hashing provides very fast lookup for the information associated with a GID in a two step process. The first hash of the GID yields the processor number owning the directory entry for that GID. The directory entry owner remains constant even if the object associated with the GID migrates or changes over time. Second, a different hash algorithm on the GID looks up the associated information in directory processor's hash table. The user may optionally register their own (first) hash function to take advantage of their knowledge of their GID naming scheme and the GID's neighboring processors. See the documentation for Zoltan_DD_Set_Hash_Fn for more information. If no user hash function is registered, Zoltan's Zoltan_Hash will be used. This module's design was strongly influenced by the paper "Communication Support for Adaptive Computation" by Pinar and Hendrickson.
Some users number their GIDs by giving the first "n" GIDs to processor 0, the next "n" GIDs to processor 1, and so forth. The function Zoltan_DD_Set_Neighbor_Hash_Fn1 will provide efficient directory communication when these GIDs stay close to their origin. The function Zoltan_DD_Set_Neighbor_Hash_Fn2 allows the specification of ranges of GIDs to each processor for more flexibility. The source code for DD_Set_Neighbor_Hash_Fn1 and DD_Set_Neighbor_Hash_Fn2 provide examples of how a user can create their own "hash" functions taking advantage of their own GID naming convention.
The routine Zoltan_DD_Print will print the contents of the directory. The companion routine Zoltan_DD_Stats prints out a summary of the hash table size, number of linked lists, and the length of the longest linked list. This may be useful when the user creates their own hash functions.
All modules use the following response to the debug_level:
debug_level=0, Output is silent except for FATAL or MEMERR errors.
debug_level=1, Turns on checking for legal, but possibly wrong conditions such as
updating the same directory multiple times in one update cycle.
debug_level=5, Adds tracing information for the routines defined below.
debug_level=6, Adds tracing information for all DD routines.
debug_level=7, Adds tracing within each routine,
debug_level>7, Adds information about each object when used.
Calling DD_Stats or DD_Print is automatically verbose independent of the debug_level.
The C++ interface to this utility is defined in the header file zoltan_dd_cpp.h as the class Zoltan_DD. A single Zoltan_DD object represents a distributed directory.
A Fortran90 interface is not yet available.
Source code location: | Utilities/DDirectory |
C Function prototypes file: | Utilities/DDirectory/zoltan_dd.h |
C++ class definition: | Utilities/DDirectory/zoltan_dd_cpp.h |
Library name: | libzoltan_dd.a |
Other libraries used by this library: | libmpi.a, libzoltan_mem.a, libzoltan_comm.a |
Routines:Zoltan_DD_Create: Allocates memory and initializes the directory. |
|
Data Stuctures:
struct Zoltan_DD_Struct: State & storage used by all DD routines. Users should not modify any internal values in this structure. Users should only pass the address of this structure to the other routines in this package. |
C: | int Zoltan_DD_Create (struct Zoltan_DD_Struct **dd, MPI_Comm comm, int num_gid_entries, int num_lid_entries, int user_length, int table_length, int debug_level); |
C++: |
Zoltan_DD(
const MPI_Comm & comm,
const int & num_gid_entries,
const int & num_lid_entries,
const int & user_length,
const int & table_length,
const int & debug_level);
or Zoltan_DD(); Zoltan_DD::Create( const MPI_Comm & comm, const int & num_gid_entries, const int & num_lid_entries, const int & user_length, const int & table_length, const int & debug_level); |
The Zoltan_DD_Struct must be passed to all other distributed directory routines. The MPI Comm argument designates the processors used for the distributed directory. The MPI Comm argument is duplicated and stored for later use. The length of the GID, length of the LID, and the length of the optional user data (user_length) must be consistent for all processors.
Arguments: | |
dd | Structure maintains directory state and hash table. |
comm | MPI comm duplicated and stored specifying directory processors. |
num_gid_entries | Length (number of ZOLTAN_ID_TYPE) of GID. |
num_lid_entries | Length (number of ZOLTAN_ID_TYPE) of local ID or zero to ignore local IDs. |
user_length | Length (number of char) of user defined data field (optional, may be zero). |
table_length | Length of hash table (zero forces default value of 100,000 slots). For large problems, this value should be increased to approximately the number of global GIDs / number of processors (if you have enough memory) in order to improve performance. |
debug_level | Legal values range in [0,9]. Sets the output response to various error conditions where 9 is the most verbose. |
Returned Value: | |
int | Error code. |
ZOLTAN_FATAL is returned for MPI problems or if num_gid_entries,
num_lid_entries, or user_length
do
not match globally.
ZOLTAN_MEMERR is returned if sufficient memory can not be allocated.
ZOLTAN_OK is the normal return value.
In the C++ interface, the distributed directory is represented by a Zoltan_DD object. It is created when the Zoltan_DD constructor executes. There are two constructors. The first one listed above uses parameters to initialize the distributed directory. The second constructor does not, but it can subsequently be initialized with a call to Zoltan_DD::Create().
C: | struct Zoltan_DD_Struct   *Zoltan_DD_Copy (struct Zoltan_DD_Struct *from); |
C++: | Zoltan_DD(const Zoltan_DD &dd); |
Arguments: | |
from | The existing directory structure which will be copied to the new one. |
Returned Value: | |
struct Zoltan_DD_Struct * | The newly created directory structure. |
C: | int Zoltan_DD_Copy_To (struct Zoltan_DD_Struct **to, struct Zoltan_DD_Struct *from); |
C++: | Zoltan_DD & operator=(const Zoltan_DD &dd); |
Arguments: | |
to | A pointer to a pointer to the target structure. The structure will be destroyed and the pointer set to NULL before proceeding with the copy. |
from | A pointer to the source structure. The contents of this structure will be copied to the target structure. |
Returned Value: | |
int | Error code. |
C: | void Zoltan_DD_Destroy (struct Zoltan_DD_Struct **dd); |
C++: | ~Zoltan_DD(); |
Arguments: | |
dd | Directory structure to be deallocated. |
Returned Value: | |
void | NONE |
There is no explicit Destroy method in the C++ Zoltan_DD class. The object is deallocated when its destructor is called.
C: |
int Zoltan_DD_Update (struct Zoltan_DD_Struct *dd, ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *user, int *part, int count); |
C++: | int Zoltan_DD::Update( ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *user, int *part, const int & count); |
The user can set the debug level argument in Zoltan_DD_Create
to determine the module's response to multiple updates for any GID
within one update cycle.
Arguments: | |
dd | Distributed directory structure state information. |
gid | List of GIDs to update (in). |
lid | List of corresponding local IDs (optional) (in). |
user | List of corresponding user data (optional) (in). |
part | List of corresponding parts (optional) (in). |
count | Number of GIDs in update list. |
Returned Value: | |
int | Error code. |
C: |
int Zoltan_DD_Find (struct Zoltan_DD_Struct *dd, ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *data, int *part, int count, int *owner); |
C++: | int Zoltan_DD::Find( ZOLTAN_ID_PTR gid, ZOLTAN_ID_PTR lid, char *data, int *part, const int & count, int *owner) const; |
Arguments: | |
dd | Distributed directory structure state information. |
gid | List of GIDs whose information is requested. |
lid | Corresponding list of local IDs (optional) (out). |
data | Corresponding list of user data (optional) (out). |
part | Corresponding list of parts (optional) (out). |
count | Count of GIDs in above list. |
owner | Corresponding list of data owners (out). |
Returned Value: | |
int | Error code. |
ZOLTAN_OK is the normal return.
ZOLTAN_WARN is returned when at
least one GID in the gid list is not found AND debug level > 0.
ZOLTAN_MEMERR is returned whenever memory can not be allocated.
ZOLTAN_FATAL is returned whenever there is a problem with the input arguments (such as
dd being NULL) or communications error.
C: |
int Zoltan_DD_Remove (struct Zoltan_DD_Struct *dd, ZOLTAN_ID_PTR gid, int count); |
C++: | int Zoltan_DD::Remove( ZOLTAN_ID_PTR gid, const int & count); |
Arguments: | |
dd | Distributed directory structure state information. |
gid | List of GIDs to eliminate from the directory. |
count | Number of GIDs to be removed. |
Returned Value: | |
int | Error code. |
C: |
void Zoltan_DD_Set_Hash_Fn (struct Zoltan_DD_Struct *dd, unsigned int (*hash) (ZOLTAN_ID_PTR, int, unsigned int)); |
C++: | void Zoltan_DD::Set_Hash_Fn( unsigned int (*hash) (ZOLTAN_ID_PTR, int, unsigned int)); |
Experienced users may elect to create their own hash function based on
their knowledge of their GID naming scheme. The user's hash
function must have calling arguments compatible with Zoltan_Hash.
The final argument, nprocs, is the number of processors in
the communicator passed to Zoltan_DD_Create.
Consider that a user has defined a hash function, myhash, as
extern int total_num_gid;
unsigned int myhash(ZOLTAN_ID_PTR gid, int length, unsigned int nproc)
{
/* Assuming a processor is more likely to query GIDs that are numerically
close to the GIDs it owns, */
/* this hash function tries to store the gid's directory information
near the gid's owning processor's neighborhood. */
/* GID length is one ; total_num_gid is a global variable with the total number of GIDs in the application. */
return ((*gid * nproc) / total_num_gid);
}
Then the call to register this hash function is:
Zoltan_DD_Set_Hash(dd, myhash);
Arguments: | |
dd | Distributed directory structure state information. |
hash | Name of user's hash function. |
Returned Value: | |
void | NONE |
C: |
void Zoltan_DD_Stats (struct Zoltan_DD_Struct *dd); |
C++: | void Zoltan_DD::Stats() const; |
Arguments: | |
dd | Distributed directory structure for state information |
Returned Value: | |
void | NONE |
Arguments: | |
dd | Distributed directory structure state information. |
size | Number of consecutive GIDs associated with a processor. |
Returned Value: | |
int | Error code. |
Arguments: | |
dd | Distributed directory structure state information. |
proc | List of processor ids labeling for corresponding high, low value. |
low | List of low GID limits corresponding to proc list. |
high | List of high GID limits corresponding to proc list. |
n | Number of elements in the above lists. Should be number of processors! |
Returned Value: | |
int | Error code. |
C: | int Zoltan_DD_Print (struct Zoltan_DD_Struct *dd); |
C++: | int Zoltan_DD::Print () const; |
Arguments: | |
dd | Distributed directory structure state information. |
Returned Value: | |
int | Error code. |
User's Notes
Because Zoltan places no restrictions on the content or length of GIDs, hashing does not guarantee a balanced distribution of objects in the distributed directory. Note also, the worst case behavior of a hash table lookup is very bad (essentially becoming a linear search). Fortunately, the average behavior is very good! The user may specify their own hash function via Zoltan_DD_Set_Hash_Fn to improve performance.
This software module is built on top of the Zoltan Communications functions for efficiency. Improvements to the communications library will automatically benefit the distributed directory.