|
Zoltan:
Data-Management Services for Parallel Applications
Project Description
Overview
The Zoltan library is a collection of data management services for parallel,
unstructured, adaptive, and dynamic applications. It simplifies the
load-balancing, data movement, unstructured communication, and memory usage
difficulties that arise in dynamic applications such as adaptive
finite-element methods, particle methods, and crash simulations.
Zoltan's data-structure neutral design also lets a wide range of applications
use it without imposing restrictions on application data structures.
Its object-based interface provides a simple and inexpensive way for
application developers to use the library and researchers to make new
capabilities available under a common interface.
Zoltan provides tools that help developers of parallel applications.
These tools are provided in an easy-to-use toolkit that is callable from C,
C++, and Fortran90.
Design of Toolkits and Libraries
Using general-purpose libraries allows
algorithms to be shared among and compared within many applications. The
close dependence of libraries on application data, however,
requires careful design to maintain separation between the
libraries and application data structures.
One way to provide this separation is to use object-based software design.
Instead of requiring the application to build data structures
required by the library, the application could pass functions that access
the application data structure to the libraries.
For example, rather than require an application to build a complicated graph
description, the library can require an application to provide a
function returning graph vertices and a function returning edge
connectivity for a given vertex.
Using these functions, the
library can build the data structures it needs.
This object-based design has a number of advantages.
-
Changes in the library's data structures need not
propagate back to the application. As long as the set of required functions
does not change, the application does not need to change to use new versions
of the library.
-
Once the set of required functions is implemented, the application can use all
the algorithms in the library.
-
The required functions are generally easy for
an application to implement, as most applications need to
access their data objects and the interactions between objects
for their own computations.
-
Memory usage is lower as
an application does not have to build an intermediate data structure
that is later converted to appropriate data structures for the library.
-
The constructor for library data structures is called only when it
is needed, and only the data needed for a particular algorithm is obtained.
There are a few disadvantages to this object-based approach as well.
-
Additional overhead is incurred as the library calls the functions to
build its data structures.
In experiments, however, this cost has been very low
relative to the cost of actual computation in the library.
-
A general-purposes tool can
provide only limited support for manipulations of application data
structures (e.g., data movement).
For more detailed information, see
[Hendrickson
and Devine].
Zoltan's Design
We have chosen an object-based, callback function design. An application
provides a number of simple callback functions that access the application
data structures. Zoltan then calls these functions to obtain data it needs.
Geometric algorithms are
supported via callback functions returning objects to be balanced and the
weights and coordinates of those objects.
Graph-based algorithms are
supported by callback functions returning objects to be
balanced, edges between objects, and object and edge weights.
For refinement-tree algorithms, additional callback functions return
parent-child relationships.
Support for data migration (the movement of data to establish a new
decomposition) is also provided through a similar callback
function interface. An application provides callback functions that pack
object data into and unpack data from communication buffers provided by
Zoltan. Zoltan calls the packing function to load communication buffers,
performs the communication necessary to move the data, and calls the unpacking
function to unload the communication buffers.
Zoltan Examples
Several examples of Zoltan's use can be found in the
Zoltan User's Guide.
Typical Approach to Dynamic Load Balancing
Dynamic load balancing has been used in many applications, ranging from
adaptive mesh refinement to particle methods to contact detection algorithms.
In most applications using dynamic load balancing, the load-balancing
algorithm is implemented directly in the application, with close coupling
between the application's and load-balancing algorithm's data structures.
This typical approach has two disadvantages.
- It is possible that the application developer did not select the
best algorithm for the application, but the developer is unable to compare the
algorithm with others without taking time to implement many algorithms in the
application.
- The close coupling of the algorithm's and application's
data structures limits the algorithm's use to a single application.
Developers wanting to use the algorithm in a new application have to re-write
the algorithm using the new application's data structures.
As a result, research into and use of dynamic load-balancing algorithms are
severely impaired.
Why Dynamic Load Balancing is Harder than Static Partitioning
Many high-quality static partitioning tools exist; examples include
Chaco,
METIS,
and
SCOTCH.
General-purpose dynamic load-balancing tools are less common, however,
since they are more difficult to implement. The difficulty arises from
fundamental algorithmic and software-engineering
differences between static and dynamic partitioning. These differences are
summarized in the following table.
Static Partitioning... |
Dynamic Load Balancing... |
Generally used as a pre-processor to an application. |
Runs side-by-side with an application. |
Can be (and usually is) implemented serially. |
Must be implemented in parallel. |
Has only modest concern for execution time. |
Must run quickly (time to load balance should not exceed time to run in
an unbalanced state). |
Has only modest concern for memory usage. |
Must use little memory (should not affect scalability of
application). |
Can use file-based interfaces (read geometry from a file; write partition
info to a file). |
Must use function-call interfaces. |
Has no dependence on an application's data structures. |
Needs information stored in an application's data structures. |
Accounts for part sizes and communication costs. |
Accounts for part sizes, communication costs, and data movement
costs. |
Zoltan's Load-Balancing Suite
In our experience, no single partitioning strategy is effective for all
parallel computations. Some application require partitions based only on the
problem's workloads and geometry; others benefit from explicit consideration
of dependencies between data. Some applications require the highest quality
partitions possible, regardless of the cost to generate them; others can
sacrifice some quality so long as new partitions can be generated quickly.
For some applications, the cost to relocate data is prohibitively high, so
incremental partitioning algorithms are needed; other applications can
tolerate greater remapping costs. Most important, application developers
might not know in advance which strategies work best in their applications, so
the need a convenient means of comparing algorithms.
We provide two classes of parallel partitioning algorithms in the Zoltan
library:
Once the Zoltan callback functions are implemented, an application can switch
between partitioning algorithms by changing only the
LB_METHOD parameter
through a call to
Zoltan_Set_Param.
Thus, comparing different algorithms within a single application is easy,
enabling users to try several algorithms and find
the best ones for their applications.
Data Migration Tools
A complicated part of dynamic repartitioning is the need to move data from old
processors to new ones. This data migration requires deletions and insertions
from the application data structures as well as communication between the
processors.
To help an application with
data migration, Zoltan requires an application to
supply callback functions that pack data into communication buffers and unpack
data from communication buffers. Zoltan
calls the packing function to load communication buffers with objects to be
exported, performs all communication needed to move the data, and calls the
unpacking function to load the objects in the data structures on the new
processors. This mechanism eliminates the need for the application developer
to implement complicated communication for data migration.
Unstructured Communication Library
Unlike static applications, where communication patterns remain fixed
throughout the computation, dynamic applications can have complicated,
changing communication patterns. For example:
- After adaptive mesh refinement,
new communication patterns must reflect dependencies between newly created
elements.
-
Multiphysics simulations, such as crash simulations, might require complicated
communication to transfer data between decompositions for different simulation
phases.
Zoltan provides an
unstructured communication package
to simplify
communication. The package builds a communication plan, including information
about both message sends and receives for a given processor. The plan can be
reused throughout the application, or destroyed and rebuilt when communication
patterns change. The package also includes simple communication primitives that
insulate the user from details of message sends and receives.
Distributed Data Directories
Dynamic applications often need to locate off-processor information. After
repartitioning, for example, a processor might need to rebuild ghost cells and
lists of data to be communicated. It might know which data it needs, but not
where the data are located.
To help locate off-processor data, Zoltan includes a
distributed data
directory tool that is scalable with respect to both memory usage and
computation time. Processors register their owned objects with the directory.
Then, through a rendezvous algorithm, other processors can look up the
location of data they need.
Memory Management Tools
Dynamic applications rely heavily on the ability to allocate and free memory
as needed. Memory leaks and invalid memory accesses are common to developing
software. Although many software development tools let users track memory
bugs, these tools are often not available on state-of-the-art parallel
computers.
Zoltan's memory management package
provides simple in-application debugging
tools that are beneficial on state-of-the-art computing platforms. The
package includes wrappers around malloc and free that record the location of
all memory
allocation operations. Thus, tracking memory leaks is simplified, as
source-code locations of unfreed-memory allocations can be printed.
Statistics about memory allocations and frees are also available.
|