
Zoltan: Parallel Partitioning, Load Balancing and Data-Management Services

Frequently Asked Questions


  1. How do I upgrade from the Zoltan v1 interface (in lbi_const.h) to the current Zoltan interface (in zoltan.h)?
  2. Zoltan's hypergraph partitioner is returning empty parts, that is, parts that have zero objects in them. Is this a bug?
  3. On some platforms, why do Zoltan partitioning methods RCB and RIB use an increasing amount of memory over multiple invocations?
  4. Why does compilation of the Fortran interface hang with Intel's F90 compiler?
  5. During runs (particularly on RedStorm), MPI reports that it is out of resources or too many messages have been posted. What does this mean and what can I do?
  6. On very large problems, Zoltan communication routines fail in MPI_Alltoallv. Why does this happen and what can I do?
  7. Realloc fails when there is plenty of memory. Is this a Zoltan bug?
  8. What does the following message mean during compilation of Zoltan: Makefile:28: mem.d: No such file or directory




  1. How do I upgrade from the Zoltan v1 interface (in lbi_const.h) to the current Zoltan interface (in zoltan.h)?

    The Zoltan interface was revised in version 1.3 to include "Zoltan" in function names and defined types. Upgrading to this interface is easy.

    • Include "zoltan.h" instead of "lbi_const.h" in your source files.
    • For most Zoltan functions and constants, the prefix "LB_" is replaced by "Zoltan_"; for example, "LB_Set_Param" is now "Zoltan_Set_Param." A few exceptions exist: "LB_Balance" is now "Zoltan_LB_Balance," and "LB_Free_Data" is now "Zoltan_LB_Free_Data." See the Release v1.3 backward compatibility notes for a complete list of name changes.
    • Fortran90 applications should define user-defined data in zoltan_user_data.f90 rather than lb_user_const.f90.
    More complete details are in the Release v1.3 backward compatibility notes.
  2. Zoltan's hypergraph partitioner is returning empty parts, that is, parts that have zero objects in them. Is this a bug?

    The hypergraph partitioner creates partitions with up to a specified amount of load imbalance; by default, 10% imbalance is allowed, but the user can tighten this tolerance. Any partition that satisfies the imbalance tolerance is a valid partition. As a secondary goal, the hypergraph partitioner attempts to minimize interprocessor communication. A part with zero weight almost certainly reduces total communication, since an empty part need not communicate with any other part.

    So in some cases, Zoltan generates a valid partition -- one that satisfies the imbalance tolerance -- that happens to have lower total communication because one of the parts is empty. This is a good thing, but it surprises applications that did not anticipate having zero weight on a processor.

    To try to avoid this problem, lower the imbalance tolerance so that the partitioner is more likely to give work to all parts. Change the value of Zoltan parameter IMBALANCE_TOL to a smaller value; e.g., 1.03 to allow only 3% imbalance:
    Zoltan_Set_Param(zz, "IMBALANCE_TOL", "1.03");

    As an alternative, you may try one of Zoltan's geometric methods, such as RCB, RIB or HSFC, which do not exhibit this behavior.

    We may in the future add a parameter to disallow zero-weight parts, but at present, we do not have that option.


  3. On some platforms, why do Zoltan partitioning methods RCB and RIB use an increasing amount of memory over multiple invocations?

    Zoltan partitioning methods RCB and RIB use MPI_Comm_dup and MPI_Comm_split to recursively create communicators for subsets of processors. Some implementations of MPI (e.g., the default MPI on Sandia's Thunderbird cluster) do not correctly release memory associated with these communicators during MPI_Comm_free, resulting in growing memory use over multiple invocations of RCB or RIB. An undocumented workaround is to set Zoltan's TFLOPS_SPECIAL parameter to 1 (e.g., Zoltan_Set_Param(zz, "TFLOPS_SPECIAL", "1");), which invokes an implementation that does not use MPI_Comm_split.


  4. Why does compilation of the Fortran interface hang with Intel's F90 compiler?

    There is a bug in some versions of Intel's F90 compiler. Zoltan's Fortran interface compiles with Intel's F90 compiler versions 10.1.015 through 11.1.056, but it does not compile with versions 11.1.059, 11.1.069 and 11.1.072. We reported the problem to Intel and were told that the compiler bug is fixed in version 11.1 update 7, scheduled for release in August 2010. See the related thread on the Intel forums for more details.


  5. During runs (particularly on RedStorm), MPI reports that it is out of resources or too many messages have been posted. What does this mean and what can I do?

    Some implementations of MPI (including RedStorm's implementation) limit the number of message receives that can be posted simultaneously. Some communications in Zoltan (including hashing of IDs to processors in the Zoltan Distributed Data Directory) can require messages from large numbers of processors, triggering this error on certain platforms.

    To avoid this problem, Zoltan contains logic to use all-to-all communication instead of point-to-point communication when a large number of receives is needed. The maximum number of simultaneous receives allowed can be set as a compile-time option to Zoltan: in the Autotools build environment, the option --enable-mpi-recv-limit=# sets this maximum. The default value is 4.


  6. On very large problems, Zoltan communication routines fail in MPI_Alltoallv. Why does this happen and what can I do?

    For very large problems, the values in the displacement arrays needed by MPI_Alltoallv can exceed INT_MAX (the largest value a signed 32-bit integer can hold). The solution is to make Zoltan avoid MPI_Alltoallv and instead use point-to-point sends and receives. The compile-time option in the Autotools build environment is --enable-mpi-recv-limit=0.


  7. Realloc fails when there is plenty of memory. Is this a Zoltan bug?

    This problem has been noted on several Linux clusters running parallel applications with various MPI and C++ libraries: realloc fails where a malloc call succeeds. The source of the error has not been identified, but it is not a Zoltan bug. The workaround is to compile Zoltan with the flag -DREALLOC_BUG; Zoltan then replaces every realloc call with a malloc followed by a memcpy and a free.


  8. What does the following message mean during compilation of Zoltan?
    Makefile:28: mem.d: No such file or directory

    In the old "manual" build system for Zoltan, a dependency file was generated for each source file filename.c. The first time Zoltan was built on a given platform, these dependency files did not exist; after printing this warning, gmake created the dependency files it needed and continued compilation.

    Newer versions of Zoltan use Autotools or CMake for builds and thus do not produce this warning.


Updated: August 2, 2010
Copyright (c) 2000-2012, Sandia National Laboratories.