Zellij
Zellij is a "mesh concatenation" application for generating a mesh consisting of a "lattice" containing one or more "unit cell" template meshes. The lattice is a two-dimensional arrangement of the unit cell template meshes.
The unit cell template meshes are placed by zellij into the specified locations in the lattice and the nodes on the boundaries of the unit cell meshes are united or coincident. Each unit cell mesh must have the same exterior boundary meshes and coordinate extents on the X and Y coordinate faces, but the Z faces are only required to have the same coordinate extent; the Z face meshes are not required to be the same among the different unit cells.
The lattice can be represented as an IxJ regular grid with each "cell" in the grid or lattice containing one of the unit cell template meshes.
Executing zellij with the `-help` option will output a usage summary listing all of the command line options.
The only required option is `-lattice` followed by the name of the file containing the lattice description. The other options are used to specify compression of the output file, the format of the output file, or to request additional debug output. If the `-output <filename>` option is not specified, then the output mesh will be named `zellij-out.e`.
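For example, a minimal invocation might look like the following (the file names are illustrative):

```
zellij -lattice lattice.txt -output assembled_mesh.e
```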
The format of the lattice description file is fairly simple, but is also very rigid. There are two sections of the file – the unit cell dictionary and the lattice definition.
The unit cell dictionary defines the unit cell template meshes that will be placed in the lattice. The dictionary begins with a line containing `BEGIN_DICTIONARY`, is followed by one or more lines defining the unit cells, and ends with a line containing `END_DICTIONARY`.
The lines defining the unit cells consist of two fields: an arbitrary key and the filename of the Exodus file defining the mesh for that unit cell. The only restriction on the key is that it must be unique in the dictionary. Each filename must specify the path (either absolute or relative to the current execution directory) to the Exodus file and can optionally be delimited by double quotes. The filenames do not need to be unique, but it is more efficient in both memory and time if each unit cell template mesh is unique.
As an example, here is a valid dictionary definition:
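A minimal dictionary consistent with this syntax might look like the following (the keys and file names are illustrative):

```
BEGIN_DICTIONARY
  0001  unit_cell_1.e
  0002  unit_cell_2.e
  0003  "unit_cell_3.e"
  0004  /path/to/unit_cell_4.e
END_DICTIONARY
```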
The unit cell dictionary must appear before the lattice definition in the lattice description file.
If an error is detected during the parsing of the unit cell dictionary, the code will output an error message and terminate. Errors include incorrect syntax, missing unit cell template meshes, duplicate keys, or problems reading the mesh description from a unit cell template mesh. Each unit cell template mesh file is accessed and partially read at the time that zellij parses the corresponding unit cell dictionary line.
The lattice definition specifies the size of the lattice and the distribution of the unit cell(s) within that lattice. The lattice definition must follow the unit cell dictionary in the lattice description file.
The first line of the lattice definition is `BEGIN_LATTICE {i} {j} 1`, where `{i}` and `{j}` specify the size of the IxJ arrangement of unit cells. For example, the line `BEGIN_LATTICE 5 5 1` would define a lattice containing 25 unit cell instances arranged in a 5 by 5 regular grid.
The last line of the lattice definition is `END_LATTICE`. When that line is encountered, zellij will begin outputting the mesh.
Between the `BEGIN_LATTICE` and `END_LATTICE` lines are `{j}` lines with `{i}` entries per line. The entries are any of the keys that were specified in the unit cell dictionary.
As an example, here is a valid lattice definition using the keys of the example dictionary from the previous section:
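Using the illustrative dictionary shown earlier, a 5 by 5 lattice definition could look like this:

```
BEGIN_LATTICE 5 5 1
  0001 0002 0002 0002 0001
  0002 0003 0004 0003 0002
  0002 0004 0004 0004 0002
  0002 0003 0004 0003 0002
  0001 0002 0002 0002 0001
END_LATTICE
```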
Although the lattice is typically symmetric and square, this is not a requirement and is not checked.
If an error is detected during the parsing of the lattice, the code will output an error message and terminate. Errors can include invalid keys, incorrect number of lattice definition lines, or incorrect number of keys on a definition line.
Note that zellij does not require that the unit cell keys be numeric; the following example shows a different method for specifying the same lattice definition file as the previous example:
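For instance, the illustrative dictionary and lattice above could instead use descriptive keys, and the resulting mesh would be identical:

```
BEGIN_DICTIONARY
  corner  unit_cell_1.e
  edge    unit_cell_2.e
  inner   "unit_cell_3.e"
  center  /path/to/unit_cell_4.e
END_DICTIONARY

BEGIN_LATTICE 5 5 1
  corner edge   edge   edge   corner
  edge   inner  center inner  edge
  edge   center center center edge
  edge   inner  center inner  edge
  corner edge   edge   edge   corner
END_LATTICE
```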
Zellij requires that the boundary mesh (X and Y faces) of each of the unit cell templates be a regular "structured" mesh. Basically, this means that the faces of the mesh elements on the boundary are in a regular rectangular grid such that each mesh face is rectangular (90 degree corners), that the boundary mesh on the minimum X face is the same as that on the maximum X face, and similarly for the minimum Y face and the maximum Y face.
Additionally, the X faces on all unit cells must match and the Y faces on all unit cells must match, both in structure and in coordinate extent. This requirement is verified during execution. The Z faces are less constrained: the only requirement is that the coordinate extents of all Z faces must be the same (which follows from the X and Y face requirement); the structure of the mesh on the Z faces is arbitrary.
The unit cell meshes can contain any number of element blocks; however, each element block must contain hexahedral elements with 8 nodes per element. The element blocks do not need to be the same in each unit cell mesh, but if they do share the same element block id, then those elements will be combined into the same element block (with the same id) in the output mesh.
The output mesh will contain the union of all element blocks existing on the input mesh unit cells. For example, if:

* `0001` has element blocks `1 10 100`
* `0002` has element blocks `2 20 200`
* `0003` has element blocks `1 2 10 20`
* `0004` has element blocks `10 20 100 200`

then the output mesh will have element blocks `1 2 10 20 100 200`.
By default, zellij will replicate any sidesets that are defined on the input unit cell meshes to the output mesh file. The sidesets will have the same names and ids as the sidesets on the input unit cell meshes. If you do not want the sidesets replicated, you can add the command line option `-ignore_sidesets`, and any sidesets on the input unit cell meshes will be ignored.
Zellij can also generate new sidesets on the boundaries of the output mesh via the command line option `-generate_sidesets <axes>`, where `<axes>` is one or more letters specifying the faces of the output mesh on which to generate sidesets. Valid letters are `xyzXYZ` or `ijkIJK`, which correspond to:

* `x` or `i` for the surface on the minimum X coordinate (default name = `min_i`)
* `y` or `j` for the surface on the minimum Y coordinate (default name = `min_j`)
* `z` or `k` for the surface on the minimum Z coordinate (default name = `min_k`)
* `X` or `I` for the surface on the maximum X coordinate (default name = `max_i`)
* `Y` or `J` for the surface on the maximum Y coordinate (default name = `max_j`)
* `Z` or `K` for the surface on the maximum Z coordinate (default name = `max_k`)

For example, `-generate_sidesets xyXY` would generate sidesets on the surfaces corresponding to the minimum and maximum X and Y coordinates of the output mesh.
By default, the generated sidesets will be named as shown in the list above. The names can be changed with the `-sideset_names <arg>` command line option. The syntax of `<arg>` is `axis:name,axis:name,...` where `axis` is one of `ijkIJK` or `xyzXYZ` and `name` is the name to give the specified sideset. For example, `-sideset_names x:left,X:right` would name the sideset on the minimum X face `left` and the sideset on the maximum X face `right`. It is an error if two or more sidesets have the same name.
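Combining these options, an invocation might look like the following (the lattice file name is illustrative):

```
zellij -lattice lattice.txt -generate_sidesets xyXY -sideset_names x:left,X:right
```

This would generate sidesets on the minimum and maximum X and Y faces of the output mesh, naming the X-face sidesets `left` and `right` while the Y-face sidesets keep their default names `min_j` and `max_j`.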
Zellij can produce a mesh decomposed into a file-per-rank for use in a parallel analysis application. Note that Zellij itself runs serially. The main option that tells Zellij to produce the decomposed files is `-ranks <number_of_ranks>`. If this is specified, then Zellij will create `number_of_ranks` individual files, each containing a portion of the complete model. The files contain the information needed by a parallel application to read the data, set up the correct communication paths, and identify the nodes that are shared across processor boundaries.
The decomposition method can also be specified. This determines the algorithm that is used to break the lattice into `number_of_ranks` pieces, each with approximately the same computational complexity. The decomposition methods are:

* `-rcb` : use the recursive coordinate bisection method to decompose the input mesh in a parallel run.
* `-rib` : use the recursive inertial bisection method to decompose the input mesh in a parallel run.
* `-hsfc` : use the Hilbert space-filling curve method to decompose the input mesh in a parallel run.
* `-linear` : use the linear method to decompose the input mesh in a parallel run. Elements are assigned in order: the first `n/p` elements to rank 0, the next `n/p` to rank 1, and so on.
* `-cyclic` : use the cyclic method to decompose the input mesh in a parallel run. Elements are assigned to rank `id % proc_count`.
* `-random` : use the random method to decompose the input mesh in a parallel run. Elements are assigned randomly to processors in a way that preserves balance (do not use for a real run).

The `-hsfc` method is the default if no other decomposition method is specified. Note that the decomposition occurs at the grid level, so the elements of a grid cell will not be split across multiple ranks. The grid cells are weighted by the number of elements in the cell, which should produce a balanced decomposition even if the unit cells have varying element counts.
The `-linear`, `-cyclic`, and `-random` methods are typically used for debugging and testing Zellij and should not be used in a production run, especially the `-random` method.
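For example, a run that writes 64 decomposed files using recursive inertial bisection might look like the following (the lattice file name is illustrative):

```
zellij -lattice lattice.txt -ranks 64 -rib
```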
There is a partial parallel output mode in which you can tell Zellij to only output a portion of the parallel decomposed files. This is selected with the `-start_rank <rank>` and `-rank_count <count>` options. In this case, Zellij will only output the ranks from `rank` up to `rank+count-1`. For example, if you run `zellij -ranks 10 -start_rank 5 -rank_count 3`, then zellij would output files for ranks 5, 6, and 7. This is somewhat inefficient since zellij will do many of the calculations for all ranks and only output the specified ranks; however, it does allow you to run multiple copies of zellij simultaneously. For example, you could run:
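(a sketch; the lattice file name and the grouping into four runs of four ranks each are illustrative)

```
zellij -lattice lattice.txt -ranks 16 -start_rank  0 -rank_count 4 &
zellij -lattice lattice.txt -ranks 16 -start_rank  4 -rank_count 4 &
zellij -lattice lattice.txt -ranks 16 -start_rank  8 -rank_count 4 &
zellij -lattice lattice.txt -ranks 16 -start_rank 12 -rank_count 4 &
```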
simultaneously, and all 16 files should be output faster than running a single execution that writes all of the files.
If Zellij is compiled with parallel capability enabled (this is shown at the beginning of the `-help` output, or in the version information output when zellij begins executing, as `Parallel Capability Enabled`), then you can run Zellij in parallel using the normal `mpiexec -np <#> zellij <normal zellij options>` command. In this case, there will be `#` copies of zellij running simultaneously, and the output files and associated work will be divided among the processes.
For example, if you run `mpiexec -np 8 zellij -ranks 1024 -lattice lattice.txt`, then there will be 8 copies of zellij running and each will output `1024/8 = 128` output files.
Most compute systems have a limit on the number of files that a program can have open simultaneously. For many systems, this limit is 1024. The files that zellij deals with are (1) the unit cell meshes, (2) the per-rank output files, and (3) the standard input, output, and error streams. Because of this, it is fairly easy for a zellij execution to exceed the open file limit. Zellij attempts to handle this automatically using logic similar to the following:
* If the number of output files (`-ranks`) that zellij is creating exceeds the open file limit, then determine how many output files can be open at one time (`max_open = open file limit - 3 - number of unit cells open simultaneously`) and run zellij in a `subcycle` mode where it is only writing to `max_open` files at one time.
* If the `max_open` calculated in the above bullet is too small, then set the mode to only open a single unit cell mesh at a time and redo the calculation.

If the above logic fails and Zellij is unable to run without exceeding the open file count, you can specify the behavior manually using a combination of the `-minimize_open_files=<UNIT|OUTPUT|ALL>` option and the `-subcycle` and `-rank_count <#>` options.
The options to `-minimize_open_files` are:
* `UNIT` - only have a single unit cell mesh open at one time; close it before accessing another unit cell mesh.
* `OUTPUT` - only have a single output rank mesh file open at one time.
* `ALL` - both of the above options.

The `-subcycle` and `-rank_count <#>` options cause zellij to output `#` output files at a time and then cycle to the next `#` output files until all files have been output. For example, `zellij -ranks 1024 -subcycle -rank_count 256` would output the files for ranks 0 through 255, then ranks 256 through 511, then ranks 512 through 767, and finally ranks 768 through 1023.
In this mode, there will be `#` output files open simultaneously (unless `-minimize_open_files=OUTPUT|ALL` was also specified). So the total number of open files will be `unit cell count + 3 + #`, or `1 + 3 + #` if `-minimize_open_files=UNIT` was specified.
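Combining these options, an invocation might look like the following (the lattice file name is illustrative):

```
# write 1024 ranks in four passes of 256 files each, with at most one unit cell mesh open at a time
zellij -lattice lattice.txt -ranks 1024 -subcycle -rank_count 256 -minimize_open_files=UNIT
```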
Zellij is intended to produce extremely large meshes and is therefore very concerned with both memory efficiency and execution time efficiency.
The main data that zellij stores once the output file is being processed is the temporary storage containing the nodes on each cell's `max_I` and `max_J` faces; these vectors are retained only until the corresponding upper I and upper J "neighbor" cells have been processed (see below). The lattice is processed cell by cell. For an `II` by `JJ` sized grid, the cells are processed in the order `(1,1), (2,1), ..., (II,1), (1,2), (2,2), ..., (II,JJ)`. The temporary storage on the `max_I` face is only needed until the next cell is processed. That is, for cell `(i,j)`, its `max_I` nodes will be used during the processing of cell `(i+1,j)` and then deleted.
The temporary storage on the `max_J` face is retained for a longer time. For cell `(i,j)`, the `max_J` storage is needed for cell `(i,j+1)` and then deleted.
For a grid of size `(II, JJ)`, there will be at most:

* one temporary vector of `max_I` face nodes, and
* `II` temporary vectors of `max_J` face nodes.

If you have a lattice that is rectangular (`II != JJ`), then it is more memory efficient to make the I direction the smaller value if possible.
In addition to the above memory usage, zellij must also transfer the mesh coordinate data and element block connectivity data for each lattice entry to the output file. Zellij outputs the model using the following pseudo-code:
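(The pseudo-code below is a sketch of the per-cell output loop implied by this description; the names and exact ordering of steps are illustrative, not the actual implementation.)

```
for each cell (i, j) in the lattice, in the order (1,1), (2,1), ..., (II,JJ):
    read the x, y, z coordinates of the cell's unit cell template mesh
    offset the coordinates to the cell's location in the lattice
    write the coordinates to the output file and release that storage
    for each element block in the unit cell template mesh:
        read the element block connectivity
        map the local node ids to the global node ids for this cell
        write the connectivity to the output file and release that storage
```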
The maximum memory use will be the size of the storage needed for the `x`, `y`, and `z` coordinates of a unit cell mesh, or the storage needed to hold the connectivity for a single unit cell element block.
Note that the memory requirements are proportional to the size of an individual unit cell mesh and not a function of the size of the output mesh. It is possible to create meshes which are much larger than the amount of memory present on the compute system running zellij.
The memory being used by zellij during execution will be output if the `--debug 2` argument is specified at execution time.
For a large model, the majority of the execution time is related to reading the unit cell template mesh data and writing the output mesh data.
The Exodus format, which is used for the unit cell template meshes and the output mesh, uses the NetCDF library for on-disk storage. There are several variants of the NetCDF on-disk storage, distinguished by the format (`netcdf3`, `netcdf4`, and `netcdf5`) and the integer size (32-bit or 64-bit integers). Although these details are usually transparent to the user, they can affect the execution time, especially when very large meshes are being processed.
The `netcdf3` format is the original native NetCDF format. At the time the library was being developed, the byte endianness of data stored on disk was not standard among the computers then in use, and the NetCDF developers had to pick an endianness for the data. They picked the XDR (eXternal Data Representation) standard, which was used for communicating between different computer systems. Regretfully, the representation used by XDR turned out to be the opposite of the representation used by (almost?) all systems in use today, so each read and write of data in the `netcdf3` format results in a translation of the endianness. This translation is very fast, but it is overhead that would not be needed if the on-disk format used the native representation. This representation is also used by the `netcdf5` format.
However, the NetCDF `netcdf4` format is based on the HDF5 library to manage the underlying on-disk data format. It can read and write data using the native endianness of the system on which the data is being read and written, and therefore does not incur the cost of transforming the data's endianness.
By default, most current mesh generators will output a mesh using 32-bit integer data. This is sufficient to represent a mesh with up to approximately 2.1 billion nodes and elements.
If the input mesh and the output mesh have the same integer size, then no data conversion is needed. The data will be read as `N`-bit integers, processed as `N`-bit integers, and written as `N`-bit integers. However, if the input mesh uses `N`-bit integers and the output mesh uses `M`-bit integers, then the NetCDF library will convert all integer data (typically element block connectivity) from `N` bits to `M` bits, which for large meshes can incur an execution time overhead.
The NetCDF library supports compression of the output file. Typically, the `zlib` compression algorithm is used, but recently NetCDF began supporting `szip` compression, and a few more algorithms are starting to be supported.
The benefit of compression is that it can result in much smaller output (and input) mesh files; the disadvantage is that the default `zlib` compression algorithm is not very fast and can increase the execution time of zellij. The `szip` compression algorithm is faster, with typically (but not always) slightly less compression, but it will still incur an overhead in execution time.
For minimal overhead, it is recommended that you:

* use the `netcdf4` format for all input and output meshes, and
* select the integer size with the `-32` or `-64` option; the `-64` option is the default.

It is most efficient if the format and integer size of the input mesh match those of the output mesh. The format of the input meshes can be converted using the `io_shell` application with the `-netcdf4` and `-64` or `-32` options. The format and integer size of a mesh can be queried using the `exo_format` application.
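For instance, a query and a conversion might look like the following (the file names are illustrative):

```
# report the current format and integer size of a unit cell mesh
exo_format unit_cell_1.e

# convert a unit cell mesh to the netcdf4 format with 64-bit integers
io_shell -netcdf4 -64 unit_cell_1.e unit_cell_1-nc4.e
```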
For illustration, here are the execution times for several runs with different formats and integer sizes. In all cases, the input and output mesh sizes are the same:
| input format | output format | input integer size | output integer size | execution time |
|---|---|---|---|---|
| netcdf3 | netcdf3 | 32 | 32 | 7.0 |
| netcdf3 | netcdf4 | 32 | 32 | 2.6 |
| netcdf3 | netcdf4 | 32 | 64 | 3.8 |
| netcdf4 | netcdf3 | 32 | 32 | 6.5 |
| netcdf4 | netcdf3 | 64 | 32 | 7.4 |
| netcdf4 | netcdf5 | 64 | 64 | 9.4 |
| netcdf4 | netcdf4 | 32 | 32 | 2.4 |
| netcdf4 | netcdf4 | 32 | 64 | 3.6 |
| netcdf4 | netcdf4 | 64 | 32 | 3.2 |
| netcdf4 | netcdf4 | 64 | 64 | 3.3 |
The fastest option is both input and output using 32-bit integers and the `netcdf4` format. Almost as fast is the case where the input format is `netcdf3` and the output is `netcdf4`. The 64-bit integer options with both input and output using `netcdf4` are slightly slower, but this is probably due to the doubling of the size of the integer data being read and written.
The output mesh in this case consisted of 37.3 million elements and 38.5 million nodes in a grid of 46 x 46 unit cells. There were 56 unit cell template meshes.