Maskman Manual

This application facilitates manual mask generation for a wide range of activities (e.g., setting affinity).

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.

1. Usage

This application provides methods for requesting a list of masks. Each method has its own options and behavior. In general, invoking Maskman will resemble the following.

$ maskman [global options] \
          [input method] [input options] \
          [output method] [output options]

The supported methods and their options are described below.

1.1. Global Options

The following options do not directly map to any input or output method.

-v or --verbose
This increases verbosity. On POSIX systems, all of the extra information is printed to STDERR so that it can be easily separated from the desired output.

1.2. Input Method luiscli

This method enables users to specify Lists of Unsigned Integers Simply through a compact syntax. Its usage is below.

$ maskman luiscli [luisstring]

where

luisstring
This string allows users to specify a list of lists; see LUIS Description for more information.

If luiscli is the only input method, then it does not need to be given on the command line, i.e., maskman luiscli [luisstring] is the same as maskman [luisstring].

1.2.1. LUIS Description

This compact notation allows users to specify a list of lists containing unsigned integers. The following characters have special meaning in a LUIS string.

;
This separates the outer lists.
:
This allows the user to specify an inclusive range of values via start:stop:increment. If only a single : is used, the increment is assumed to be 1. For example, 2:5, which is the same as 2:5:1, would expand to 2, 3, 4, and 5. However, 2:5:2 would expand to 2 and 4 since 6 would be beyond 5.

Numbers can also be listed individually and separated with any non-numeric character other than ; and :. The author recommends , for this.

So, to put it all together, if a user wanted to create 2 masks where the first has 1 through 4 and the second has 5, 7, 9, 11, and 42, then they could use the following example.

$ maskman luiscli "1:4;5:11:2,42"
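
The expansion rules above can be sketched in a few lines of Python. This is only an illustration of the notation as described in this section, not Maskman's actual implementation (which is written in C11); the function name parse_luis is hypothetical, and skipping empty outer lists (e.g., after a trailing ;) is an assumption.

```python
import re


def parse_luis(luis):
    """Expand a LUIS string into a list of lists of unsigned integers."""
    outer = []
    for group in luis.split(";"):
        # Assumption: empty groups (e.g., after a trailing ';') are skipped.
        if not group.strip():
            continue
        values = []
        # A token is a number optionally followed by :stop and :increment.
        # Any other character between tokens acts as a separator.
        for m in re.finditer(r"(\d+)(?::(\d+))?(?::(\d+))?", group):
            start = int(m.group(1))
            stop = int(m.group(2)) if m.group(2) else start
            step = int(m.group(3)) if m.group(3) else 1
            # The range is inclusive, so stop itself is kept when reachable.
            values.extend(range(start, stop + 1, step))
        outer.append(values)
    return outer


print(parse_luis("1:4;5:11:2,42"))  # [[1, 2, 3, 4], [5, 7, 9, 11, 42]]
```

The same sketch expands 2:5:2 to 2 and 4, in line with the description of : above.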

1.3. Output Method simplestdout

This output method simply prints the masks to STDOUT. If simplestdout is the only output method, then it does not need to be specified on the command line.

1.4. HPC Slurm Example

This example uses Maskman to manually set the affinity for each MPI task via the Slurm MPI wrapper. First, install Maskman and the xthi utility. Then, get an interactive allocation. Once that is done, execute xthi without specifying any binding to see what Slurm provides by default. An example is below, with some output tweaked for documentation formatting.

$ export OMP_NUM_THREADS=2
$ srun --ntasks=2 xthi
MPI Rank=0 OMP Thread=0  CPU=167  NUMA Node=3  CPU Affinity=  0-55,112-167
MPI Rank=0 OMP Thread=1  CPU=154  NUMA Node=3  CPU Affinity=  0-55,112-167
MPI Rank=1 OMP Thread=0  CPU=223  NUMA Node=7  CPU Affinity=56-111,168-223
MPI Rank=1 OMP Thread=1  CPU=210  NUMA Node=7  CPU Affinity=56-111,168-223

Now, let’s add Maskman and set the affinity for the first rank and its spawned threads to the CPU range 14-15 (i.e., individual physical cores within NUMA domain 1 of processor 0) and for the second rank to 70-71 (i.e., individual physical cores within NUMA domain 1 of processor 1).

$ export MASKMAN=`maskman "14:15;70:71"`
$ export OMP_NUM_THREADS=2
$ echo ${MASKMAN}
0xc000,0xc00000000000000000
$ srun --ntasks=2 --cpu-bind=verbose,mask_cpu:${MASKMAN} xthi
cpu-bind=MASK - foo, task  0  0 [92660]: mask 0xc000 set
cpu-bind=MASK - foo, task  1  1 [92661]: mask 0xc00000000000000000 set
MPI Rank=0 OMP Thread=0  CPU=15  NUMA Node=1  CPU Affinity=14,15
MPI Rank=0 OMP Thread=1  CPU=14  NUMA Node=1  CPU Affinity=14,15
MPI Rank=1 OMP Thread=0  CPU=71  NUMA Node=5  CPU Affinity=70,71
MPI Rank=1 OMP Thread=1  CPU=71  NUMA Node=5  CPU Affinity=70,71
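
The masks echoed above are hexadecimal bitmasks in which bit n stands for CPU n, so CPUs 14 and 15 give 0xc000. A minimal Python sketch of that mapping follows; the function name cpu_mask is hypothetical and this is not Maskman's own code.

```python
def cpu_mask(cpus):
    """Return a hexadecimal mask with bit n set for every CPU n in cpus."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return hex(mask)


# One mask per rank, joined with commas as expected by mask_cpu.
print(",".join(cpu_mask(cpus) for cpus in [[14, 15], [70, 71]]))
```

Python integers have arbitrary precision, so CPU numbers above 63 need no special handling in this sketch.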

Maskman successfully provided the range of CPUs desired for each MPI rank. One issue, though, is that both OpenMP threads of MPI rank 1 ran on CPU 71, leaving CPU 70 unused. The placement can be constrained further with OpenMP runtime environment variables, e.g., OMP_PLACES.

$ export MASKMAN=`maskman "14:15;70:71"`
$ export OMP_NUM_THREADS=2
$ export OMP_PLACES=cores
$ echo ${MASKMAN}
0xc000,0xc00000000000000000
$ srun --ntasks=2 --cpu-bind=verbose,mask_cpu:${MASKMAN} xthi
cpu-bind=MASK - foo, task  0  0 [93626]: mask 0xc000 set
cpu-bind=MASK - foo, task  1  1 [93627]: mask 0xc00000000000000000 set
MPI Rank=0 OMP Thread=0  CPU=14  NUMA Node=1  CPU Affinity=14
MPI Rank=0 OMP Thread=1  CPU=15  NUMA Node=1  CPU Affinity=15
MPI Rank=1 OMP Thread=0  CPU=70  NUMA Node=5  CPU Affinity=70
MPI Rank=1 OMP Thread=1  CPU=71  NUMA Node=5  CPU Affinity=71

So, in this example, Slurm set a mask that gave each MPI rank 2 CPUs within which to run its threads, and OpenMP then narrowed the mask to a single CPU per thread. This particular compute node has 2 processors, each with 4 NUMA domains of 14 physical cores, and each core has 2 hardware threads. Let’s put 2 ranks per NUMA domain with 7 OpenMP threads per rank and ensure they only use the first hardware thread of each core where possible (the secondary threads are CPUs 112 and above). Let’s also skip CPUs 0 and 56 in case other processes are pinned to them (i.e., use 112 in place of 0 and 168 in place of 56).

$ export MASKMAN=`maskman "  112,1:6;  7:13;  14:20;   21:27; \
                               28:34; 35:41;  42:48;   49:55; \
                           168,57:62; 63:69;  70:76;   77:83; \
                               84:90; 91:97; 98:104; 105:111; "`
$ export OMP_NUM_THREADS=7
$ export OMP_PLACES=cores
$ srun --ntasks=16 --cpu-bind=verbose,mask_cpu:${MASKMAN} xthi
...task  0  0 ...mask 0x1000000000000000000000000007e set
...task  1  1 ...mask 0x3f80 set
...task  2  2 ...mask 0x1fc000 set
...task  3  3 ...mask 0xfe00000 set
...task  4  4 ...mask 0x7f0000000 set
...task  5  5 ...mask 0x3f800000000 set
...task  6  6 ...mask 0x1fc0000000000 set
...task  7  7 ...mask 0xfe000000000000 set
...task  8  8 ...mask 0x1000000000000000000000000007e00000000000000 set
...task  9  9 ...mask 0x3f8000000000000000 set
...task 10 10 ...mask 0x1fc00000000000000000 set
...task 11 11 ...mask 0xfe0000000000000000000 set
...task 12 12 ...mask 0x7f000000000000000000000 set
...task 13 13 ...mask 0x3f80000000000000000000000 set
...task 14 14 ...mask 0x1fc000000000000000000000000 set
...task 15 15 ...mask 0xfe00000000000000000000000000 set
MPI Rank= 0 OMP Thread=0  CPU=112  NUMA Node=0  CPU Affinity=112
MPI Rank= 0 OMP Thread=1  CPU=  1  NUMA Node=0  CPU Affinity=  1
MPI Rank= 0 OMP Thread=2  CPU=  2  NUMA Node=0  CPU Affinity=  2
...
MPI Rank= 7 OMP Thread=6  CPU= 55  NUMA Node=3  CPU Affinity= 55
MPI Rank= 8 OMP Thread=0  CPU=168  NUMA Node=4  CPU Affinity=168
MPI Rank= 8 OMP Thread=1  CPU= 57  NUMA Node=4  CPU Affinity= 57
...
MPI Rank=15 OMP Thread=5  CPU=110  NUMA Node=7  CPU Affinity=110
MPI Rank=15 OMP Thread=6  CPU=111  NUMA Node=7  CPU Affinity=111
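
To check masks such as those reported by --cpu-bind=verbose, they can be decoded back into CPU lists. The following Python sketch does so under the same assumption that bit n stands for CPU n; the function name cpus_in_mask is hypothetical.

```python
def cpus_in_mask(mask):
    """List the CPU numbers whose bits are set in a hexadecimal mask string."""
    value = int(mask, 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


# Masks reported for tasks 1 through 3 in the output above.
for mask in ["0x3f80", "0x1fc000", "0xfe00000"]:
    print(mask, cpus_in_mask(mask))
```

For these three masks the sketch yields CPUs 7 through 13, 14 through 20, and 21 through 27, which correspond to the second, third, and fourth lists passed to Maskman.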

The closest approximation of this configuration that can be achieved with “standard” Slurm options is shown below. Please note that results may vary with different Slurm configurations.

$ export OMP_NUM_THREADS=7
$ export OMP_PLACES=cores
$ srun --nodes=1 --ntasks=16 --cpus-per-task=7 \
  --distribution=block:block --hint=nomultithread xthi
MPI Rank= 0 OMP Thread=0  CPU=  0  NUMA Node=0  CPU Affinity=  0
MPI Rank= 0 OMP Thread=1  CPU=  1  NUMA Node=0  CPU Affinity=  1
MPI Rank= 0 OMP Thread=2  CPU=  2  NUMA Node=0  CPU Affinity=  2
...
MPI Rank=15 OMP Thread=4  CPU=109  NUMA Node=7  CPU Affinity=109
MPI Rank=15 OMP Thread=5  CPU=110  NUMA Node=7  CPU Affinity=110
MPI Rank=15 OMP Thread=6  CPU=111  NUMA Node=7  CPU Affinity=111

2. Development

This program was written to be easy to build and to develop. The features that support this are listed below.

  1. It was architected to facilitate the easy inclusion of additional, future input and output methods.
  2. It is written in C11, which is widely supported, so it should build with most C compilers on most architectures.
  3. It leverages the Kitware CMake build system.
  4. Automation, including CI/CD, is facilitated with the top-level GNU Make Makefile.
  5. Its documentation is written in Org Mode. This facilitates rapid documentation generation that is visible within Forge previewers, in addition to robust exporting to HTML, Unix Manual, and PDF, all through the installation of a single cross-platform utility (i.e., GNU Emacs); PDF exporting also requires a LaTeX installation.
  6. Its source code has formatting requirements enforced by ClangFormat.
  7. Its logging adheres to the logfmt style.

2.1. Building

2.1.1. Application

This application only requires a C11 compiler and a semi-recent version of CMake to build. Standard CMake conventions are followed and can be used. See the Automation section for additional information.

2.1.2. Documentation

This project supports writing out documentation in several formats. The build environment requires GNU Emacs and LaTeX generation capabilities. The CMake build system can be configured to build the documentation by setting -DMaskman_Documentation:BOOL=ON. If only the documentation is to be built, then the application build can be turned off via -DMaskman_Application:BOOL=OFF.

2.1.3. Automation

There is a top-level GNU Make Makefile (named Makefile) that has targets to facilitate the easy building of Maskman and its artifacts. Some examples are provided below. These will typically put their artifacts within appropriately named folders within the source code tree that begin with an underscore. The CI/CD automation file(s) (e.g., .gitlab-ci.yml) should contain examples of how to leverage this for every task.

# clean out old build and installation artifacts
$ make clean

# perform a typical build utilizing CC for compiler
$ make app

# install the application and files into an easy to grok folder structure
$ make install

Author: Anthony M. Agelastos, Douglas M. Pase

Created: 2025-04-16 Wed 16:55
