ALI Performance Tests on Weaver

Introduction

These tests currently run the Greenland Ice Sheet (GIS) in Albany Land Ice (ALI) on Nvidia Tesla V100 GPUs on Weaver. Note: Waterman was moved to a different network and renamed Weaver. The last Waterman test was executed on 6/16/2020.

Architectures:

Name          | Weaver (P9/V100)
CPU           | Dual-socket IBM POWER9
GPU           | Nvidia Tesla V100
Cores/Node    | 40
Threads/Core  | 8
GPUs/Node     | 4
Memory/Node   | 319 GB
Interconnect  | Mellanox EDR IB (100 Gb/s)
Compiler      | gcc 7.2.0
GPU Compiler  | cuda 10.1.105
MPI           | openmpi 4.0.1

Cases:

Case Name                     | Number of Processes (np) | Description
green-1-10km_ent_fea_mem_tet  | 8 | Unstructured 1-10km GIS, Enthalpy problem, finite element assembly only, memoization, tetrahedron
green-1-7km_fea_1ws           | 8 | Unstructured 1-7km GIS, finite element assembly only, single workset
green-1-7km_fea_mem           | 8 | Unstructured 1-7km GIS, finite element assembly only, memoization
green-1-7km_muk_ls_mem        | 8 | Unstructured 1-7km GIS, MueLu w/ kokkos and line smoothing, memoization
green-3-20km_vel_fea_mem_tet  | 8 | Unstructured 3-20km GIS, Velocity problem, finite element assembly only, memoization, tetrahedron
green-3-20km_vel_fea_mem_wdg  | 8 | Unstructured 3-20km GIS, Velocity problem, finite element assembly only, memoization, wedge
green-3-20km_ent_fea_mem_tet  | 8 | Unstructured 3-20km GIS, Enthalpy problem, finite element assembly only, memoization, tetrahedron
green-3-20km_ent_fea_mem_wdg  | 8 | Unstructured 3-20km GIS, Enthalpy problem, finite element assembly only, memoization, wedge
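
The case names above appear to follow a token convention that mirrors the Description column. The sketch below expands a case name into a description. The token map is inferred from the table rows, not taken from any ALI source, and describe_case is a hypothetical helper introduced only for illustration.

```python
# Token meanings inferred from the Cases table (an assumption, not an ALI API).
TOKENS = {
    "ent": "Enthalpy problem",
    "vel": "Velocity problem",
    "fea": "finite element assembly only",
    "mem": "memoization",
    "tet": "tetrahedron",
    "wdg": "wedge",
    "1ws": "single workset",
    "muk": "MueLu w/ kokkos",
    "ls": "line smoothing",
}


def describe_case(name):
    """Expand a case name such as 'green-1-7km_fea_1ws' into a description."""
    mesh, *tokens = name.split("_")
    prefix = "green-"
    # The part after 'green-' is the mesh resolution range, e.g. '1-7km'.
    resolution = mesh[len(prefix):] if mesh.startswith(prefix) else mesh
    parts = ["Unstructured {} GIS".format(resolution)]
    # Unknown tokens are passed through unchanged rather than guessed.
    parts += [TOKENS.get(token, token) for token in tokens]
    return ", ".join(parts)
```

For most rows this reproduces the Description column verbatim. For green-1-7km_muk_ls_mem it joins the solver options with a comma where the table uses "and".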

Timers:

Timer Name                             | Level | Description
Albany Total Time                      | 0 | Total wall-clock time of simulation
Albany: Setup Time                     | 1 | Preprocess
Albany: Total Fill Time                | 1 | Finite element assembly
Albany Fill: Residual                  | 2 | Residual assembly
Albany Residual Fill: Evaluate         | 3 | Compute the residual, local/global assembly
Albany Residual Fill: Export           | 3 | Update global residual across MPI ranks
Albany Fill: Jacobian                  | 2 | Jacobian assembly
Albany Jacobian Fill: Evaluate         | 3 | Compute the Jacobian, local/global assembly
Albany Jacobian Fill: Export           | 3 | Update global Jacobian across MPI ranks
NOX Total Preconditioner Construction  | 1 | Construct Preconditioner
NOX Total Linear Solve                 | 1 | Linear Solve
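
The Level column can be read as a nesting depth. The sketch below recovers each timer's parent from the table, assuming the rows are listed depth-first so that a timer at level n belongs to the nearest preceding timer at level n-1. That ordering is an assumption about the table, and parent_map is a hypothetical helper rather than part of Albany.

```python
# (timer name, level) pairs copied from the Timers table, in table order.
TIMERS = [
    ("Albany Total Time", 0),
    ("Albany: Setup Time", 1),
    ("Albany: Total Fill Time", 1),
    ("Albany Fill: Residual", 2),
    ("Albany Residual Fill: Evaluate", 3),
    ("Albany Residual Fill: Export", 3),
    ("Albany Fill: Jacobian", 2),
    ("Albany Jacobian Fill: Evaluate", 3),
    ("Albany Jacobian Fill: Export", 3),
    ("NOX Total Preconditioner Construction", 1),
    ("NOX Total Linear Solve", 1),
]


def parent_map(timers):
    """Map each timer to its enclosing timer (None for the top level).

    Assumes depth-first row order and levels that grow by at most one.
    """
    parents = {}
    stack = []  # stack[i] is the most recent timer seen at level i
    for name, level in timers:
        del stack[level:]  # drop timers at this level or deeper
        parents[name] = stack[-1] if stack else None
        stack.append(name)
    return parents
```

Under that assumption, the two Export timers sit beneath the residual and Jacobian fills respectively, and every level 1 timer reports into Albany Total Time.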

Specifications

Performance Timelines

Plot of wall-clock times or memory for nightly runs

Changepoints are estimated using a generalized likelihood ratio method on each timer, and then merged over all timers for a given test case.
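
As a rough illustration of the idea, the sketch below scans one timer series for a single shift in mean using a two-sample statistic maximized over all split points, then merges nearby detections across timers. It is a simplified offline variant, not the procedure used to produce the plots: the function names, the threshold of 5.0, the merge tolerance, and the sample timings are all made up for the example.

```python
import math


def glr_changepoint(values, min_segment=2):
    """Return (index, statistic) for the most likely single mean shift.

    index is the position of the first observation after the shift,
    or None if the series is too short or shows no shift at all.
    """
    n = len(values)
    if n < max(2 * min_segment, 3):
        return None, 0.0
    best_index, best_stat = None, 0.0
    for k in range(min_segment, n - min_segment + 1):
        left, right = values[:k], values[k:]
        mean_l = sum(left) / k
        mean_r = sum(right) / (n - k)
        # Pooled within-segment sum of squares, n - 2 degrees of freedom.
        ss = sum((v - mean_l) ** 2 for v in left)
        ss += sum((v - mean_r) ** 2 for v in right)
        if ss == 0.0:
            stat = math.inf if mean_l != mean_r else 0.0
        else:
            scale = math.sqrt(ss / (n - 2))
            stat = abs(mean_l - mean_r) * math.sqrt(k * (n - k) / n) / scale
        if stat > best_stat:
            best_index, best_stat = k, stat
    return best_index, best_stat


def merge_changepoints(indices, tolerance=1):
    """Collapse detections that lie within `tolerance` runs of each other."""
    merged = []
    for idx in sorted(indices):
        if merged and idx - merged[-1] <= tolerance:
            continue
        merged.append(idx)
    return merged


def detect_case_changepoints(timer_series, threshold=5.0, tolerance=1):
    """Detect per timer, keep statistics above threshold, then merge."""
    found = []
    for values in timer_series.values():
        index, stat = glr_changepoint(values)
        if index is not None and stat > threshold:
            found.append(index)
    return merge_changepoints(found, tolerance)


# Made-up nightly timings (seconds) for two timers of one test case.
example = {
    "Albany Total Time": [10.0, 10.1, 9.9, 10.0, 12.0, 12.1, 11.9, 12.0],
    "Albany: Total Fill Time": [4.0, 4.1, 3.9, 4.0, 4.0, 5.0, 5.1, 4.9],
}
merged = detect_case_changepoints(example)
```

With these sample series the two timers shift one run apart, so the merge step reports a single changepoint for the case. A sequential detector in the style of the references below would instead update the statistic as each nightly result arrives.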

Pollak, Moshe; Siegmund, D. Sequential Detection of a Change in a Normal Mean when the Initial Value is Unknown. Ann. Statist. 19 (1991), no. 1, 394--416. doi:10.1214/aos/1176347990. https://projecteuclid.org/euclid.aos/1176347990

Siegmund, D.; Venkatraman, E. S. Using the Generalized Likelihood Ratio Statistic for Sequential Detection of a Change-Point. Ann. Statist. 23 (1995), no. 1, 255--271. doi:10.1214/aos/1176324466. https://projecteuclid.org/euclid.aos/1176324466

Hawkins, D. M.; Zamba, K. D. Statistical Process Control for Shifts in Mean or Variance Using a Change Point Formulation. Technometrics 47 (2005), 164--173.

Hawkins, D. M.; Qiu, P.; Kang, C. W. The Changepoint Model for Statistical Process Control. Journal of Quality Technology 35 (2003), no. 4, 355--366.