Fenix @develop
 
Message Logging

Functions for logging and replaying MPI messages after faults.

Macros

#define FENIX_MLOG_NONE   -1
 
#define FENIX_MLOG_CONTINUE   -1
 

Functions

int Fenix_Mlog_create (int mlog_id, MPI_Comm *comm, int depth)
 Create a new message logger.
 
int Fenix_Mlog_activate (int mlog_id)
 Activate a given mlog, deactivating any previously active mlog.
 
int Fenix_Mlog_active (int *mlog_id)
 Get the currently active message log.
 
int Fenix_Mlog_begin_region (int mlog_id, int region_id)
 Set the region of the given message logger.
 
int Fenix_Mlog_activate_region (int mlog_id, int region_id)
 Activate the mlog and begin the region.
 
int Fenix_Mlog_sync (int mlog_id, int region_id)
 Synchronize messages across ranks, each starting at its given region.
 
int Fenix_Mlog_stage (int mlog_id, int group_id, int member_id)
 Stage an mlog's data into a Fenix data member.
 
int Fenix_Mlog_lrestore (int mlog_id, int group_id, int member_id, int time_stamp)
 Restore an mlog from a Fenix data member's local snapshot.
 
int Fenix_Mlog_delete (int mlog_id)
 Delete an mlog.
 

Overview

Functions for logging and replaying MPI messages after faults.

Function Documentation

◆ Fenix_Mlog_activate()

int Fenix_Mlog_activate (int mlog_id)
local

Activate a given mlog, deactivating any previously active mlog.

Only the active mlog can log messages. If any error occurs, no mlog will be active afterward (even if one was active before).

Fenix functions are never logged, regardless of which mlog is active. However, some Fenix functions support inline recovery when inline recovery is active; each such function states in its own documentation whether it does. Inline recovery is active only when all three of the following conditions hold:

  1. FENIX_MLOG_RECOVERY_MODE is not FENIX_MLOG_RECOVERY_MANUAL
  2. FENIX_RECOVERY_MODE is not FENIX_RECOVERY_IGNORE
  3. An mlog is active.
Parameters
[in] mlog_id: The log to activate. May be FENIX_MLOG_NONE.
Returns
FENIX_SUCCESS if successful, another return code otherwise.
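To illustrate the activation rules above, here is a toy, single-process model of the active-mlog state and of the three inline-recovery conditions. It is a sketch under stated assumptions, not Fenix's implementation: only the value of FENIX_MLOG_NONE (-1) comes from this page, while the TOY_* names, the error code, the enum members standing in for the FENIX_MLOG_RECOVERY_MODE and FENIX_RECOVERY_MODE settings, and the fixed table size are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins. Only TOY_MLOG_NONE's value (-1) matches this page
 * (FENIX_MLOG_NONE); everything else here is an illustrative placeholder. */
#define TOY_MLOG_NONE (-1)
#define TOY_SUCCESS 0
#define TOY_ERR_NO_SUCH_MLOG 1 /* hypothetical error code */
#define TOY_MAX_MLOGS 8

typedef enum { TOY_MLOG_RECOVERY_AUTO, TOY_MLOG_RECOVERY_MANUAL } toy_mlog_recovery_mode;
typedef enum { TOY_RECOVERY_DEFAULT, TOY_RECOVERY_IGNORE } toy_recovery_mode;

typedef struct {
    bool exists[TOY_MAX_MLOGS]; /* which mlog IDs have been created */
    int active;                 /* currently active mlog, or TOY_MLOG_NONE */
} toy_mlog_registry;

/* Mirrors the documented contract: activating replaces any previously
 * active mlog, and on any error no mlog is left active. */
int toy_activate(toy_mlog_registry *reg, int mlog_id) {
    if (mlog_id == TOY_MLOG_NONE) {
        reg->active = TOY_MLOG_NONE;
        return TOY_SUCCESS;
    }
    if (mlog_id < 0 || mlog_id >= TOY_MAX_MLOGS || !reg->exists[mlog_id]) {
        reg->active = TOY_MLOG_NONE; /* error: leave no mlog active */
        return TOY_ERR_NO_SUCH_MLOG;
    }
    reg->active = mlog_id;
    return TOY_SUCCESS;
}

/* Inline recovery is active only when all three listed conditions hold. */
bool toy_inline_recovery_active(toy_mlog_recovery_mode mlog_mode,
                                toy_recovery_mode recovery_mode,
                                const toy_mlog_registry *reg) {
    return mlog_mode != TOY_MLOG_RECOVERY_MANUAL
        && recovery_mode != TOY_RECOVERY_IGNORE
        && reg->active != TOY_MLOG_NONE;
}
```

In this model, passing an unknown ID to toy_activate both returns an error and clears the active mlog, which in turn switches inline recovery off.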

◆ Fenix_Mlog_activate_region()

int Fenix_Mlog_activate_region (int mlog_id, int region_id)
local

Activate the mlog and begin the region.

This helper function is equivalent to calling Fenix_Mlog_activate(mlog_id); followed by Fenix_Mlog_begin_region(mlog_id, region_id);

Parameters
[in] mlog_id: The logger to activate and whose region to set.
[in] region_id: The region ID to set, with the same semantics as in Fenix_Mlog_begin_region.
Returns
FENIX_SUCCESS if successful, another return code otherwise.

◆ Fenix_Mlog_active()

int Fenix_Mlog_active (int *mlog_id)
local

Get the currently active message log.

Parameters
[out] mlog_id: The active log; may be FENIX_MLOG_NONE if no mlog is active.
Returns
FENIX_SUCCESS if successful, another return code otherwise.

◆ Fenix_Mlog_begin_region()

int Fenix_Mlog_begin_region (int mlog_id, int region_id)
local

Set the region of the given message logger.

Parameters
[in] mlog_id: The logger whose region to set.
[in] region_id: The region ID to set. Must be positive and greater than the current region_id; it may equal the current region_id if no messages have been logged in that region.
Returns
FENIX_SUCCESS if successful, another return code otherwise.
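The region_id rule for Fenix_Mlog_begin_region can be expressed as a small predicate. The sketch below assumes a hypothetical per-logger record holding the current region and the number of messages logged in it, with 0 standing for "no region begun yet"; Fenix's actual bookkeeping is not described on this page, and the toy_* names are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-logger state (not Fenix's real data structure). */
typedef struct {
    int current_region;        /* region set by the last successful call; 0 = none yet */
    size_t messages_in_region; /* messages logged since that region began */
} toy_region_state;

/* True if region_id satisfies the documented rule: positive, and either
 * greater than the current region or equal to it while nothing has been
 * logged in the current region. */
bool toy_region_id_valid(const toy_region_state *s, int region_id) {
    if (region_id <= 0)
        return false;
    if (region_id > s->current_region)
        return true;
    return region_id == s->current_region && s->messages_in_region == 0;
}

/* Applies a valid region change, resetting the message count when the
 * region actually changes. Returns false and leaves state untouched otherwise. */
bool toy_begin_region(toy_region_state *s, int region_id) {
    if (!toy_region_id_valid(s, region_id))
        return false;
    if (region_id != s->current_region) {
        s->current_region = region_id;
        s->messages_in_region = 0;
    }
    return true;
}
```

Re-issuing the same region ID is accepted only while the region is still empty, which matches the parenthetical exception in the parameter description.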

◆ Fenix_Mlog_create()

int Fenix_Mlog_create (int mlog_id, MPI_Comm *comm, int depth)
local

Create a new message logger.

Parameters
[in] mlog_id: A unique identifier (>= 0) for this message logger.
[in] comm: The MPI_Comm to log messages from. The communicator must be repaired after failures.
[in] depth: The number of regions to keep in the logs at once. Older regions are deleted automatically.
Returns
FENIX_SUCCESS if successful, another return code otherwise.
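One way to picture the depth parameter is a fixed-size sliding window over region IDs: once depth regions are retained, beginning a new region evicts the oldest one. The circular buffer below is a hypothetical model of that retention policy, not Fenix's storage layout; TOY_MAX_DEPTH, the toy_* names, and the -1 "nothing evicted" sentinel (safe here because region IDs must be positive) are assumptions.

```c
#include <stddef.h>

#define TOY_MAX_DEPTH 16 /* hypothetical upper bound for this sketch */

/* Hypothetical retention window keeping the `depth` most recent regions. */
typedef struct {
    int regions[TOY_MAX_DEPTH]; /* circular buffer of retained region IDs */
    size_t depth;               /* capacity requested at creation */
    size_t start;               /* index of the oldest retained region */
    size_t count;               /* number of regions currently retained */
} toy_region_window;

/* Returns 0 on success, -1 if depth is out of range for this sketch. */
int toy_window_init(toy_region_window *w, size_t depth) {
    if (depth == 0 || depth > TOY_MAX_DEPTH)
        return -1;
    w->depth = depth;
    w->start = 0;
    w->count = 0;
    return 0;
}

/* Records a newly begun region. Returns the evicted (oldest) region ID,
 * or -1 if the window was not yet full. */
int toy_window_push(toy_region_window *w, int region_id) {
    int evicted = -1;
    if (w->count == w->depth) {
        evicted = w->regions[w->start];
        w->start = (w->start + 1) % w->depth;
        w->count--;
    }
    w->regions[(w->start + w->count) % w->depth] = region_id;
    w->count++;
    return evicted;
}

/* Returns the i-th retained region, oldest first (requires i < count). */
int toy_window_get(const toy_region_window *w, size_t i) {
    return w->regions[(w->start + i) % w->depth];
}
```

With depth 2, beginning regions 1, 2, and 3 leaves regions 2 and 3 retained and evicts region 1.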

◆ Fenix_Mlog_delete()

int Fenix_Mlog_delete (int mlog_id)
local

Delete an mlog.

Parameters
[in] mlog_id: The mlog to delete.
Returns
FENIX_SUCCESS if successful, another return code otherwise.

◆ Fenix_Mlog_lrestore()

int Fenix_Mlog_lrestore (int mlog_id, int group_id, int member_id, int time_stamp)
local

Restore an mlog from a Fenix data member's local snapshot.

Parameters
[in] mlog_id: The mlog to restore.
[in] group_id: The group to restore from.
[in] member_id: The member to restore from.
[in] time_stamp: The time stamp of the snapshot to restore from.
Returns
FENIX_SUCCESS if successful, another return code otherwise.

◆ Fenix_Mlog_stage()

int Fenix_Mlog_stage (int mlog_id, int group_id, int member_id)
local

Stage an mlog's data into a Fenix data member.

Parameters
[in] mlog_id: The mlog to stage.
[in] group_id: The group to stage into.
[in] member_id: The member to stage into. The member need not already exist; if it does, it must have been created with size FENIX_RESIZEABLE and datatype MPI_BYTE.
Returns
FENIX_SUCCESS if successful, another return code otherwise.

◆ Fenix_Mlog_sync()

int Fenix_Mlog_sync (int mlog_id, int region_id)
collective

Synchronize messages across ranks, each starting at its given region.

Ranks recovering to later states will replay messages to ranks recovering to earlier states.

Parameters
[in] mlog_id: The logger to sync.
[in] region_id: The region at which this rank will begin. May be FENIX_MLOG_CONTINUE, in which case this rank recovers to the latest message state of its latest region instead of restarting the region.
Returns
FENIX_SUCCESS if successful, another return code otherwise.
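The replay direction described for Fenix_Mlog_sync can be modeled by treating each rank's recovery point as a (region, message position) pair compared lexicographically. This is a hypothetical model: the toy_state type, the message counter, and the toy_* helpers are assumptions; only the value of FENIX_MLOG_CONTINUE (-1) and the rule that later states replay to earlier ones come from this page.

```c
#include <stdbool.h>

#define TOY_MLOG_CONTINUE (-1) /* same value this page gives FENIX_MLOG_CONTINUE */

/* Hypothetical recovery point: a region plus a position within it. */
typedef struct {
    int region;
    long message; /* 0 = start of the region */
} toy_state;

/* Resolves the region_id passed to sync. TOY_MLOG_CONTINUE resumes at the
 * rank's latest region and latest message; any other value restarts that region. */
toy_state toy_resolve_start(int region_id, toy_state latest) {
    if (region_id == TOY_MLOG_CONTINUE)
        return latest;
    toy_state s = { region_id, 0 };
    return s;
}

/* A rank recovering to a strictly later state replays messages to a rank
 * recovering to an earlier one (lexicographic order on region, then message). */
bool toy_should_replay(toy_state sender, toy_state receiver) {
    if (sender.region != receiver.region)
        return sender.region > receiver.region;
    return sender.message > receiver.message;
}
```

For example, with a latest state of region 4 at message 17, a rank passing FENIX_MLOG_CONTINUE resumes at that point and, in this model, would replay to a peer that restarts region 4 from its beginning.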