Functions for logging and replaying MPI messages after faults.
|
|
#define | FENIX_MLOG_NONE -1 |
| |
|
#define | FENIX_MLOG_CONTINUE -1 |
| |
|
| int | Fenix_Mlog_create (int mlog_id, MPI_Comm *comm, int depth) |
| | Create a new message logger.
|
| |
| int | Fenix_Mlog_activate (int mlog_id) |
| | Activate a given mlog, deactivating any previously active mlog.
|
| |
| int | Fenix_Mlog_active (int *mlog_id) |
| | Get the currently active message log.
|
| |
| int | Fenix_Mlog_begin_region (int mlog_id, int region_id) |
| | Set the region of the given message logger.
|
| |
| int | Fenix_Mlog_activate_region (int mlog_id, int region_id) |
| | Activate the mlog and begin the region.
|
| |
| int | Fenix_Mlog_sync (int mlog_id, int region_id) |
| | Synchronize messages across ranks, each starting at its given region.
|
| |
| int | Fenix_Mlog_stage (int mlog_id, int group_id, int member_id) |
| | Stage an mlog's data into a Fenix data member.
|
| |
| int | Fenix_Mlog_lrestore (int mlog_id, int group_id, int member_id, int time_stamp) |
| | Restore an mlog from a Fenix data member's local snapshot.
|
| |
| int | Fenix_Mlog_delete (int mlog_id) |
| | Delete an mlog.
|
| |
◆ Fenix_Mlog_activate()
| int Fenix_Mlog_activate |
( |
int | mlog_id | ) |
|
|
local |
Activate a given mlog, deactivating any previously active mlog.
Only the active mlog can log messages. If an error occurs, no mlog will be active, even if one was active before the call.
Fenix functions are never logged, regardless of which mlog is active. However, some Fenix functions support inline recovery when it is active; each such function states this in its own documentation. Inline recovery is active only when all three of the following conditions hold:
- FENIX_MLOG_RECOVERY_MODE is not FENIX_MLOG_RECOVERY_MANUAL
- FENIX_RECOVERY_MODE is not FENIX_RECOVERY_IGNORE
- An mlog is active.
- Parameters
-
| [in] | mlog_id | The log to activate. May be FENIX_MLOG_NONE. |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
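The three conditions for inline recovery listed above combine as a plain logical AND. The following is a minimal sketch of that check, not Fenix code: the `demo_` enums and function are hypothetical stand-ins, and the real constants and their values come from the Fenix headers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the Fenix settings named above; only the
 * "manual" and "ignore" cases matter for this check. */
typedef enum { DEMO_MLOG_RECOVERY_MANUAL, DEMO_MLOG_RECOVERY_OTHER } demo_mlog_recovery_mode;
typedef enum { DEMO_RECOVERY_IGNORE, DEMO_RECOVERY_OTHER } demo_recovery_mode;

/* Mirrors the value of FENIX_MLOG_NONE as listed on this page. */
#define DEMO_MLOG_NONE (-1)

/* Returns true only when all three documented conditions hold. */
bool demo_inline_recovery_active(demo_mlog_recovery_mode mlog_mode,
                                 demo_recovery_mode recovery_mode,
                                 int active_mlog_id)
{
    return mlog_mode != DEMO_MLOG_RECOVERY_MANUAL   /* condition 1 */
        && recovery_mode != DEMO_RECOVERY_IGNORE    /* condition 2 */
        && active_mlog_id != DEMO_MLOG_NONE;        /* condition 3 */
}
```

Under this sketch, passing `DEMO_MLOG_NONE` as the active mlog disables inline recovery regardless of the two mode settings.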
◆ Fenix_Mlog_activate_region()
| int Fenix_Mlog_activate_region |
( |
int | mlog_id, |
|
|
int | region_id ) |
|
local |
Activate the mlog and begin the region.
This helper function is equivalent to calling Fenix_Mlog_activate(mlog_id) followed by Fenix_Mlog_begin_region(mlog_id, region_id).
- Parameters
-
| [in] | mlog_id | The logger to activate and set the region of. |
| [in] | region_id | The region ID to set, with the same semantics as Fenix_Mlog_begin_region |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
◆ Fenix_Mlog_active()
| int Fenix_Mlog_active |
( |
int * | mlog_id | ) |
|
|
local |
Get the currently active message log.
- Parameters
-
| [out] | mlog_id | The active log. May be FENIX_MLOG_NONE. |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
◆ Fenix_Mlog_begin_region()
| int Fenix_Mlog_begin_region |
( |
int | mlog_id, |
|
|
int | region_id ) |
|
local |
Set the region of the given message logger.
- Parameters
-
| [in] | mlog_id | The logger to set the region of. |
| [in] | region_id | The region ID to set. Must be positive and greater than the current region_id (it may equal the current region_id if no messages have been logged in that region). |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
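The constraint on region_id can be read as a small predicate. The sketch below is illustrative only: `demo_mlog_state` is a hypothetical bookkeeping struct (not Fenix's internal layout), and it assumes a logger that has not yet begun any region is represented by a current region of 0.

```c
#include <stdbool.h>

/* Hypothetical per-logger bookkeeping, introduced for this example. */
typedef struct {
    int current_region;     /* assumption: 0 means no region begun yet */
    int messages_in_region; /* messages logged since the region began */
} demo_mlog_state;

/* Applies the documented rule: region_id must be positive and greater
 * than the current region, or equal to it if nothing was logged there. */
bool demo_region_id_valid(const demo_mlog_state *s, int region_id)
{
    if (region_id <= 0)
        return false;
    if (region_id > s->current_region)
        return true;
    return region_id == s->current_region && s->messages_in_region == 0;
}
```

With a current region of 3 and no logged messages, both 3 and 4 pass this check; once a message has been logged in region 3, only IDs of 4 or higher pass.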
◆ Fenix_Mlog_create()
| int Fenix_Mlog_create |
( |
int | mlog_id, |
|
|
MPI_Comm * | comm, |
|
|
int | depth ) |
|
local |
Create a new message logger.
- Parameters
-
| [in] | mlog_id | A unique identifier (>= 0) for this message logger |
| [in] | comm | The MPI_Comm to log messages from. Must be repaired after failures. |
| [in] | depth | Number of regions to keep in logs at once. Older regions will be deleted automatically. |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
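The depth parameter behaves like a sliding window over regions. The sketch below models only that retention policy with a small fixed-size array; `demo_region_window`, `demo_window_push`, and `DEMO_MAX_DEPTH` are hypothetical names, and how Fenix actually stores or frees regions is not specified on this page.

```c
#include <stddef.h>

#define DEMO_MAX_DEPTH 8 /* arbitrary cap for this example */

/* Hypothetical window of retained region IDs, oldest first. */
typedef struct {
    int regions[DEMO_MAX_DEPTH];
    size_t count; /* regions currently retained */
    size_t depth; /* maximum regions to retain, 1..DEMO_MAX_DEPTH */
} demo_region_window;

/* Records a new region. Returns the evicted region ID, or -1 if nothing
 * was evicted (or if depth is outside the supported range). */
int demo_window_push(demo_region_window *w, int region_id)
{
    int evicted = -1;
    if (w->depth == 0 || w->depth > DEMO_MAX_DEPTH)
        return -1;
    if (w->count == w->depth) {
        /* Window is full: drop the oldest region and shift the rest down. */
        evicted = w->regions[0];
        for (size_t i = 1; i < w->count; ++i)
            w->regions[i - 1] = w->regions[i];
        w->count--;
    }
    w->regions[w->count++] = region_id;
    return evicted;
}
```

With a depth of 2, pushing regions 1, 2, and 5 in order evicts region 1 on the third push and leaves regions 2 and 5 retained.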
◆ Fenix_Mlog_delete()
| int Fenix_Mlog_delete |
( |
int | mlog_id | ) |
|
|
local |
Delete an mlog.
- Parameters
-
| [in] | mlog_id | The mlog to delete. |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
◆ Fenix_Mlog_lrestore()
| int Fenix_Mlog_lrestore |
( |
int | mlog_id, |
|
|
int | group_id, |
|
|
int | member_id, |
|
|
int | time_stamp ) |
|
local |
Restore an mlog from a Fenix data member's local snapshot.
- Parameters
-
| [in] | mlog_id | The mlog to restore. |
| [in] | group_id | The group to restore from. |
| [in] | member_id | The member to restore from. |
| [in] | time_stamp | The time stamp of the snapshot to restore from. |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
◆ Fenix_Mlog_stage()
| int Fenix_Mlog_stage |
( |
int | mlog_id, |
|
|
int | group_id, |
|
|
int | member_id ) |
|
local |
Stage an mlog's data into a Fenix data member.
- Parameters
-
| [in] | mlog_id | The mlog to stage. |
| [in] | group_id | The group to stage into. |
| [in] | member_id | The member to stage into. The member does not have to exist already. If it does exist, it must have been created with size FENIX_RESIZEABLE and datatype MPI_BYTE. |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
◆ Fenix_Mlog_sync()
| int Fenix_Mlog_sync |
( |
int | mlog_id, |
|
|
int | region_id ) |
|
collective |
Synchronize messages across ranks, each starting at its given region.
Ranks recovering to later states will replay messages to ranks recovering to earlier states.
- Parameters
-
| [in] | mlog_id | The logger to sync. |
| [in] | region_id | The region that this rank will begin at. May be FENIX_MLOG_CONTINUE, in which case this rank recovers to the latest message state of its latest region (instead of restarting that region). |
- Returns
- FENIX_SUCCESS if successful, any return code otherwise.
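The replay direction described for Fenix_Mlog_sync() can be illustrated by comparing the states two ranks resume from. This is a simplified sketch under stated assumptions, not the Fenix implementation: it models a rank's state as a hypothetical (region, message count) pair, treats a restarted region as having zero messages, and orders states lexicographically. All `demo_` names are invented for the example.

```c
/* Mirrors the value of FENIX_MLOG_CONTINUE as listed on this page. */
#define DEMO_MLOG_CONTINUE (-1)

/* Hypothetical model of the state a rank resumes from. */
typedef struct {
    int region;   /* region the rank resumes in */
    int messages; /* messages already logged in that region */
} demo_state;

/* Resolves the region_id argument: an explicit region restarts that
 * region, while DEMO_MLOG_CONTINUE keeps the rank's latest state. */
demo_state demo_resume_state(int requested_region,
                             int latest_region, int latest_messages)
{
    demo_state s;
    if (requested_region == DEMO_MLOG_CONTINUE) {
        s.region = latest_region;
        s.messages = latest_messages;
    } else {
        s.region = requested_region;
        s.messages = 0;
    }
    return s;
}

/* Returns 1 if state a is later than state b, meaning the rank at a
 * would replay messages to the rank at b; returns 0 otherwise. */
int demo_is_later(demo_state a, demo_state b)
{
    if (a.region != b.region)
        return a.region > b.region;
    return a.messages > b.messages;
}
```

In this model, a rank that passes `DEMO_MLOG_CONTINUE` with three messages logged in region 5 is later than a rank restarting region 5, so the former would replay to the latter.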