Functions for storing and restoring data in Fenix.
Functions | |
| int | Fenix_Data_group_create (int group_id, MPI_Comm comm, int start_time_stamp, int depth, int policy_name, void *policy_value, int *flag) |
| Create a Data Group. | |
| int | Fenix_Data_group_created (int group_id) |
| Query if a data group exists on this rank. | |
| int | Fenix_Data_member_create (int group_id, int member_id, void *buffer, int count, MPI_Datatype datatype) |
| Create a data member for store/restore operations. | |
| int | Fenix_Data_member_created (int group_id, int member_id) |
| Query if a data member exists on this rank. | |
| int | Fenix_Data_group_get_redundancy_policy (int group_id, int *policy_name, void *policy_value, int *flag) |
| Get the storage policy of a data group. | |
| int | Fenix_Data_wait (Fenix_Request request) |
| UNIMPLEMENTED Block on completion of the store operation specified by the request. | |
| int | Fenix_Data_test (Fenix_Request request, int *flag) |
| UNIMPLEMENTED Query completion of the store operation specified by the request. | |
| int | Fenix_Data_member_stage (int group_id, int member_id, const Fenix_Data_subset subset_specifier) |
| Serialize a group member's data into the member's local store. | |
| int | Fenix_Data_member_store (int group_id, int member_id, const Fenix_Data_subset subset_specifier) |
| Store a particular group member into the group's resilient storage space, in uncommitted storage. | |
| int | Fenix_Data_member_storev (int group_id, int member_id, const Fenix_Data_subset subset_specifier) |
| UNIMPLEMENTED As [store](Fenix_Data_member_store), but subsets may vary rank-to-rank. | |
| int | Fenix_Data_member_istore (int group_id, int member_id, const Fenix_Data_subset subset_specifier, Fenix_Request *request) |
| UNIMPLEMENTED As [store](Fenix_Data_member_store), but asynchronous. | |
| int | Fenix_Data_member_istorev (int group_id, int member_id, const Fenix_Data_subset subset_specifier, Fenix_Request *request) |
| UNIMPLEMENTED As [istore](Fenix_Data_member_istore), but subsets may vary rank-to-rank. | |
| int | Fenix_Data_commit (int group_id, int *time_stamp) |
| Commit stored data members to the group's next snapshot. | |
| int | Fenix_Data_commit_barrier (int group_id, int *time_stamp) |
| As commit, but ensures a globally consistent commit. | |
| int | Fenix_Data_checkpoint (int group_id, const Fenix_Data_subset subset, int num_storev, int *storev_ids, int *time_stamp) |
| Store all members of a group and then commit that group. | |
| int | Fenix_Data_barrier (int group_id) |
| UNIMPLEMENTED Block until all ranks in the group have reached this point. | |
| int | Fenix_Data_member_restore (int group_id, int member_id, void *target_buffer, int max_count, int time_stamp, Fenix_Data_subset *found_data) |
| Restore the data of a group member from a snapshot. | |
| int | Fenix_Data_member_lrestore (int group_id, int member_id, void *target_buffer, int max_count, int time_stamp, Fenix_Data_subset *found_data) |
| Local-only version of Fenix_Data_member_restore. | |
| int | Fenix_Data_member_restore_from_rank (int group_id, int member_id, void *data, int max_count, int time_stamp, Fenix_Data_subset *found_data, int source_rank) |
| UNIMPLEMENTED As Fenix_Data_member_restore, but restores from a specific rank's data. | |
| int | Fenix_Data_subset_create (int num_blocks, int start_offset, int end_offset, int stride, Fenix_Data_subset *subset_specifier) |
| Create a data subset for use in store operations. | |
| int | Fenix_Data_subset_createv (int num_blocks, int *array_start_offsets, int *array_end_offsets, Fenix_Data_subset *subset_specifier) |
| As Fenix_Data_subset_create, but with varying start and end offsets. | |
| int | Fenix_Data_subset_delete (Fenix_Data_subset *subset_specifier) |
| Delete a data subset. | |
| int | Fenix_Data_group_get_number_of_members (int group_id, int *number_of_members) |
| Get the number of members in a data group. | |
| int | Fenix_Data_group_get_member_at_position (int group_id, int *member_id, int position) |
| Get member ID based on member index. | |
| int | Fenix_Data_group_get_number_of_snapshots (int group_id, int *number_of_snapshots) |
| Get the number of locally-available snapshots in a data group. | |
| int | Fenix_Data_group_get_snapshot_at_position (int group_id, int position, int *time_stamp) |
| Get the time stamp of a snapshot at a given index. | |
| int | Fenix_Data_member_attr_get (int group_id, int member_id, int attributename, void *attributevalue, int *flag, int source_rank) |
| UNIMPLEMENTED Get the value of a member's attribute. | |
| int | Fenix_Data_member_attr_set (int group_id, int member_id, int attribute_name, void *attribute_value, int *flag) |
| Set the value of a member's attribute. | |
| int | Fenix_Data_snapshot_delete (int group_id, int time_stamp) |
| Delete a snapshot from a data group. | |
| int | Fenix_Data_group_delete (int group_id) |
| Delete a data group. | |
| int | Fenix_Data_member_delete (int group_id, int member_id) |
| Delete a data member. | |
Variables | |
| const Fenix_Data_subset | FENIX_DATA_SUBSET_FULL |
| A stand-in for checkpointing/recovering the full member's data. | |
| const Fenix_Data_subset | FENIX_DATA_SUBSET_EMPTY |
| A stand-in for checkpointing/recovering no data. | |
| const Fenix_Data_subset | FENIX_DATA_SUBSET_PRESTAGED |
| A stand-in for checkpointing/recovering all pre-staged data. | |
| Fenix_Data_subset * | FENIX_DATA_SUBSET_IGNORE |
Fenix provides options for redundant storage of application data to facilitate application data recovery in a transparent manner. Fenix contains functions to control consistency of collections of such data, as well as their level of persistence. Functions with the prefix Fenix_Data_ perform store, versioning, restore, and other relevant operations and form the Fenix data recovery API. The user can select a specific set of application data, identified by its location in memory, label it using Fenix_Data_member_create, and copy it into Fenix's redundant storage space through Fenix_Data_member(i)store(v) at a point in time. Subsequently, Fenix_Data_commit finalizes all preceding Fenix store operations involving this data group and assigns a unique time stamp to the resulting data snapshot, marking the data as potentially recoverable after a loss of ranks. Individual pieces of data can then be restored whenever they are needed with Fenix_Data_member_restore, for example after a failure occurs. We note that Fenix's data storage and recovery facility aims primarily to support in-memory recovery.
Populating redundant data storage using Fenix may involve the dispersion of data created by one rank to other ranks within the system, making the store operation semantically a collective operation. However, Fenix does not require store operations to be globally synchronizing. For example, execution of Fenix_Data_member_store for a particular collection of data could potentially be finished in some ranks, but not yet in others. If certain ranks nominally participating in the storage operations have no actual data movement responsibility, Fenix is allowed to let them exit the operation immediately. Consequently, Fenix data storage functions should not be used for synchronization purposes.
Multiple distinct pieces (members) of data assigned to Fenix-managed redundant storage can be associated with a specific instance of a Fenix data group to form a semantic unit. Committing such a group ensures that the data involved is available for recovery.
A Fenix data group provides dual functionality. First, it serves as a container for a set of data objects (members) that are committed together, and hence provides transaction semantics. Second, it recognizes that Fenix_Data_member_store is an operation carried out collectively by a group of ranks, but not necessarily by all active ranks in the MPI environment. Hence, it adopts the convenient MPI vehicle of communicators to indicate the subset of ranks involved. Data groups are composed of members that describe the actual application data and the redundancy policy to be used for securely storing the members.
Data groups can and should be recreated after each failure (i.e. do not conditionally skip the creation after initialization).
See Fenix_Data_group_create for creating a data group.
Fenix internally uses an extensible system for defining data policies to keep the door open to easily adding new data policies and configuring them on a per-data-group basis. We currently support a single, configurable, memory-based policy.
IMR is referenced with the FENIX_DATA_POLICY_IN_MEMORY_RAID definition, and takes as input an array of integers that selects the mode and supplies the Separation and GroupSize values used in the mode descriptions below.
The policy is designed to localize recovery as much as possible. Communication amongst group members is required (as failure during recovery operations can lead to inconsistent beliefs about which ranks have recovered data), but groups without recovering ranks may then all recover locally rather than communicating further. Groups need not wait for ranks outside of their group to enter or exit recovery.
Mode 1: Groups ranks into pairs of partners, Rank N and Rank (N+Separation). For odd-size communicators, a single group of size 3 is also formed from the first, middle, and last ranks. Each rank stores a copy of its own data and a copy of its partner's. For groups of three, partner data storage is chained. Should both partners fail (or any two ranks in a group of three) before recovery operations have completed, data will be unrecoverable.
Memory Usage: Each rank stores a copy of its own data and of its partner's data for each timestamp, where checkpoint depth D stores D+1 checkpoints. Therefore for data size M, (D+1)*M*2 bytes are used.
Computation: None.
Mode 5: Groups ranks into parity groups of size GroupSize. Groups are formed of Rank N, N+Separation, N+2*Separation, and so on up to GroupSize ranks. If any two ranks in a group fail before recovery operations have completed, data will be unrecoverable.
Memory Usage: Each rank stores a copy of its own data and M/(GroupSize-1) parity bytes per timestamp. Therefore, (D+1)*M*(GroupSize/(GroupSize-1)) bytes are used.
Computation: O(M) parity bit calculations.
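The parity idea behind Mode 5 can be illustrated with plain XOR parity. The following is a generic RAID-5-style sketch, not Fenix's implementation: the helper names `xor_parity` and `xor_recover` are hypothetical, and the sketch keeps one whole parity block instead of distributing M/(GroupSize-1) parity bytes to each rank. It shows why one lost block can be rebuilt with O(M) XOR operations, while losing two blocks leaves too little information.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: XOR nblocks data blocks of len bytes into parity. */
void xor_parity(const unsigned char *const *blocks, size_t nblocks,
                size_t len, unsigned char *parity)
{
    memset(parity, 0, len);
    for (size_t b = 0; b < nblocks; b++)
        for (size_t i = 0; i < len; i++)
            parity[i] ^= blocks[b][i];
}

/* Hypothetical helper: rebuild block `lost` from parity and the survivors. */
void xor_recover(const unsigned char *const *blocks, size_t nblocks,
                 size_t len, size_t lost, const unsigned char *parity,
                 unsigned char *out)
{
    assert(lost < nblocks);
    memcpy(out, parity, len);
    for (size_t b = 0; b < nblocks; b++) {
        if (b == lost)
            continue;              /* blocks[lost] is never read */
        for (size_t i = 0; i < len; i++)
            out[i] ^= blocks[b][i];
    }
}
```

Because XOR is its own inverse, XOR-ing the parity with every surviving block cancels their contributions and leaves only the missing block.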
These options enable users to trade reliability and computation for memory space, which may be necessary for applications with large memory usage.
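To make the memory trade-off concrete, the two formulas above can be evaluated directly. This is a minimal sketch with hypothetical function names (`imr_mode1_bytes`, `imr_mode5_bytes`); it assumes M divides evenly by GroupSize-1 and ignores any bookkeeping overhead an implementation may add.

```c
#include <assert.h>
#include <stddef.h>

/* Mode 1: own copy plus partner copy for each of the D+1 retained
 * checkpoints, i.e. (D+1)*M*2 bytes. */
size_t imr_mode1_bytes(size_t m, size_t depth)
{
    return (depth + 1) * m * 2;
}

/* Mode 5: own copy plus M/(GroupSize-1) parity bytes per retained
 * checkpoint, i.e. (D+1)*M*GroupSize/(GroupSize-1) bytes.
 * Integer division rounds down if m is not a multiple of group_size-1. */
size_t imr_mode5_bytes(size_t m, size_t depth, size_t group_size)
{
    assert(group_size >= 2);
    return (depth + 1) * (m + m / (group_size - 1));
}
```

For example, with M = 1000 bytes and D = 1, Mode 1 uses 4000 bytes per rank, while Mode 5 with GroupSize = 5 uses 2500 bytes per rank.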
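| int Fenix_Data_checkpoint (int group_id, const Fenix_Data_subset subset, int num_storev, int *storev_ids, int *time_stamp) | collective |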
Store all members of a group and then commit that group.
Stores each member in order of their creation in the group. Equivalent to invoking Fenix_Data_member_store with the specified subset. If a member's id is listed in storev_ids, this is instead equivalent to invoking Fenix_Data_member_storev.
After storing, equivalent to invoking Fenix_Data_commit.
This function supports inline recovery when it is active (see Fenix_Mlog_activate).
| [in] | group_id | The group to checkpoint |
| [in] | subset | The subset of each member to store. |
| [in] | num_storev | The size of the storev_ids array, or FENIX_STOREV_ALL. |
| [in] | storev_ids | Array of member ids to store as storev. May be null if num_storev is zero or FENIX_STOREV_ALL. |
| [out] | time_stamp | Pointer to store the time stamp of the commit to, or FENIX_TIME_STAMP_IGNORE. |
| int Fenix_Data_commit (int group_id, int *time_stamp) | collective, local |
Commit stored data members to the group's next snapshot.
This function is used to freeze the current state of a data group, together with all its application data that has been stored in Fenix's redundant storage, and label it with a time stamp, thus creating a snapshot of the stored application data. Only data that has been committed is eligible for recovery through Fenix_Data_member_restore. An application needs to call Fenix_Data_wait for all pending asynchronous Fenix_Data_member_istore(v) operations in the group before committing.
| [in] | group_id | The group to commit |
| [out] | time_stamp | The time stamp of the new snapshot |
| int Fenix_Data_commit_barrier (int group_id, int *time_stamp) | collective |
As commit, but ensures a globally consistent commit.
This function does not act as a traditional barrier. The commit proceeds once all non-failed ranks reach the barrier, which allows a commit to be made when a rank fails after storing all of its data into resilient storage.
| [in] | group_id | The group to commit |
| [out] | time_stamp | The time stamp of the new snapshot |
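| int Fenix_Data_group_create (int group_id, MPI_Comm comm, int start_time_stamp, int depth, int policy_name, void *policy_value, int *flag) | collective |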
Create a Data Group.
If a group with this group_id was already created in the past and has not been deleted, the parameters of this call are ignored and this function simply serves to coordinate with any ranks that have not yet created this group (e.g. due to a failure).
All calling ranks must pass the same values for the parameters group_id, comm, start_time_stamp, policy_name, and policy_value.
| group_id | A unique identifier for this group. |
| comm | A resilient communicator on which the group is formed. |
| start_time_stamp | The time_stamp to be used for the first commit in this group. |
| depth | The number of successive snapshots of this group that are retained by Fenix, in addition to the most recent one, and that can be recovered by calling Fenix data member restore functions. For example, a depth of 0 means Fenix will keep only the necessary data to restore the most recent snapshot, freeing or overwriting older snapshots automatically. A depth of -1 is currently not supported, but would ordinarily indicate that no snapshots should be removed automatically. |
| policy_name | Currently, may only be FENIX_DATA_POLICY_IN_MEMORY_RAID |
| policy_value | Pointer to data passed along to the policy. See the specific policy for more information. |
| flag | Pointer to a location in which to store policy-specific status or errors. |
| int Fenix_Data_group_created (int group_id) | local |
Query if a data group exists on this rank.
| group_id | Group identifier |
| int Fenix_Data_group_delete (int group_id) | local |
Delete a data group.
| [in] | group_id | The group to delete |
| int Fenix_Data_group_get_member_at_position (int group_id, int *member_id, int position) |
Get member ID based on member index.
See Fenix_Data_group_get_number_of_members
| [in] | group_id | The group to query |
| [out] | member_id | The member id at this index in the group |
| [in] | position | The position to check, [0, number_of_members) |
| int Fenix_Data_group_get_number_of_members (int group_id, int *number_of_members) |
Get the number of members in a data group.
| [in] | group_id | The group to query |
| [out] | number_of_members | Number of members in the group |
| int Fenix_Data_group_get_number_of_snapshots (int group_id, int *number_of_snapshots) |
Get the number of locally-available snapshots in a data group.
May include snapshots that are inconsistent across the group.
| [in] | group_id | The group to query |
| [out] | number_of_snapshots | The number of snapshots in the group |
| int Fenix_Data_group_get_redundancy_policy (int group_id, int *policy_name, void *policy_value, int *flag) |
Get the storage policy of a data group.
| group_id | Identifier of the data group to query |
| policy_name | The identifier of the policy name of the data group. |
| policy_value | A location within which to store the policy_values this group's policy was configured with. |
| flag | A location set to true if a policy value was extracted, else false. |
| int Fenix_Data_group_get_snapshot_at_position (int group_id, int position, int *time_stamp) |
Get the time stamp of a snapshot at a given index.
Snapshots are indexed in the reverse of the order in which the user committed them (e.g. the most recent available snapshot has position=0).
| [in] | group_id | The group to query |
| [in] | position | The index of the snapshot, which must be in [0, number_of_snapshots) |
| [out] | time_stamp | The time stamp of the snapshot |
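The interaction between a group's depth and snapshot positions can be sketched with a toy model. This is not a Fenix data structure: the `toy_*` names and the fixed capacity are assumptions made for illustration. It models a group that retains depth + 1 snapshots, drops the oldest on overflow, and reports the most recent commit at position 0.

```c
#include <assert.h>

#define TOY_MAX_SNAPSHOTS 8   /* capacity of this toy model only */

/* Toy model of a group's snapshot list; not a Fenix data structure. */
typedef struct {
    int depth;                      /* retains depth + 1 snapshots */
    int count;                      /* snapshots currently held */
    int stamps[TOY_MAX_SNAPSHOTS];  /* stamps[0] is the most recent */
} toy_snapshots;

void toy_init(toy_snapshots *s, int depth)
{
    assert(depth >= 0 && depth + 1 <= TOY_MAX_SNAPSHOTS);
    s->depth = depth;
    s->count = 0;
}

/* Record a commit: shift older stamps back, dropping the oldest if full. */
void toy_commit(toy_snapshots *s, int time_stamp)
{
    if (s->count < s->depth + 1)
        s->count++;
    for (int i = s->count - 1; i > 0; i--)
        s->stamps[i] = s->stamps[i - 1];
    s->stamps[0] = time_stamp;
}

/* Position lookup, valid for position in [0, count). Returns 0 on success. */
int toy_snapshot_at(const toy_snapshots *s, int position, int *time_stamp)
{
    if (position < 0 || position >= s->count)
        return -1;
    *time_stamp = s->stamps[position];
    return 0;
}
```

With a depth of 1 and commits stamped 10, 11, and 12, the model holds two snapshots: position 0 yields 12, position 1 yields 11, and the snapshot stamped 10 has been discarded.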
| int Fenix_Data_member_attr_set (int group_id, int member_id, int attribute_name, void *attribute_value, int *flag) |
Set the value of a member's attribute.
Valid names are FENIX_DATA_MEMBER_ATTRIBUTE_BUFFER, FENIX_DATA_MEMBER_ATTRIBUTE_COUNT, and FENIX_DATA_MEMBER_ATTRIBUTE_DATATYPE.
The COUNT and DATATYPE attributes may only be set before the first store operation. Contrary to the Fenix specification, returning to Fenix_Init after a failure does not allow the user to set these attributes again.
| [in] | group_id | The group to update |
| [in] | member_id | The member to update |
| [in] | attribute_name | The attribute to update |
| [in] | attribute_value | The new value of the attribute |
| [out] | flag | Set to true if the attribute was set, else false |
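| int Fenix_Data_member_create (int group_id, int member_id, void *buffer, int count, MPI_Datatype datatype) | collective, local |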
Create a data member for store/restore operations.
All calling ranks in the group's communicator must pass the same values for the parameters member_id, datatype, and group_id.
| group_id | Identifier to a data group within which to create the member. |
| member_id | An integer unique within the data group that identifies the data in buffer. Must be nonnegative and less than FENIX_MEMBER_ID_MAX, which is guaranteed to be at least 2^30. |
| buffer | Address of the data to be copied to redundant storage maintained by Fenix. This parameter may also be specified using Fenix_Data_member_attr_set. Doing so is critical after a failure for non-survivor ranks, whose recorded address was generated on the failed rank, is no longer valid, and must be updated. |
| count | The maximum number of contiguous elements of type datatype of the data to be stored. A value of FENIX_RESIZEABLE allows this member to have a varying data size. |
| datatype | The MPI_Datatype of the elements in buffer |
| int Fenix_Data_member_created (int group_id, int member_id) | local |
Query if a data member exists on this rank.
| group_id | Group identifier |
| member_id | Member identifier |
| int Fenix_Data_member_delete (int group_id, int member_id) | local |
Delete a data member.
| [in] | group_id | The group to delete from |
| [in] | member_id | The member to delete |
| int Fenix_Data_member_lrestore (int group_id, int member_id, void *target_buffer, int max_count, int time_stamp, Fenix_Data_subset *found_data) |
Local-only version of Fenix_Data_member_restore.
This function restores the data of a group member from the local snapshot.
| [in] | group_id | The group to restore from |
| [in] | member_id | The member to restore |
| [out] | target_buffer | The buffer to store the restored data |
| [in] | max_count | The maximum number of elements to restore |
| [in] | time_stamp | The time stamp of the snapshot to restore from |
| [out] | found_data | The subset of the data that was found in the snapshot |
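| int Fenix_Data_member_restore (int group_id, int member_id, void *target_buffer, int max_count, int time_stamp, Fenix_Data_subset *found_data) | collective |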
Restore the data of a group member from a snapshot.
All ranks in the group’s resilient communicator must pass the same values for the parameters group_id, member_id, and time_stamp. This function is used to retrieve data from consistent snapshot members. This function can only be used if the size of the communicator used to store the data is the same as that at the time of data recovery (this implies non-shrinking communicator recovery in case of a rank loss).
If the size of the buffer needing to receive the recovery data is unknown for a particular rank, it can be queried using Fenix_Data_member_attr_get.
| [in] | group_id | The group to restore from |
| [in] | member_id | The member to restore |
| [out] | target_buffer | The buffer to store the restored data |
| [in] | max_count | The maximum number of elements to restore |
| [in] | time_stamp | The time stamp of the snapshot to restore from |
| [out] | found_data | The subset of the data that was found in the snapshot |
| int Fenix_Data_member_stage (int group_id, int member_id, const Fenix_Data_subset subset_specifier) |
Serialize a group member's data into the member's local store.
A store operation can be broken into two parts: locally staging the data within Fenix, then policy-specific operations to make the data resilient to faults. This function performs ONLY the first part. Applications should subsequently store this member using the FENIX_DATA_SUBSET_PRESTAGED data subset.
It is undefined behaviour to commit staged-but-not-stored data.
| group_id | All ranks must provide the same group_id |
| member_id | All ranks must provide the same member_id |
| subset_specifier | Which subset of the data to stage. FENIX_DATA_SUBSET_FULL is invalid if member size is FENIX_RESIZEABLE. FENIX_DATA_SUBSET_PRESTAGED is invalid. |
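| int Fenix_Data_member_store (int group_id, int member_id, const Fenix_Data_subset subset_specifier) | collective |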
Store a particular group member into the group's resilient storage space, in uncommitted storage.
The user can safely modify the member's data buffer after this call, as the current state is copied immediately. Multiple calls may be used to incrementally store data (using subset_specifiers), or overwrite old data prior to a commit.
| group_id | All ranks must provide the same group_id |
| member_id | All ranks must provide the same member_id |
| subset_specifier | Which subset of the data to store. If this member was created with size FENIX_RESIZEABLE, FENIX_DATA_SUBSET_FULL is an invalid input. |
| int Fenix_Data_snapshot_delete (int group_id, int time_stamp) | local |
Delete a snapshot from a data group.
| [in] | group_id | The group to delete from |
| [in] | time_stamp | The time stamp of the snapshot to delete |
| int Fenix_Data_subset_create (int num_blocks, int start_offset, int end_offset, int stride, Fenix_Data_subset *subset_specifier) |
Create a data subset for use in store operations.
Creates a subset based on num_blocks pairs of {start_offset,end_offset}, {start_offset+stride,end_offset+stride}, {start_offset+2*stride,end_offset+2*stride}, etc.
The value of start_offset must be smaller than or equal to the value of end_offset to indicate non-negative block size. Otherwise, the function returns an error code.
Created subsets must be deleted with Fenix_Data_subset_delete to free memory.
| [in] | num_blocks | The number of contiguous data blocks. |
| [in] | start_offset | The index of the first element in the first data block. |
| [in] | end_offset | The index of the last element in the first data block. |
| [in] | stride | Regular shift between successive data blocks. |
| [out] | subset_specifier | The created subset. |
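The strided block layout described above can be written out explicitly. The helper below is hypothetical (it is not part of Fenix) and only mirrors the documented rule: block i spans {start_offset + i*stride, end_offset + i*stride}, and start_offset greater than end_offset is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand a strided subset into explicit block bounds.
 * Writes num_blocks entries to starts[] and ends[]; returns 0 on success,
 * -1 if the arguments describe a negative-size block. */
int expand_strided_blocks(int num_blocks, int start_offset, int end_offset,
                          int stride, int *starts, int *ends)
{
    assert(starts != NULL && ends != NULL);
    if (num_blocks < 0 || start_offset > end_offset)
        return -1;
    for (int i = 0; i < num_blocks; i++) {
        starts[i] = start_offset + i * stride;
        ends[i]   = end_offset + i * stride;
    }
    return 0;
}
```

For num_blocks = 3, start_offset = 2, end_offset = 4, and stride = 10, the blocks are {2,4}, {12,14}, and {22,24}. Arrays in this form have the same shape as the array_start_offsets and array_end_offsets arguments of Fenix_Data_subset_createv.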
| int Fenix_Data_subset_createv (int num_blocks, int *array_start_offsets, int *array_end_offsets, Fenix_Data_subset *subset_specifier) |
As Fenix_Data_subset_create, but with varying start and end offsets.
Creates a subset based on num_blocks pairs of {array_start_offsets[i], array_end_offsets[i]}. Each start offset must be smaller than or equal to the corresponding end offset to indicate non-negative block size. Otherwise, the function returns an error code.
Created subsets must be deleted with Fenix_Data_subset_delete to free memory.
| [in] | num_blocks | The number of contiguous data blocks. |
| [in] | array_start_offsets | The index of the first element in each data block. |
| [in] | array_end_offsets | The index of the last element in each data block. |
| [out] | subset_specifier | The created subset. |
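Before building a subset from offset arrays, an application may want to check the arrays itself. The following pre-check is a sketch with a hypothetical name, `count_subset_elements`. It assumes offsets are inclusive, as the parameter descriptions suggest, and simply sums block sizes, so overlapping blocks are counted more than once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-check: returns the summed block sizes, or -1 if any
 * block has start > end. Offsets are treated as inclusive, and overlapping
 * blocks are counted more than once. */
long count_subset_elements(int num_blocks, const int *array_start_offsets,
                           const int *array_end_offsets)
{
    assert(num_blocks == 0 ||
           (array_start_offsets != NULL && array_end_offsets != NULL));
    long total = 0;
    for (int i = 0; i < num_blocks; i++) {
        if (array_start_offsets[i] > array_end_offsets[i])
            return -1;
        total += (long)array_end_offsets[i] - array_start_offsets[i] + 1;
    }
    return total;
}
```

For blocks {0,3} and {10,10} the function returns 5: four elements in the first block and one in the second.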
| int Fenix_Data_subset_delete (Fenix_Data_subset *subset_specifier) |
Delete a data subset.
Frees the memory associated with a data subset object.
| [in] | subset_specifier | The subset to delete. |