Functions for storing and restoring data in Fenix.
Functions | |
| int | Fenix_Data_group_create (int group_id, MPI_Comm comm, int start_time_stamp, int depth, int policy_name, void *policy_value, int *flag) |
| Create a Data Group. | |
| int | Fenix_Data_member_create (int group_id, int member_id, void *buffer, int count, MPI_Datatype datatype) |
| Create a data member for store/restore operations. | |
| int | Fenix_Data_group_get_redundancy_policy (int group_id, int *policy_name, void *policy_value, int *flag) |
| Get the storage policy of a data group. | |
| int | Fenix_Data_wait (Fenix_Request request) |
| UNIMPLEMENTED Block on completion of the store operation specified by the request. | |
| int | Fenix_Data_test (Fenix_Request request, int *flag) |
| UNIMPLEMENTED Query completion of the store operation specified by the request. | |
| int | Fenix_Data_member_store (int group_id, int member_id, Fenix_Data_subset subset_specifier) |
| Store a particular group member into the group's resilient storage space, in uncommitted storage. | |
| int | Fenix_Data_member_storev (int group_id, int member_id, Fenix_Data_subset subset_specifier) |
| UNIMPLEMENTED As [store](Fenix_Data_member_store), but subsets may vary rank-to-rank. | |
| int | Fenix_Data_member_istore (int group_id, int member_id, Fenix_Data_subset subset_specifier, Fenix_Request *request) |
| UNIMPLEMENTED As [store](Fenix_Data_member_store), but asynchronous. | |
| int | Fenix_Data_member_istorev (int group_id, int member_id, Fenix_Data_subset subset_specifier, Fenix_Request *request) |
| UNIMPLEMENTED As [istore](Fenix_Data_member_istore), but subsets may vary rank-to-rank. | |
| int | Fenix_Data_commit (int group_id, int *time_stamp) |
| Commit stored data members to the group's next snapshot. | |
| int | Fenix_Data_commit_barrier (int group_id, int *time_stamp) |
| As commit, but ensures a globally consistent commit. | |
| int | Fenix_Data_barrier (int group_id) |
| UNIMPLEMENTED Block until all ranks in the group have reached this point. | |
| int | Fenix_Data_member_restore (int group_id, int member_id, void *target_buffer, int max_count, int time_stamp, Fenix_Data_subset *found_data) |
| Restore the data of a group member from a snapshot. | |
| int | Fenix_Data_member_lrestore (int group_id, int member_id, void *target_buffer, int max_count, int time_stamp, Fenix_Data_subset *found_data) |
| Local-only version of Fenix_Data_member_restore. | |
| int | Fenix_Data_member_restore_from_rank (int group_id, int member_id, void *data, int max_count, int time_stamp, Fenix_Data_subset *found_data, int source_rank) |
| UNIMPLEMENTED As Fenix_Data_member_restore, but restores from a specific rank's data. | |
| int | Fenix_Data_subset_create (int num_blocks, int start_offset, int end_offset, int stride, Fenix_Data_subset *subset_specifier) |
| Create a data subset for use in store operations. | |
| int | Fenix_Data_subset_createv (int num_blocks, int *array_start_offsets, int *array_end_offsets, Fenix_Data_subset *subset_specifier) |
| As Fenix_Data_subset_create, but with varying start and end offsets. | |
| int | Fenix_Data_subset_delete (Fenix_Data_subset *subset_specifier) |
| Delete a data subset. | |
| int | Fenix_Data_group_get_number_of_members (int group_id, int *number_of_members) |
| UNIMPLEMENTED Get the number of members in a data group. | |
| int | Fenix_Data_group_get_member_at_position (int group_id, int *member_id, int position) |
| UNIMPLEMENTED Get member ID based on member index | |
| int | Fenix_Data_group_get_number_of_snapshots (int group_id, int *number_of_snapshots) |
| Get the number of locally-available snapshots in a data group. | |
| int | Fenix_Data_group_get_snapshot_at_position (int group_id, int position, int *time_stamp) |
| Get the time stamp of a snapshot at a given index. | |
| int | Fenix_Data_member_attr_get (int group_id, int member_id, int attributename, void *attributevalue, int *flag, int source_rank) |
| UNIMPLEMENTED Get the value of a member's attribute. | |
| int | Fenix_Data_member_attr_set (int group_id, int member_id, int attribute_name, void *attribute_value, int *flag) |
| Set the value of a member's attribute. | |
| int | Fenix_Data_snapshot_delete (int group_id, int time_stamp) |
| Delete a snapshot from a data group. | |
| int | Fenix_Data_group_delete (int group_id) |
| Delete a data group. | |
| int | Fenix_Data_member_delete (int group_id, int member_id) |
| Delete a data member. | |
Variables | |
| const Fenix_Data_subset | FENIX_DATA_SUBSET_FULL |
| A standin for checkpointing/recovering all available data in a member. | |
| const Fenix_Data_subset | FENIX_DATA_SUBSET_EMPTY |
| A standin for checkpointing/recovering none of the available data in a member. | |
Fenix provides options for redundant storage of application data to facilitate application data recovery in a transparent manner. Fenix contains functions to control consistency of collections of such data, as well as their level of persistence. Functions with the prefix Fenix_Data_ perform store, versioning, restore, and other relevant operations and form the Fenix data recovery API. The user can select a specific set of application data, identified by its location in memory, label it using Fenix_Data_member_create, and copy it into Fenix's redundant storage space through Fenix_Data_member(i)store(v) at a point in time. Subsequently, Fenix_Data_commit finalizes all preceding Fenix store operations involving this data group and assigns a unique time stamp to the resulting data snapshot, marking the data as potentially recoverable after a loss of ranks. Individual pieces of data can then be restored whenever they are needed with Fenix_Data_member_restore, for example after a failure occurs. We note that Fenix's data storage and recovery facility aims primarily to support in-memory recovery.
Populating redundant data storage using Fenix may involve the dispersion of data created by one rank to other ranks within the system, making the store operation semantically a collective operation. However, Fenix does not require store operations to be globally synchronizing. For example, execution of Fenix_Data_member_store for a particular collection of data could finish on some ranks but not yet on others. If certain ranks nominally participating in the storage operation have no actual data movement responsibility, Fenix is allowed to let them exit the operation immediately. Consequently, Fenix data storage functions should not be used for synchronization purposes.
Multiple distinct pieces (members) of data assigned to Fenix-managed redundant storage can be associated with a specific instance of a Fenix data group to form a semantic unit. Committing such a group ensures that the data involved is available for recovery.
A Fenix data group provides dual functionality. First, it serves as a container for a set of data objects (members) that are committed together, and hence provides transaction semantics. Second, it recognizes that Fenix_Data_member_store is an operation carried out collectively by a group of ranks, but not necessarily by all active ranks in the MPI environment. Hence, it adopts the convenient MPI vehicle of communicators to indicate the subset of ranks involved. Data groups are composed of members that describe the actual application data and the redundancy policy to be used for securely storing the members.
Data groups can and should be recreated after each failure (i.e. do not conditionally skip the creation after initialization).
See Fenix_Data_group_create for creating a data group.
Fenix internally uses an extensible system for defining data policies to keep the door open to easily adding new data policies and configuring them on a per-data-group basis. We currently support a single, configurable, memory-based policy.
IMR is referenced with the FENIX_DATA_POLICY_IN_MEMORY_RAID definition, and takes as input an array of integers with the following usage:
The policy is designed to localize recovery as much as possible. Communication amongst group members is required (as failure during recovery operations can lead to inconsistent beliefs about which ranks have recovered data), but groups without recovering ranks may then all recover locally rather than communicating further. Groups need not wait for ranks outside of their group to enter or exit recovery.
Mode 1: Groups ranks into pairs of partners, Rank N and Rank (N+Separation). For odd-size communicators, a single group of size 3 is also formed from the first, middle, and last ranks. Each rank stores a copy of its own data and a copy of its partner's. For groups of three, partner data storage is chained. Should both partners fail (or any two ranks in a group of three) before recovery operations have completed, data will be unrecoverable.
Memory Usage: Each rank stores a copy of its own data and of its partner's data for each timestamp, where checkpoint depth D stores D+1 checkpoints. Therefore for data size M, (D+1)*M*2 bytes are used.
Computation: None.
Mode 5: Groups ranks into parity groups of size GroupSize. Groups are formed of Rank N, N+Separation, N+2*Separation, and so on. If any two ranks in a group fail before recovery operations have completed, data will be unrecoverable.
Memory Usage: Each rank stores a copy of its own data and M/(GroupSize-1) parity bytes per timestamp. Therefore, (D+1)*M*(GroupSize/(GroupSize-1)) bytes are used.
Computation: O(M) parity bit calculations.
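The parity idea behind Mode 5 can be illustrated with a small, self-contained sketch. It assumes byte-wise XOR parity over equally sized blocks, which the text above does not state; the function names are hypothetical and this is not Fenix's implementation. It shows why one lost block per group is recoverable in a single O(M) pass, while two lost blocks are not.

```c
#include <stddef.h>

/* XOR `len` bytes of `src` into `acc`. */
void parity_accumulate(unsigned char *acc, const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        acc[i] ^= src[i];
}

/* Build one parity block over `n` data blocks of `len` bytes each. */
void parity_build(unsigned char *parity, const unsigned char *const *blocks,
                  size_t n, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        parity[i] = 0;
    for (size_t b = 0; b < n; ++b)
        parity_accumulate(parity, blocks[b], len);
}

/* Rebuild the block at index `lost` from the parity block and the
 * surviving blocks. The entry blocks[lost] is never read. */
void parity_rebuild(unsigned char *out, const unsigned char *parity,
                    const unsigned char *const *blocks,
                    size_t n, size_t lost, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        out[i] = parity[i];
    for (size_t b = 0; b < n; ++b)
        if (b != lost)
            parity_accumulate(out, blocks[b], len);
}
```

With two blocks missing, the XOR of the survivors and the parity yields only the XOR of the two lost blocks, which matches the unrecoverable case described for Mode 5.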
These options enable users to trade reliability and computation for memory space, which may be necessary for applications with large memory usage.
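To compare the two modes for a given application, the memory formulas above can be evaluated directly. The following is a minimal sketch with hypothetical helper names (not part of the Fenix API). It assumes depth >= 0 and group_size >= 2, and it rounds the parity size up to whole bytes, which is an assumption of this sketch.

```c
#include <stddef.h>

/* Mode 1: own copy plus partner's copy for each of the depth+1
 * retained checkpoints, i.e. (D+1)*M*2 bytes. */
size_t imr_mode1_bytes(size_t data_bytes, int depth)
{
    return (size_t)(depth + 1) * data_bytes * 2;
}

/* Mode 5: own copy plus M/(GroupSize-1) parity bytes per checkpoint.
 * The parity size is rounded up to whole bytes (assumption). */
size_t imr_mode5_bytes(size_t data_bytes, int depth, int group_size)
{
    size_t divisor = (size_t)group_size - 1;
    size_t parity = (data_bytes + divisor - 1) / divisor;
    return (size_t)(depth + 1) * (data_bytes + parity);
}
```

For example, with M = 1000 bytes and depth 1, Mode 1 needs 4000 bytes per rank, while Mode 5 with GroupSize 5 needs 2500 bytes per rank.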
| int Fenix_Data_commit | ( | int | group_id, |
| int * | time_stamp ) |
collectivelocal |
Commit stored data members to the group's next snapshot.
This function freezes the current state of a data group, together with all of its application data that has been stored in Fenix's redundant storage, and labels it with a time stamp, thus creating a snapshot of the stored application data. Only data that has been committed is eligible for recovery through Fenix_Data_member_restore. An application needs to call Fenix_Data_wait for all pending asynchronous Fenix_Data_member_istore(v) operations in the group before committing.
| [in] | group_id | The group to commit |
| [out] | time_stamp | The time stamp of the new snapshot |
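The distinction between stored and committed data can be pictured with a toy, single-process model. Everything below (the toy_member type, the toy_* functions, the fixed 64-byte capacity) is hypothetical and only mirrors the semantics described above: a store copies into uncommitted storage, and only a commit makes that data eligible for restore. It does not reflect how Fenix stores data internally.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MEMBER_MAX 64 /* arbitrary capacity for this sketch */

typedef struct {
    unsigned char staged[TOY_MEMBER_MAX];    /* stored, not yet committed */
    unsigned char committed[TOY_MEMBER_MAX]; /* contents of the latest snapshot */
    size_t staged_size;
    size_t committed_size;
    int has_snapshot;
    int time_stamp;
} toy_member;

void toy_init(toy_member *m)
{
    memset(m, 0, sizeof *m);
}

/* Copy the caller's buffer into uncommitted storage. The caller may
 * modify its buffer afterwards. Returns 0, or -1 if the data is too large. */
int toy_store(toy_member *m, const void *buf, size_t size)
{
    if (size > TOY_MEMBER_MAX)
        return -1;
    memcpy(m->staged, buf, size);
    m->staged_size = size;
    return 0;
}

/* Freeze the staged data as the snapshot labeled `time_stamp`. */
void toy_commit(toy_member *m, int time_stamp)
{
    memcpy(m->committed, m->staged, m->staged_size);
    m->committed_size = m->staged_size;
    m->time_stamp = time_stamp;
    m->has_snapshot = 1;
}

/* Copy committed data into `out`. Returns the number of bytes copied,
 * or -1 if nothing has been committed yet. */
long toy_restore(const toy_member *m, void *out, size_t max_size)
{
    if (!m->has_snapshot)
        return -1;
    size_t n = m->committed_size < max_size ? m->committed_size : max_size;
    memcpy(out, m->committed, n);
    return (long)n;
}
```

In this model, a restore issued after a store but before the first commit fails, and a store issued after a commit does not change what a restore returns until the next commit.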
| int Fenix_Data_commit_barrier | ( | int | group_id, |
| int * | time_stamp ) |
collective |
As commit, but ensures a globally consistent commit.
This function does not act as a traditional barrier. The commit proceeds once all non-failed ranks reach the barrier. This allows a commit to be made when a rank fails after storing all of its data into resilient storage.
| [in] | group_id | The group to commit |
| [out] | time_stamp | The time stamp of the new snapshot |
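| int Fenix_Data_group_create | ( | int | group_id, |
| MPI_Comm | comm, | ||
| int | start_time_stamp, | ||
| int | depth, | ||
| int | policy_name, | ||
| void * | policy_value, | ||
| int * | flag ) |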
collective |
Create a Data Group.
If a group with this group_id was already created in the past and has not been deleted, the parameters of this call are ignored and this function simply serves to coordinate with any ranks that have not yet created this group (e.g. due to a failure).
All calling ranks must pass the same values for the parameters group_id, comm, start_time_stamp, policy_name, and policy_value.
| group_id | A unique identifier for this group. |
| comm | A resilient communicator on which the group is formed. |
| start_time_stamp | The time_stamp to be used for the first commit in this group. |
| depth | The number of successive snapshots of this group that are retained by Fenix, in addition to the most recent one, and that can be recovered by calling Fenix data member restore functions. For example, a depth of 0 means Fenix will keep only the necessary data to restore the most recent snapshot, freeing or overwriting older snapshots automatically. A depth of -1 is currently not supported, but would ordinarily indicate that no snapshots should be removed automatically. |
| policy_name | Currently, this may only be FENIX_DATA_POLICY_IN_MEMORY_RAID. |
| policy_value | Pointer to data passed along to the policy. See the specific policy for more information. |
| flag | Pointer to a location in which to store policy-specific status or errors. |
| int Fenix_Data_group_delete | ( | int | group_id | ) |
local |
Delete a data group.
| [in] | group_id | The group to delete |
| int Fenix_Data_group_get_number_of_snapshots | ( | int | group_id, |
| int * | number_of_snapshots ) |
Get the number of locally-available snapshots in a data group.
May include snapshots that are inconsistent across the group.
| [in] | group_id | The group to query |
| [out] | number_of_snapshots | The number of snapshots in the group |
| int Fenix_Data_group_get_redundancy_policy | ( | int | group_id, |
| int * | policy_name, | ||
| void * | policy_value, | ||
| int * | flag ) |
Get the storage policy of a data group.
| group_id | Identifier of the data group to query |
| policy_name | The identifier of the policy name of the data group. |
| policy_value | A location within which to store the policy_values this group's policy was configured with. |
| flag | A location set to true if a policy value was extracted, else false. |
| int Fenix_Data_group_get_snapshot_at_position | ( | int | group_id, |
| int | position, | ||
| int * | time_stamp ) |
Get the time stamp of a snapshot at a given index.
Snapshots are indexed in the reverse of the order in which the user committed them (e.g. the most recent available snapshot has position=0).
| [in] | group_id | The group to query |
| [in] | position | The index of the snapshot, which must be in the range [0, number_of_snapshots) |
| [out] | time_stamp | The time stamp of the snapshot |
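The position-based indexing, combined with the depth parameter of Fenix_Data_group_create, can be sketched as a small bounded history. This is a hypothetical model (the snapshot_history type, the history_* functions, and the fixed capacity of 16 are assumptions of this sketch), not the Fenix data structure. It retains depth+1 time stamps and returns the most recent one at position 0.

```c
#define HISTORY_MAX 16 /* arbitrary upper bound for this sketch */

typedef struct {
    int stamps[HISTORY_MAX]; /* stamps[0] is the most recent commit */
    int count;               /* number of snapshots currently retained */
    int capacity;            /* depth + 1 */
} snapshot_history;

/* Returns 0 on success, -1 if depth is negative or too large. */
int history_init(snapshot_history *h, int depth)
{
    if (depth < 0 || depth + 1 > HISTORY_MAX)
        return -1;
    h->count = 0;
    h->capacity = depth + 1;
    return 0;
}

/* Record a new commit, dropping the oldest snapshot when full. */
void history_commit(snapshot_history *h, int time_stamp)
{
    int keep = h->count < h->capacity ? h->count : h->capacity - 1;
    for (int i = keep; i > 0; --i)
        h->stamps[i] = h->stamps[i - 1];
    h->stamps[0] = time_stamp;
    h->count = keep + 1;
}

/* Look up the time stamp at `position`, which must lie in [0, count).
 * Returns 0 on success, -1 if the position is out of range. */
int history_get(const snapshot_history *h, int position, int *time_stamp)
{
    if (position < 0 || position >= h->count)
        return -1;
    *time_stamp = h->stamps[position];
    return 0;
}
```

With depth 1 and commits labeled 10, 11, and 12, the model retains two snapshots: position 0 yields 12, position 1 yields 11, and position 2 is out of range.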
| int Fenix_Data_member_attr_set | ( | int | group_id, |
| int | member_id, | ||
| int | attribute_name, | ||
| void * | attribute_value, | ||
| int * | flag ) |
Set the value of a member's attribute.
Valid names are FENIX_DATA_MEMBER_ATTRIBUTE_BUFFER, FENIX_DATA_MEMBER_ATTRIBUTE_COUNT, and FENIX_DATA_MEMBER_ATTRIBUTE_DATATYPE.
The COUNT and DATATYPE attributes may only be set before the first store operation. Contrary to the Fenix specification, returning to Fenix_Init after a failure does not allow the user to set these attributes again.
| [in] | group_id | The group to update |
| [in] | member_id | The member to update |
| [in] | attribute_name | The attribute to update |
| [in] | attribute_value | The new value of the attribute |
| [out] | flag | Set to true if the attribute was set, else false |
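| int Fenix_Data_member_create | ( | int | group_id, |
| int | member_id, | ||
| void * | buffer, | ||
| int | count, | ||
| MPI_Datatype | datatype ) |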
collectivelocal |
Create a data member for store/restore operations.
All calling ranks in the group's communicator must pass the same values for the parameters member_id, datatype, and group_id.
| group_id | Identifier to a data group within which to create the member. |
| member_id | An integer unique within the data group that identifies the data in buffer. Must be nonnegative and less than FENIX_MEMBER_ID_MAX, which is guaranteed to be at least 2^30. |
| buffer | Address of the data to be copied to redundant storage maintained by Fenix. This parameter may also be set using Fenix_Data_member_attr_set. That matters for non-survivor ranks after a failure: the address recorded for them was generated on the failed rank and is no longer valid, so they must update it. |
| count | The maximum number of contiguous elements of type datatype of the data to be stored. Need not be the same in all calling ranks. |
| datatype | The MPI_Datatype of the elements in buffer. |
| int Fenix_Data_member_delete | ( | int | group_id, |
| int | member_id ) |
local |
Delete a data member.
| [in] | group_id | The group to delete from |
| [in] | member_id | The member to delete |
| int Fenix_Data_member_lrestore | ( | int | group_id, |
| int | member_id, | ||
| void * | target_buffer, | ||
| int | max_count, | ||
| int | time_stamp, | ||
| Fenix_Data_subset * | found_data ) |
Local-only version of Fenix_Data_member_restore.
This function restores the data of a group member from the local snapshot.
| [in] | group_id | The group to restore from |
| [in] | member_id | The member to restore |
| [out] | target_buffer | The buffer to store the restored data |
| [in] | max_count | The maximum number of elements to restore |
| [in] | time_stamp | The time stamp of the snapshot to restore from |
| [out] | found_data | The subset of the data that was found in the snapshot |
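| int Fenix_Data_member_restore | ( | int | group_id, |
| int | member_id, | ||
| void * | target_buffer, | ||
| int | max_count, | ||
| int | time_stamp, | ||
| Fenix_Data_subset * | found_data ) |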
collective |
Restore the data of a group member from a snapshot.
All ranks in the group's resilient communicator must pass the same values for the parameters group_id, member_id, and time_stamp. This function retrieves data from consistent snapshot members. It can only be used if the size of the communicator used to store the data is the same as the size at the time of data recovery (this implies non-shrinking communicator recovery in case of a rank loss).
If the size of the buffer needing to receive the recovery data is unknown for a particular rank, it can be queried using Fenix_Data_member_attr_get.
| [in] | group_id | The group to restore from |
| [in] | member_id | The member to restore |
| [out] | target_buffer | The buffer to store the restored data |
| [in] | max_count | The maximum number of elements to restore |
| [in] | time_stamp | The time stamp of the snapshot to restore from |
| [out] | found_data | The subset of the data that was found in the snapshot |
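| int Fenix_Data_member_store | ( | int | group_id, |
| int | member_id, | ||
| Fenix_Data_subset | subset_specifier ) |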
collective |
Store a particular group member into the group's resilient storage space, in uncommitted storage.
The user can safely modify the member's data buffer after this call, as the current state is copied immediately. Multiple calls may be used to incrementally store data (using subset_specifiers), or overwrite old data prior to a commit.
| group_id | All ranks must provide the same group_id |
| member_id | All ranks must provide the same member_id |
| subset_specifier | Which subset of the data to store. It is always valid for every rank to provide the same subset_specifier; depending on the group's policy, varying combinations of specifiers may be possible. |
| int Fenix_Data_snapshot_delete | ( | int | group_id, |
| int | time_stamp ) |
local |
Delete a snapshot from a data group.
| [in] | group_id | The group to delete from |
| [in] | time_stamp | The time stamp of the snapshot to delete |
| int Fenix_Data_subset_create | ( | int | num_blocks, |
| int | start_offset, | ||
| int | end_offset, | ||
| int | stride, | ||
| Fenix_Data_subset * | subset_specifier ) |
Create a data subset for use in store operations.
Creates a subset based on num_blocks pairs of {start_offset,end_offset}, {start_offset+stride,end_offset+stride}, {start_offset+2*stride,end_offset+2*stride}, etc.
The value of start_offset must be smaller than or equal to the value of end_offset to indicate non-negative block size. Otherwise, the function returns an error code.
Created subsets must be deleted with Fenix_Data_subset_delete to free memory.
| [in] | num_blocks | The number of contiguous data blocks. |
| [in] | start_offset | The index of the first element in the first data block. |
| [in] | end_offset | The index of the last element in the first data block. |
| [in] | stride | Regular shift between successive data blocks. |
| [out] | subset_specifier | The created subset. |
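The block layout described above can be made concrete with a short sketch that expands the (num_blocks, start_offset, end_offset, stride) arguments into explicit index ranges. The block_range type and both functions are hypothetical helpers for illustration, not part of the Fenix API. The sketch treats end_offset as inclusive, following the parameter description, and rejects start_offset > end_offset.

```c
/* One contiguous block of element indices, both ends inclusive. */
typedef struct {
    int start;
    int end;
} block_range;

/* Expand the strided description into at most `max_out` ranges.
 * Returns the number of ranges written, or -1 on invalid arguments. */
int subset_blocks(int num_blocks, int start_offset, int end_offset,
                  int stride, block_range *out, int max_out)
{
    if (num_blocks < 0 || max_out < 0 || start_offset > end_offset)
        return -1;
    int n = num_blocks < max_out ? num_blocks : max_out;
    for (int i = 0; i < n; ++i) {
        out[i].start = start_offset + i * stride;
        out[i].end = end_offset + i * stride;
    }
    return n;
}

/* Elements covered per block times the number of blocks. Blocks that
 * overlap (stride smaller than the block length) are counted twice. */
long subset_element_count(int num_blocks, int start_offset, int end_offset)
{
    if (num_blocks < 0 || start_offset > end_offset)
        return -1;
    return (long)num_blocks * ((long)end_offset - start_offset + 1);
}
```

For num_blocks = 3, start_offset = 2, end_offset = 4, and stride = 10, the ranges are {2,4}, {12,14}, and {22,24}, covering 9 elements in total.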
| int Fenix_Data_subset_createv | ( | int | num_blocks, |
| int * | array_start_offsets, | ||
| int * | array_end_offsets, | ||
| Fenix_Data_subset * | subset_specifier ) |
As Fenix_Data_subset_create, but with varying start and end offsets.
Creates a subset based on num_blocks pairs of {start_offset,end_offset}. The value of start_offset must be smaller than or equal to end_offset to indicate non-negative block size. Otherwise, the function returns an error code.
Created subsets must be deleted with Fenix_Data_subset_delete to free memory.
| [in] | num_blocks | The number of contiguous data blocks. |
| [in] | array_start_offsets | The index of the first element in each data block. |
| [in] | array_end_offsets | The index of the last element in each data block. |
| [out] | subset_specifier | The created subset. |
| int Fenix_Data_subset_delete | ( | Fenix_Data_subset * | subset_specifier | ) |
Delete a data subset.
Frees the memory associated with a data subset object.
| [in] | subset_specifier | The subset to delete. |