.. _save-load-tutorial:

########################################
Saving and Loading FIREWHEEL Experiments
########################################

This tutorial demonstrates how to save the state of a running FIREWHEEL experiment and later restore it using the :ref:`helper_save` and :ref:`helper_load` Helpers.

In this tutorial, your goal is to create a known-good experiment state, make an intentional change inside a VM, save that state, then later introduce an unwanted change and use the save/load workflow to restore the experiment back to the previously saved state.

This workflow is useful when you want to preserve a configured experiment for later reuse, checkpoint an experiment before trying a risky change, recover from a mistake made during manual VM interaction, or restore a saved experiment in another compatible environment after appropriate validation.

By the end of this tutorial, you will have demonstrated that FIREWHEEL can restore an experiment back to a known saved point rather than forcing you to rebuild and reconfigure everything manually.

*************
Prerequisites
*************

Before starting, ensure that:

* FIREWHEEL is installed and functioning correctly.
* You have minimega version 3.0.1 or later installed, and it is configured to use absolute paths for backing images when creating snapshots (that is, with the ``MM_ABSSNAPSHOT=true`` configuration option).
  See :ref:`configuring-minimega` for more details.
* The necessary repositories and VM images for the chosen experiment are installed.
* You can access running VMs through miniweb or VNC.
* The testbed is in a clean state.

As with many FIREWHEEL tutorials, it is a good idea to begin by restarting the environment:

.. code-block:: bash

    $ firewheel restart

****************
What You Will Do
****************

In this tutorial, you will first create and save a known-good checkpoint of a running experiment.
After saving, the current experiment is paused, which gives you a natural point to either stop or resume and continue working from that state.
The first diagram below shows that :ref:`helper_save` workflow.

.. graphviz::

    digraph save_workflow {
        rankdir=LR;
        labelloc="t";
        label="FIREWHEEL Save Workflow";
        fontsize=18;

        node [shape=box, style="rounded,filled", fillcolor="#EAF2F8", color="#4A6FA5", fontname="Helvetica"];
        edge [color="#4A6FA5", penwidth=1.5];

        running [label="Running\nExperiment"];
        modify [label="Make and Verify\nVM Change"];
        save [label="Save State\nfirewheel save"];
        backup [label="Backup Directory\n(and optional .tar)"];
        paused [label="Experiment Paused\nAfter Save"];
        resume [label="Manual Resume\nfirewheel vm resume --all"];
        continue [label="Continue Working\nfrom Saved Checkpoint"];

        running -> modify;
        modify -> save;
        save -> backup;
        save -> paused;
        paused -> resume;
        resume -> continue;
    }

Later in the tutorial, you will restore that saved checkpoint.
By default, the :ref:`helper_load` Helper automatically resumes the restored experiment, though you can also request that it come back paused for inspection before manually resuming it.
The second diagram below shows that restore workflow.

.. graphviz::

    digraph load_workflow {
        rankdir=LR;
        labelloc="t";
        label="FIREWHEEL Load Workflow";
        fontsize=18;

        node [shape=box, style="rounded,filled", fillcolor="#EAF2F8", color="#4A6FA5", fontname="Helvetica"];
        edge [color="#4A6FA5", penwidth=1.5];

        backup [label="Saved Backup\nDirectory or Archive"];
        dryrun [label="Optional Validation\nfirewheel load --dry-run"];
        load [label="Restore State\nfirewheel load"];
        resumed [label="Restored Experiment\nAutomatically Resumed"];
        paused [label="Optional Paused Restore\nfirewheel load --paused"];
        resume [label="Manual Resume\nfirewheel vm resume --all"];
        verify [label="Verify Saved State\nWas Restored"];

        backup -> dryrun [style=dashed, label="optional"];
        backup -> load;
        dryrun -> load;
        load -> resumed;
        load -> paused [style=dashed, label="optional"];
        resumed -> verify;
        paused -> resume;
        resume -> verify;
    }

.. note::

    Before using :ref:`helper_save` and :ref:`helper_load`, keep the following operational expectations in mind:

    * :ref:`helper_save` pauses the currently running experiment when the save completes.
      To continue working in that same experiment after saving, run:

      .. code-block:: bash

          $ firewheel vm resume --all

    * :ref:`helper_load` requires that no FIREWHEEL experiment is currently running.
      In most cases, users should first reset the testbed with:

      .. code-block:: bash

          $ firewheel restart

    * A restore reuses existing files or directories automatically when their contents are identical to the backup.
      The :option:`load --force` option is only required when an existing restore destination differs from the backup.

    * If a restore fails after making partial changes, the recommended recovery is to reset the environment and try again:

      .. code-block:: bash

          $ firewheel restart hard

******************
Portability Status
******************

The table below summarizes the current validation status for common save/load deployment transitions.

+----------------------------------------+---------------------+
| Restore path                           | Current status      |
+========================================+=====================+
| single-node -> single-node             | tested and verified |
+----------------------------------------+---------------------+
| single-node -> cluster                 | not yet supported   |
+----------------------------------------+---------------------+
| cluster -> single-node                 | not yet supported   |
+----------------------------------------+---------------------+
| cluster -> cluster (same size)         | not yet supported   |
+----------------------------------------+---------------------+
| cluster -> cluster (different sizes)   | not yet supported   |
+----------------------------------------+---------------------+

When restoring into any environment other than the verified single-node to single-node case, it is strongly recommended to run :option:`load --dry-run` first and carefully validate VM behavior and VM Resource handling after the restore completes.

************************
Launching an Experiment
************************

For this tutorial, we will use the :ref:`router-tree-tutorial` experiment because it is small, familiar, and provides accessible Ubuntu VMs for verification.

Launch the experiment with:

.. code-block:: bash

    $ firewheel experiment tests.router_tree:3 minimega.launch

Once the experiment is running, verify that the VMs are up:

.. code-block:: bash

    $ firewheel vm mix
                                     VM Mix
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
    ┃ VM Image                          ┃ Power State ┃ VM Resource State ┃ Count ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
    │ ubuntu-16.04.4-server-amd64.qcow2 │ RUNNING     │ configured        │ 4     │
    ├───────────────────────────────────┼─────────────┼───────────────────┼───────┤
    │ vyos-1.1.8.qc2                    │ RUNNING     │ configured        │ 8     │
    ├───────────────────────────────────┼─────────────┼───────────────────┼───────┤
    │                                   │             │ Total Scheduled   │ 12    │
    └───────────────────────────────────┴─────────────┴───────────────────┴───────┘

You should see a mixture of Ubuntu and VyOS VMs in the experiment.

**************************************
Connecting to a VM and Making a Change
**************************************

Now connect to one of the Ubuntu VMs.
For this tutorial, we will use ``host.root.net``.

You can connect using miniweb or VNC as described in :ref:`router-tree-miniweb`.

Once logged in, create a marker file that will be easy to verify later:

.. code-block:: bash

    $ echo "saved-state-marker" > state_marker.txt

Now verify that the file exists:

.. code-block:: bash

    $ cat state_marker.txt

You should see:

.. code-block:: text

    saved-state-marker

This file represents a useful change that you want to preserve.

.. image:: images/save_load_marker_created.png
   :alt: Marker file created inside the VM

*********************
Saving the Experiment
*********************

Now that the VM contains a known-good change, save the experiment.

For example:

.. code-block:: bash

    $ firewheel save --name router_tree_saved_state
    ────────────────────────────────────── Phase 1: Save Namespace ──────────────────────────────────────
    Waiting for namespace save to complete... (fw-node: 12/12) ━━━━━━━━━━━━━━━━━━━ 12/12 0:00:00
    ✓ Namespace saved successfully
    ✓ Final ns save host status recorded
    ────────────────────────────────── Phase 2: Collect Restore Data ───────────────────────────────────
    ✓ Saved minimega tap commands (e.g., a control network)
    ✓ Saved VM mapping
    ✓ Saved experiment time
    Copying schedule files... ━━━━━━━━━━━━━━━━━━━ 12/12 0:00:00
    ✓ Pruned and saved schedule files (12)
    ✓ Copied VM resource handler launch file
    ✓ Wrote manifest metadata
    ────────────────────────────────────────── Save Complete ───────────────────────────────────────────
    ✓ Experiment save completed successfully

    Saved Backup
    Experiment name        router_tree_saved_state
    Backup directory       /scratch/minimega/files/saved/router_tree_saved_state
    Schedule files         12
    launch_cmds.mm         Included
    ImageStore cache       Not included
    VmResourceStore cache  Not included
    Archive                Not created

    Next step: Restore this backup later with firewheel load /scratch/minimega/files/saved/router_tree_saved_state or use firewheel vm resume --all to resume the current experiment.

This writes a backup directory in the minimega files store.
In this example it is:

.. code-block:: text

    /scratch/minimega/files/saved/router_tree_saved_state

If you would also like a tar archive, you can instead use:

.. code-block:: bash

    $ firewheel save --name router_tree_saved_state --archive

.. note::

    The :option:`save --archive` option currently creates an uncompressed ``.tar`` archive.
    The :ref:`helper_load` Helper can restore from ``.tar``, ``.tar.gz``, or ``.tgz`` files.

    For large experiments, if you want a compressed archive for transfer or storage, it is generally better to compress the resulting tarball afterward using external tools.
    Highly parallel compression tools such as ``pigz`` are often a good choice for large backups.

    For example, to compress using all available CPU cores while keeping the original tarball:

    .. code-block:: bash

        $ firewheel save --name my_experiment --archive
        $ pigz -k -p "$(nproc)" my_experiment_backup.tar

    This produces ``my_experiment_backup.tar.gz``, which can later be restored with :ref:`helper_load`.

If you want to include the backing images and VM resources cache content, use:

.. code-block:: bash

    $ firewheel save --name router_tree_saved_state --complete --archive

At this point, FIREWHEEL has saved the entire experiment state.

******************************
Introducing an Unwanted Change
******************************

At this point, you have saved a known-good checkpoint of the experiment.
As part of the save process, the experiment is paused so that you can either preserve that saved state and stop working, or intentionally continue working from the current experiment as a new "fork" of that state.

In practice, after saving, you now have two choices:

#. Reset the testbed and later restore the saved checkpoint with :ref:`helper_load`.
#. Resume the currently running experiment and continue making additional changes.

For this tutorial, we will choose the second option so that we can intentionally move the running experiment away from the saved state and later prove that :ref:`helper_load` restores the earlier checkpoint.

Resume the experiment with:

.. code-block:: bash

    $ firewheel vm resume --all
    Resumed VM Resource Handling for 12 VMs.

Now return to ``host.root.net`` and delete the saved marker file:

.. code-block:: bash

    $ rm -f state_marker.txt

Then create a different file indicating that the VM is now in an unwanted state:

.. code-block:: bash

    $ echo "bad-state-marker" > bad_marker.txt

Verify that the original saved marker is gone and the unwanted marker exists:

.. code-block:: bash

    $ ls *marker.txt

You should see only ``bad_marker.txt``.

At this point, the running experiment no longer matches the saved checkpoint.
This is exactly the kind of situation where save/load is useful: you made additional changes after saving, decided you do not want to keep them, and now want to return the experiment to the previously saved state.

.. image:: images/bad_marker_created.png
   :alt: A bad file marker created inside the VM

*********************
Resetting the Testbed
*********************

Before using :ref:`helper_load`, the testbed must not already be running another FIREWHEEL experiment.

Reset the environment:

.. code-block:: bash

    $ firewheel restart

***********************
Loading the Saved State
***********************

Now that the testbed is cleared, we want to load our previously saved state.
First we will validate the backup before performing the actual restore.

If you saved a directory, you can provide either the full path to that directory or just the saved experiment name.
If only the name is provided, :ref:`helper_load` will look for it in the minimega saved files directory.

For example:

.. code-block:: bash

    $ firewheel load router_tree_saved_state --dry-run
    ─────────────────────────────────── Phase 1: Read Backup Source ────────────────────────────────────
    Source: /scratch/minimega/files/saved/router_tree_saved_state
    ✓ Using existing backup directory
    ───────────────────────────────────── Phase 2: Validate Backup ─────────────────────────────────────

    Validated Backup
    Root directory             /scratch/minimega/files/saved/router_tree_saved_state
    Experiment name            router_tree_saved_state
    FIREWHEEL version          2.11.1.dev13
    Format version             1
    Created at                 2026-04-30T18:03:10.009534+00:00
    Schedule count             12
    Has launch_cmds.mm         True
    Has ImageStore cache       False
    Has VmResourceStore cache  False

    ✓ Backup validated
    ✓ No active FIREWHEEL experiment is running
    ✓ Restore destinations validated
    ───────────────────────────────────────── Dry Run Summary ──────────────────────────────────────────
    ✓ Dry run completed successfully

    Planned Restore
    Experiment             router_tree_saved_state
    Saved VM files         /scratch/minimega/files/saved/router_tree_saved_state
    VM mapping             /scratch/minimega/files/saved/router_tree_saved_state/vm_mapping.json
    Schedules              /scratch/minimega/files/saved/router_tree_saved_state/schedules
    Launch VMs via         /scratch/minimega/files/saved/router_tree_saved_state/launch.mm
    Launch handlers via    /scratch/minimega/files/saved/router_tree_saved_state/launch_cmds.mm
    ImageStore cache       Not present
    VmResourceStore cache  Not present
    Experiment time        Would restore last

    ↺ Existing identical files/directories would be reused without overwrite
    ✓ No changes were made

This dry run gives you a chance to confirm that the restore is likely to work before FIREWHEEL makes any changes to the testbed.
In particular, it checks that the backup layout and manifest are valid, that the restore targets are suitable, and that the restore could proceed successfully in the current environment.
This is especially helpful when working with an older backup or when restoring into an environment where some files may already exist.

After confirming that the dry run succeeds, perform the actual restore.

If you saved a directory:

.. code-block:: bash

    $ firewheel load router_tree_saved_state
    ─────────────────────────────────── Phase 1: Read Backup Source ────────────────────────────────────
    Source: /scratch/minimega/files/saved/router_tree_saved_state
    ✓ Using existing backup directory
    ───────────────────────────────────── Phase 2: Validate Backup ─────────────────────────────────────

    Validated Backup
    Root directory             /scratch/minimega/files/saved/router_tree_saved_state
    Experiment name            router_tree_saved_state
    FIREWHEEL version          2.11.1.dev13
    Format version             1
    Created at                 2026-04-30T18:03:10.009534+00:00
    Schedule count             12
    Has launch_cmds.mm         True
    Has ImageStore cache       False
    Has VmResourceStore cache  False

    ✓ Backup validated
    ✓ No active FIREWHEEL experiment is running
    ✓ Restore destinations validated
    ────────────────────────────────────── Phase 3: Restore Data ───────────────────────────────────────
    ↺ Reused existing saved VM files
    ✓ Restored VM mapping (12 entries)
    ✓ Restored schedules (12 files)
    ─────────────────────────────────────── Phase 4: Launch VMs ────────────────────────────────────────
    ✓ Started saved VMs
    ─────────────────────────────────── Phase 5: Restore Experiment Time ───────────────────────────────
    ✓ Restored experiment time
    ─────────────────────────────── Phase 6: Launch VM Resource Handlers ───────────────────────────────
    ✓ Rebuilt VM resource handler socket paths for 12 VMs
    ✓ Started VM resource handlers (12 processes launched)
    ✓ Restored schedules and resumed VM Resource handling automatically
    ───────────────────────────────────────── Restore Complete ─────────────────────────────────────────
    ✓ Experiment restore completed successfully

    Restore Result
    Experiment             router_tree_saved_state
    Saved VM path          /scratch/minimega/files/saved/router_tree_saved_state
    Saved VM files         Reused
    VM mapping entries     12
    Schedules              12 copied / 0 reused
    ImageStore cache       Not present
    VmResourceStore cache  Not present
    VMs launched           Yes
    VM handlers launched   Yes (12 processes)
    Experiment time        Restored

.. note::

    When using ``firewheel load --paused``, resume the experiment manually when ready with:

    .. code-block:: bash

        $ firewheel vm resume --all

The :ref:`helper_load` Helper validates the backup, restores the saved VM files and metadata, relaunches the experiment, restores schedules, and rebuilds VM Resource handler socket paths if necessary.

****************************
Verifying the Restored State
****************************

Once the restored experiment is running, reconnect to ``host.root.net`` and check the marker files.

First, verify that the saved marker file has been restored:

.. code-block:: bash

    $ cat state_marker.txt

You should again see:

.. code-block:: text

    saved-state-marker

Next, verify that the later unwanted change is gone:

.. code-block:: bash

    $ ls bad_marker.txt

This file should no longer exist.

This confirms that the experiment was successfully restored to the previously saved checkpoint rather than preserving the later unwanted change.

.. image:: images/save_load_marker_restored.png
   :alt: Restored marker file inside the VM
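
The two manual checks in this section can also be combined into one small shell function run inside the VM.
The following is a minimal sketch and is not part of FIREWHEEL: the ``check_markers`` name is hypothetical, and it assumes the marker files were created in the directory passed as the first argument (the current directory by default).

.. code-block:: bash

    # Hypothetical helper for this tutorial only (not a FIREWHEEL command).
    # Usage: check_markers [DIRECTORY]
    # Returns 0 when the VM matches the saved checkpoint, 1 otherwise.
    check_markers() {
        local dir="${1:-.}"
        local status=0

        # The saved marker must exist and contain the expected text.
        if [ "$(cat "${dir}/state_marker.txt" 2>/dev/null)" = "saved-state-marker" ]; then
            echo "OK: state_marker.txt restored"
        else
            echo "FAIL: state_marker.txt missing or has unexpected content"
            status=1
        fi

        # The marker created after the save must be gone.
        if [ -e "${dir}/bad_marker.txt" ]; then
            echo "FAIL: bad_marker.txt still exists"
            status=1
        else
            echo "OK: bad_marker.txt is absent"
        fi

        return "${status}"
    }

After reconnecting to ``host.root.net``, paste the function into your shell and run ``check_markers`` from the directory where you created the marker files.
A zero exit status indicates that the saved marker is back and the unwanted marker is gone.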