Saving and Loading FIREWHEEL Experiments

This tutorial demonstrates how to save the state of a running FIREWHEEL experiment and later restore it using the save and load Helpers.

Your goal is to create a known-good experiment state by making an intentional change inside a VM and saving it. You will then introduce an unwanted change and use the save/load workflow to restore the experiment to the previously saved state.

This workflow is useful when you want to:

  • preserve a configured experiment for later reuse,

  • checkpoint an experiment before trying a risky change,

  • recover from a mistake made during manual VM interaction, or

  • restore a saved experiment in another compatible environment after appropriate validation.

By the end of this tutorial, you will have demonstrated that FIREWHEEL can restore an experiment back to a known saved point rather than forcing you to rebuild and reconfigure everything manually.

Prerequisites

Before starting, ensure that:

  • FIREWHEEL is installed and functioning correctly.

  • You have minimega version 3.0.1 or later installed, and it is configured to use absolute paths for backing images when creating snapshots (that is, with the MM_ABSSNAPSHOT=true configuration option). See Configuring minimega for more details.

  • The necessary repositories and VM images for the chosen experiment are installed.

  • You can access running VMs through miniweb or VNC.

  • The testbed is in a clean state.
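Part of this checklist can be scripted. The sketch below covers only the "is the tool installed" items: it reports any command that is not on PATH. The helper name missing_commands is hypothetical, and it assumes the firewheel CLI is expected to be on your PATH. It does not check the minimega version or the MM_ABSSNAPSHOT setting.

```shell
# Print the name of every required command that is not on PATH.
# The command names to check are passed as arguments.
# Hypothetical helper; not part of FIREWHEEL.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

For example, missing_commands firewheel pigz prints nothing when both tools are available and otherwise prints the missing names, one per line.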

As with many FIREWHEEL tutorials, it is a good idea to begin by restarting the environment:

$ firewheel restart

What You Will Do

In this tutorial, you will first create and save a known-good checkpoint of a running experiment. After saving, the current experiment is paused, which gives you a natural point to either stop or resume and continue working from that state.

The first diagram below shows that save workflow.

digraph save_workflow {
    rankdir=LR;
    labelloc="t";
    label="FIREWHEEL Save Workflow";
    fontsize=18;

    node [shape=box, style="rounded,filled", fillcolor="#EAF2F8", color="#4A6FA5", fontname="Helvetica"];
    edge [color="#4A6FA5", penwidth=1.5];

    running [label="Running\nExperiment"];
    modify [label="Make and Verify\nVM Change"];
    save [label="Save State\nfirewheel save"];
    backup [label="Backup Directory\n(and optional .tar)"];
    paused [label="Experiment Paused\nAfter Save"];
    resume [label="Manual Resume\nfirewheel vm resume --all"];
    continue [label="Continue Working\nfrom Saved Checkpoint"];

    running -> modify;
    modify -> save;
    save -> backup;
    save -> paused;
    paused -> resume;
    resume -> continue;
}

Later in the tutorial, you will restore that saved checkpoint. By default, the load Helper automatically resumes the restored experiment, though you can also request that it come back paused for inspection before manually resuming it.

The second diagram below shows that restore workflow.

digraph load_workflow {
    rankdir=LR;
    labelloc="t";
    label="FIREWHEEL Load Workflow";
    fontsize=18;

    node [shape=box, style="rounded,filled", fillcolor="#EAF2F8", color="#4A6FA5", fontname="Helvetica"];
    edge [color="#4A6FA5", penwidth=1.5];

    backup [label="Saved Backup\nDirectory or Archive"];
    dryrun [label="Optional Validation\nfirewheel load --dry-run"];
    load [label="Restore State\nfirewheel load"];
    resumed [label="Restored Experiment\nAutomatically Resumed"];
    paused [label="Optional Paused Restore\nfirewheel load --paused"];
    resume [label="Manual Resume\nfirewheel vm resume --all"];
    verify [label="Verify Saved State\nWas Restored"];

    backup -> dryrun [style=dashed, label="optional"];
    backup -> load;
    dryrun -> load;
    load -> resumed;
    load -> paused [style=dashed, label="optional"];
    resumed -> verify;
    paused -> resume;
    resume -> verify;
}

Note

Before using save and load, keep the following operational expectations in mind:

  • save pauses the currently running experiment when the save completes. To continue working in that same experiment after saving, run:

    $ firewheel vm resume --all
    
  • load requires that no FIREWHEEL experiment is currently running. In most cases, users should first reset the testbed with:

    $ firewheel restart
    
  • A restore reuses existing files or directories automatically when their contents are identical to the backup. The load --force option is only required when an existing restore destination differs from the backup.

  • If a restore fails after making partial changes, the recommended recovery is to reset the environment and try again:

    $ firewheel restart hard
    

Portability Status

The table below summarizes the current validation status for common save/load deployment transitions.

Restore path                           Current status
-------------------------------------  -------------------
single-node -> single-node             tested and verified
single-node -> cluster                 not yet supported
cluster -> single-node                 not yet supported
cluster -> cluster (same size)         not yet supported
cluster -> cluster (different sizes)   not yet supported

When restoring into any environment other than the verified single-node to single-node case, it is strongly recommended to run load --dry-run first and carefully validate VM behavior and VM Resource handling after the restore completes.

Launching an Experiment

For this tutorial, we will use the router tree experiment from Running the Router Tree Topology because it is small, familiar, and provides accessible Ubuntu VMs for verification.

Launch the experiment with:

$ firewheel experiment tests.router_tree:3 minimega.launch

Once the experiment is running, verify that the VMs are up:

$ firewheel vm mix
                                        VM Mix
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ VM Image                           Power State  VM Resource State  Count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ ubuntu-16.04.4-server-amd64.qcow2  RUNNING      configured         4     │
├───────────────────────────────────┼─────────────┼───────────────────┼───────┤
│ vyos-1.1.8.qc2                     RUNNING      configured         8     │
├───────────────────────────────────┼─────────────┼───────────────────┼───────┤
│                                                 Total Scheduled    12    │
└───────────────────────────────────┴─────────────┴───────────────────┴───────┘

You should see a mixture of Ubuntu and VyOS VMs in the experiment.
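If you prefer a scripted check over reading the table, the following is a minimal sketch that inspects the vm mix output. It assumes the table layout shown above (image rows start with "│" and contain four cells with no embedded spaces) and that the same layout is produced when the output is piped; neither is guaranteed. The function name vm_mix_all_ready is hypothetical.

```shell
# Read `firewheel vm mix` output on stdin and succeed only if every image row
# shows power state RUNNING and VM resource state "configured".
# Assumption: image rows split into exactly six whitespace-separated fields
# (border, image, power state, resource state, count, border).
vm_mix_all_ready() {
    awk '
        $1 == "│" && NF == 6 {
            rows++
            if ($3 != "RUNNING" || $4 != "configured") bad++
        }
        # Fail when no image rows were seen or any row was not ready.
        END { exit (rows == 0 || bad > 0) ? 1 : 0 }
    '
}
```

A possible use is firewheel vm mix | vm_mix_all_ready && echo "all VMs configured".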

Connecting to a VM and Making a Change

Now connect to one of the Ubuntu VMs. For this tutorial, we will use host.root.net. You can connect using miniweb or VNC as described in Using miniweb. Once logged in, create a marker file that will be easy to verify later:

$ echo "saved-state-marker" > state_marker.txt

Now verify that the file exists:

$ cat state_marker.txt

You should see:

saved-state-marker

This file represents a useful change that you want to preserve.

Marker file created inside the VM

Saving the Experiment

Now that the VM contains a known-good change, save the experiment.

For example:

$ firewheel save --name router_tree_saved_state
────────────────────────────────────── Phase 1: Save Namespace ──────────────────────────────────────
Waiting for namespace save to complete... (fw-node: 12/12) ━━━━━━━━━━━━━━━━━━━ 12/12 0:00:00
✓ Namespace saved successfully
✓ Final ns save host status recorded
────────────────────────────────── Phase 2: Collect Restore Data ───────────────────────────────────
✓ Saved minimega tap commands (e.g., a control network) Saved VM mapping
✓ Saved experiment time
Copying schedule files... ━━━━━━━━━━━━━━━━━━━ 12/12 0:00:00
✓ Pruned and saved schedule files (12) Copied VM resource handler launch file
✓ Wrote manifest metadata
────────────────────────────────────────── Save Complete ───────────────────────────────────────────
✓ Experiment save completed successfully
  Saved Backup
  Experiment name            router_tree_saved_state
  Backup directory           /scratch/minimega/files/saved/router_tree_saved_state
  Schedule files             12
  launch_cmds.mm             Included
  ImageStore cache           Not included
  VmResourceStore cache      Not included
  Archive                    Not created
Next step: Restore this backup later with firewheel load /scratch/minimega/files/saved/router_tree_saved_state
or use firewheel vm resume --all to resume the current experiment.

This writes a backup directory in the minimega files store. In this example it is:

/scratch/minimega/files/saved/router_tree_saved_state

If you would also like a tar archive, you can instead use:

$ firewheel save --name router_tree_saved_state --archive

Note

The save --archive option currently creates an uncompressed .tar archive. The load Helper can restore from .tar, .tar.gz, or .tgz files.

For large experiments, if you want a compressed archive for transfer or storage, it is generally better to compress the resulting tarball afterward using external tools. Highly parallel compression tools such as pigz are often a good choice for large backups.

For example, to compress using all available CPU cores while keeping the original tarball:

$ firewheel save --name my_experiment --archive
$ pigz -k -p "$(nproc)" my_experiment_backup.tar

This produces my_experiment_backup.tar.gz, which can later be restored with load.
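Before moving a compressed backup elsewhere, it can be worth confirming that the archive is intact. The sketch below is one way to do that. The function name compress_archive is hypothetical; it falls back to gzip when pigz is not installed and relies on pigz's default of using all available processors.

```shell
# Compress a tarball next to the original, keep the original, and verify
# the result. Hypothetical helper; not part of FIREWHEEL.
compress_archive() {
    tarball=$1
    if command -v pigz >/dev/null 2>&1; then
        # -k keeps the input file; pigz uses all online processors by default.
        pigz -k "$tarball"
    else
        # Portable fallback: write the compressed copy without touching the input.
        gzip -c "$tarball" > "$tarball.gz"
    fi &&
    # Test the integrity of the compressed file.
    gzip -t "$tarball.gz"
}
```

For example, compress_archive my_experiment_backup.tar returns a non-zero status if compression or the integrity test fails.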

If you want to include the backing images and VM resources cache content, use:

$ firewheel save --name router_tree_saved_state --complete --archive

At this point, FIREWHEEL has saved the experiment state needed to restore it later.

Introducing an Unwanted Change

At this point, you have saved a known-good checkpoint of the experiment. As part of the save process, the experiment is paused so that you can either preserve that saved state and stop working, or intentionally continue working from the current experiment as a new “fork” of that state. In practice, after saving, you now have two choices:

  1. Reset the testbed and later restore the saved checkpoint with load.

  2. Resume the paused experiment and continue making additional changes.

For this tutorial, we will choose the second option so that we can intentionally move the running experiment away from the saved state and later prove that load restores the earlier checkpoint. Resume the experiment with:

$ firewheel vm resume --all
Resumed VM Resource Handling for 12 VMs.
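If you often checkpoint and then keep working, the save and resume steps can be combined. The following is a minimal sketch using only the two commands shown in this tutorial. The function name checkpoint_and_continue is hypothetical, and it assumes firewheel save exits with a non-zero status when the save fails.

```shell
# Save the running experiment under the given name, then resume it so that
# work can continue from the checkpoint.
# Assumption: `firewheel save` returns non-zero on failure.
checkpoint_and_continue() {
    name=$1
    # Do not resume if the save did not complete.
    firewheel save --name "$name" || return 1
    firewheel vm resume --all
}
```

For example, checkpoint_and_continue router_tree_saved_state would perform the save used in this tutorial and then resume the experiment.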

Now return to host.root.net and delete the saved marker file:

$ rm -f state_marker.txt

Then create a different file indicating that the VM is now in an unwanted state:

$ echo "bad-state-marker" > bad_marker.txt

Verify that the original saved marker is gone and the unwanted marker exists:

$ ls *marker.txt

You should see only bad_marker.txt.

At this point, the running experiment no longer matches the saved checkpoint. This is exactly the kind of situation where save/load is useful: you made additional changes after saving, decided you do not want to keep them, and now want to return the experiment to the previously saved state.

The unwanted marker file created inside the VM

Resetting the Testbed

Before using load, the testbed must not already be running another FIREWHEEL experiment. Reset the environment:

$ firewheel restart
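Because resetting discards the running experiment, a cautious variant of this step first confirms that the backup is on disk. The sketch below checks for the entries that the dry-run output in this tutorial lists for the saved directory (vm_mapping.json, schedules, launch.mm, and launch_cmds.mm). Treating these four names as the required layout is an assumption based on that one example, and the function name backup_looks_complete is hypothetical.

```shell
# Succeed only if the backup directory contains the entries listed in this
# tutorial's dry-run output. Hypothetical helper; the authoritative check
# is `firewheel load --dry-run`.
backup_looks_complete() {
    dir=$1
    for entry in vm_mapping.json schedules launch.mm launch_cmds.mm; do
        [ -e "$dir/$entry" ] || { echo "missing: $dir/$entry" >&2; return 1; }
    done
}
```

For example, backup_looks_complete /scratch/minimega/files/saved/router_tree_saved_state && firewheel restart resets the testbed only when the backup appears to be present.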

Loading the Saved State

Now that the testbed is cleared, we want to load our previously saved state. First we will validate the backup before performing the actual restore. If you saved a directory, you can provide either the full path to that directory or just the saved experiment name. If only the name is provided, load will look for it in the minimega saved files directory. For example:

$ firewheel load router_tree_saved_state --dry-run
─────────────────────────────────── Phase 1: Read Backup Source ────────────────────────────────────
Source: /scratch/minimega/files/saved/router_tree_saved_state
✓ Using existing backup directory
───────────────────────────────────── Phase 2: Validate Backup ─────────────────────────────────────
Validated Backup
Root directory             /scratch/minimega/files/saved/router_tree_saved_state
Experiment name            router_tree_saved_state
FIREWHEEL version          2.11.1.dev13
Format version             1
Created at                 2026-04-30T18:03:10.009534+00:00
Schedule count             12
Has launch_cmds.mm         True
Has ImageStore cache       False
Has VmResourceStore cache  False
✓ Backup validated
✓ No active FIREWHEEL experiment is running
✓ Restore destinations validated
───────────────────────────────────────── Dry Run Summary ──────────────────────────────────────────
✓ Dry run completed successfully
Planned Restore
Experiment                 router_tree_saved_state
Saved VM files             /scratch/minimega/files/saved/router_tree_saved_state
VM mapping                 /scratch/minimega/files/saved/router_tree_saved_state/vm_mapping.json
Schedules                  /scratch/minimega/files/saved/router_tree_saved_state/schedules
Launch VMs via             /scratch/minimega/files/saved/router_tree_saved_state/launch.mm
Launch handlers via        /scratch/minimega/files/saved/router_tree_saved_state/launch_cmds.mm
ImageStore cache           Not present
VmResourceStore cache      Not present
Experiment time            Would restore last
↺ Existing identical files/directories would be reused without overwrite
✓ No changes were made

This dry run gives you a chance to confirm that the restore is likely to work before FIREWHEEL makes any changes to the testbed. In particular, it checks that the backup layout and manifest are valid, that the restore targets are suitable, and that the restore could proceed successfully in the current environment. This is especially helpful when working with an older backup or when restoring into an environment where some files may already exist.
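The validate-then-restore sequence can also be wrapped so that a restore is never attempted after a failed dry run. This is a minimal sketch, assuming that firewheel load exits with a non-zero status when the dry run fails; the function name validated_load is hypothetical.

```shell
# Run a dry run first and restore only if it succeeds.
# Assumption: `firewheel load ... --dry-run` returns non-zero on failure.
validated_load() {
    backup=$1
    if firewheel load "$backup" --dry-run; then
        firewheel load "$backup"
    else
        echo "dry run failed; not restoring $backup" >&2
        return 1
    fi
}
```

For example, validated_load router_tree_saved_state runs the same two commands used in this tutorial, in order.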

After confirming that the dry run succeeds, perform the actual restore. Because this tutorial saved a directory, pass the saved experiment name:

$ firewheel load router_tree_saved_state
─────────────────────────────────── Phase 1: Read Backup Source ────────────────────────────────────
Source: /scratch/minimega/files/saved/router_tree_saved_state
✓ Using existing backup directory
───────────────────────────────────── Phase 2: Validate Backup ─────────────────────────────────────
Validated Backup
Root directory             /scratch/minimega/files/saved/router_tree_saved_state
Experiment name            router_tree_saved_state
FIREWHEEL version          2.11.1.dev13
Format version             1
Created at                 2026-04-30T18:03:10.009534+00:00
Schedule count             12
Has launch_cmds.mm         True
Has ImageStore cache       False
Has VmResourceStore cache  False
✓ Backup validated
✓ No active FIREWHEEL experiment is running
✓ Restore destinations validated
────────────────────────────────────── Phase 3: Restore Data ───────────────────────────────────────
↺ Reused existing saved VM files
✓ Restored VM mapping (12 entries) Restored schedules (12 files)
─────────────────────────────────────── Phase 4: Launch VMs ────────────────────────────────────────
✓ Started saved VMs
─────────────────────────────────── Phase 5: Restore Experiment Time ───────────────────────────────
✓ Restored experiment time
─────────────────────────────── Phase 6: Launch VM Resource Handlers ───────────────────────────────
✓ Rebuilt VM resource handler socket paths for 12 VMs
✓ Started VM resource handlers (12 processes launched) Restored schedules and resumed VM Resource handling automatically
───────────────────────────────────────── Restore Complete ─────────────────────────────────────────
✓ Experiment restore completed successfully
Restore Result
Experiment                 router_tree_saved_state
Saved VM path              /scratch/minimega/files/saved/router_tree_saved_state
Saved VM files             Reused
VM mapping entries         12
Schedules                  12 copied / 0 reused
ImageStore cache           Not present
VmResourceStore cache      Not present
VMs launched               Yes
VM handlers launched       Yes (12 processes)
Experiment time            Restored

Note

When using firewheel load --paused, resume the experiment manually when ready with:

$ firewheel vm resume --all
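The paused restore can be scripted as a restore, a pause for manual inspection, and a resume. The sketch below uses only the --paused option and the resume command shown above. The function name load_for_inspection is hypothetical, and it assumes firewheel load exits with a non-zero status on failure.

```shell
# Restore a backup in the paused state, wait for the operator to inspect the
# VMs, then resume the experiment.
# Assumption: `firewheel load` returns non-zero on failure.
load_for_inspection() {
    firewheel load "$1" --paused || return 1
    printf 'Inspect the restored VMs, then press Enter to resume: '
    # Continue even if stdin is closed and read reports end of input.
    read -r reply || :
    firewheel vm resume --all
}
```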

The load Helper validates the backup, restores the saved VM files and metadata, relaunches the experiment, restores schedules, and rebuilds VM Resource handler socket paths if necessary.

Verifying the Restored State

Once the restored experiment is running, reconnect to host.root.net and check the marker files. First, verify that the saved marker file has been restored:

$ cat state_marker.txt

You should again see:

saved-state-marker

Next, verify that the later unwanted change is gone:

$ ls bad_marker.txt

This file should no longer exist, so ls should report that there is no such file or directory. This confirms that the experiment was restored to the previously saved checkpoint and that the later unwanted change was discarded.
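Both checks can be combined into one small script run inside the VM. This is a minimal sketch; the function name check_restored is hypothetical, and it assumes the marker files live in the directory you pass (the current directory by default).

```shell
# Verify that the saved marker is back with its original content and that
# the unwanted marker is gone. Hypothetical helper for use inside the VM.
check_restored() {
    dir=${1:-.}
    if [ "$(cat "$dir/state_marker.txt" 2>/dev/null)" != "saved-state-marker" ]; then
        echo "saved marker missing or altered" >&2
        return 1
    fi
    if [ -e "$dir/bad_marker.txt" ]; then
        echo "unwanted marker still present" >&2
        return 1
    fi
    echo "restored state verified"
}
```

Running check_restored in the directory where you created the marker files prints a confirmation message only when both conditions hold.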

Restored marker file inside the VM