Model Component INSTALL file

Some Model Components may require additional Python packages to be installed within FIREWHEEL’s virtual environment or for data to be downloaded. In this case, the Model Component can have an INSTALL directory, which contains a valid Ansible Playbook (recommended method). Alternatively, INSTALL can be an executable script (as denoted by a shebang line), though this is not recommended and support will be removed in a future releases. When users use the mc generate Helper, a new INSTALL directory is created with sample tasks.yml and vars.yml automatically included.

When a repository is installed via the repository install Helper, users have the option to automatically run each MC’s INSTALL script using the -s flag (see repository install for more details). Alternatively, if an uninstalled model component is used in a firewheel experiment, then it will prompt the user to install the model component.

Design Principles

We recommend that the following principles are adhered to when installing a model component.

  1. Idempotence – The file(s) should be capable of running multiple times without causing issues. This is a core tenant of Ansible and a strong motivator why Ansible playbooks are the preferred method.

  2. Reproducibility – It is critical that users download the exact same data that was originally intended by the Model Component creators. If the data/packages differ, then there is a strong possibility that the experimental outcomes will differ and could produce unintended consequences. Therefore, we strongly recommend that MC creators link to exact versions of software to download, rather than an automatically updating link. For example, if the MC was supposed to install a GitLab runner:

    # BAD: This will automatically get the latest URL.
    wget https://gitlab-runner-downloads.s3.amazonaws.com/latest/binaries/gitlab-runner-linux-amd64
    
    # GOOD: Get version 11.4.2
    wget https://s3.amazonaws.com/gitlab-runner-downloads/v11.4.2/binaries/gitlab-runner-linux-386
    
  3. Integrity – A checksum for all downloaded files is strongly recommend both to facilitate reproducibility and to increase the security of the experiment.

  4. Offline Accessible – Many experiments are conducted on infrastructure that lacks Internet access. Therefore, we recommend that INSTALL files allow users to achieve the same end result using cached files.

  5. Cleanup – Only the essential dependencies should be kept and any irrelevant data that may have been generated during intermediate steps should be removed.

  6. Readability – Users will need to execute these potentially unknown actions, the INSTALL script should be well documented and readable to the average user. Readability is desired over brevity.

Ansible INSTALL Directory Requirements

The expected INSTALL directory structure is:

MC_DIR
└── INSTALL
    ├── tasks.yml
    └── vars.yml

Where tasks.yml is a YAML list of Ansible tasks that will be included when installing the model component. The vars.yml file should be a YAML dictionary of all the variable keys/values which will be used when installing the model component.

tasks.yml

The tasks file should be a YAML list with any tasks needed to ensure that the model component can execute correctly as intended.

Listing 2 This is an example tasks.yml file that collects, verifies, and compresses needed binaries.
 - name: Create directory for htop
   ansible.builtin.file:
     path: "htop-1_0_2_debs"
     state: directory

 - name: Download htop Package
   ansible.builtin.get_url:
     url: "http://archive.ubuntu.com/ubuntu/pool/universe/h/htop/htop_1.0.2-3_amd64.deb"
     dest: "htop-1_0_2_debs/htop_1.0.2-3_amd64.deb"
     checksum: "sha256:0311d8a26689935ca53e8e9252cb2d95a1fdc2f8278f4edb5733f555dad984a9"

 - name: Create tarball of htop directory
   ansible.builtin.archive:
     path: "htop-1_0_2_debs"
     dest: "htop-1_0_2_debs.tar.gz"
     format: gz

 - name: Move tarball to vm_resources/debs/
   ansible.builtin.copy:
     src: "htop-1_0_2_debs.tar.gz"
     dest: "{{ mc_dir }}/vm_resources/debs/htop-1_0_2_debs.tgz"

 - name: Remove htop directory
   ansible.builtin.file:
     path: "htop-1_0_2_debs"
     state: absent

vars.yml

The vars.yml file should be a YAML dictionary of all the variable keys/values which will be used when installing the model component. FIREWHEEL will automatically provide the following variables to the Ansible playbooks when running:

  • mc_name – The name of the Model Component.

  • mc_dir – The full path to the model component directory.

In addition to any variables the specific tasks need, the vars.yml should have a required_files key where a list of the final output files is listed. This is because the model component installation is assumed to be complete when all required_files are present. As an added benefit, FIREWHEEL supports caching pre-computed blobs from various resources to enable offline experiment access and the required_files supports this feature. The process of collecting offline required files is automatically handled by FIREWHEEL and using this process is discussed in detail in Setting up an Offline Cache. If no required_files are needed, then it can be omitted from INSTALL/vars.yml.

Continuing the example from above, the end result of tasks.yml is the creation of the file {{ mc_dir }}/vm_resources/debs/htop-1_0_2_debs.tgz. Therefore, this file is required to exist for the model component to be completely installed. The vars.yml file would look like:

Listing 3 This is an example vars.yml file that ensures the final MC state.
required_files:
  - destination: "{{ mc_dir }}/vm_resources/debs/htop-1_0_2_debs.tgz"

The full definition for required_files is:

destination

Where the file should be placed. Should include {{ mc_dir }} if the file needs to be relative to the model component directory.

Type:

string

Required:

true

source

Where the file should be located within the cache. This should not be set by MC creators, as it defaults to {{ mc_name }}/file. However, it is available to be modified by end-users if desired.

Type:

string

Required:

false

Default:

{{ mc_name }}/file

checksum_algorithm

Algorithm to determine checksum of file. Must be supported by ansible.builtin.stat (e.g, "sha1", "sha256", etc.).

Type:

string

Required:

false

checksum

The hash of the file.

Warning

While having a reproducible build process (and a stable checksum/hash) is ideal for cyber experimentation, there are many challenges to achieving this reality. Notably, many archive tools include metadata, such as timestamps, that make it difficult to create an identical checksum each time. Unless these issues have been addressed within tasks.yml, this field should be avoided. For more information, see: https://reproducible-builds.org/.

Type:

string

Required:

false

Setting up an Offline Cache

Collecting and retrieving files from a cache is automatically supported in Ansible playbooks without MC designer intervention. Currently, FIREWHEEL supports caching files in a file server, git repository, or in an Amazon S3 data store. If the user sets the necessary settings in the FIREWHEEL Configuration for the described types below, then FIREWHEEL will automatically check those locations for any model component required_files. Users are able to set multiple cache types as FIREWHEEL will check any caches for the required file.

Users setting up a cache should place cached files using the path: {{ mc_name }}/{{ item.destination | basename }}. From the example above, the default source path would be linux.ubuntu/htop-1_0_2_debs.tgz, where linux.ubuntu is the name of the associated model component. Users can optionally modify this path by setting the source within the model component variables file.

Git Cache

Users can use git repositories for caching model component binaries. To use this, users will need to install git and git-lfs on their Control Node to appropriately clone their repositories. This caching mechanism is set up so that repositories are initially cloned without downloading any large file storage (LFS) for performance reasons. Then if a required_file is identified without that repository, the file is subsequently downloaded and moved into place.

Note

Users may also need to execute git lfs install to set up Git LFS for their user account.

If users plan to use git repositories for the Model Component cache, they should specify the following options in the FIREWHEEL Configuration under the ansible key.

An example of this configuration is shown below:

Listing 4 An example of an Ansible git server portion of the FIREWHEEL Configuration.
ansible:
  git_servers:
    - server_url: "https://github.com"
      repositories:
        - path: "firewheel/mc_repo1"
        - path: "firewheel/mc_repo2"
          branch: "develop"
    - server_url: "ssh://git@gitlab.com"
      repositories:
        - path: "emulytics/firewheel/mc_repo3"
          branch: "feature-branch"
    - server_url: "https://user:ACCESS-TOKEN@github.com/"
      repositories:
        - path: "firewheel/mc_repo4"
git_servers

A list of dictionaries containing configuration options for multiple Git servers.

Type:

list

Required:

true

Each dictionary should contain the following keys:

server_url

The full URL of the git server (e.g., "https://github.com").

Type:

string

Required:

true

Note

If an access token is being used, the user can specify it as part of the URL. For example: https://user:ACCESS-TOKEN@github.com/user/repo.git

repositories
Type:

list

Required:

true

A list of repositories associated with the Git server. Each repository is represented as a dictionary containing the following keys:

path

The path to the git repository containing the cached files. SCP-style URLs are not supported. When using the ssh:// protocol, please use the following format: ssh://username@example.com.

Type:

string

Required:

true

branch

The version of the repository to check out. This can be the literal string HEAD, a branch name, or a tag name. This is passed to ansible.builtin.git.

Type:

string

Required:

false

Default:

"HEAD"

S3 Cache

Users can use Amazon Simple Storage Service (S3) buckets for caching model component binaries. To use this, users will need to install boto3, the official Amazon Web Services (AWS) Software Development Kit (SDK) for Python into their FIREWHEEL virtual environment. Additionally, if users plan to use an AWS S3 instance for the Model Component cache, they should specify the following options in the FIREWHEEL Configuration under the ansible key.

An example of this configuration is shown below:

Listing 5 An example of an Ansible S3 portion of the FIREWHEEL Configuration.
ansible:
  s3_endpoints:
    - s3_endpoint: "https://s3.us-east-1.amazonaws.com"
      aws_access_key_id: "AKIAIOSFODNN7EXAMPLE"
      aws_secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      buckets:
        - "firewheel_bucket1"
        - "firewheel_bucket2"
    - s3_endpoint: "https://custom-s3-endpoint:8000"
      aws_access_key_id: "AJIAIOSFODNN7EXAMPLE"
      aws_secret_access_key: "wKalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      buckets:
        - "firewheel_bucket3"
s3_endpoints

A list of dictionaries containing configuration options for multiple S3 endpoints.

Type:

list

Required:

true

Each dictionary should contain the following keys:

s3_endpoint

The full URL of the S3 instance (e.g., "s3.amazonaws.com").

Type:

string

Required:

true

aws_access_key_id

The AWS access key (e.g., "AKIAIOSFODNN7EXAMPLE").

Type:

string

Required:

true

aws_secret_access_key

The AWS secret key (e.g., "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY").

Type:

string

Required:

true

s3_buckets

A list of buckets associated with the S3 server where each bucket is represented as a string.

Type:

list

Required:

true

File Server Cache

If users plan to use a file server (HTTP/HTTPS/FTP) for the Model Component cache, they can specify the following options in the FIREWHEEL Configuration under the ansible key.

An example of this configuration is shown below:

Listing 6 An example of an Ansible file server portion of the FIREWHEEL Configuration.
ansible:
  file_servers:
    - url: "http://example.com"
      cache_paths:
        - "path/to/location"
        - "path/to/other/location"
    - url: "http://secondexample.com"
      use_proxy: True
      validate_certs: False
      cache_paths:
        - "secondpath/to/file"
file_servers

A list of dictionaries containing configuration options for multiple file servers.

Type:

list

Required:

true

Each dictionary should contain the following keys:

url

The URL of the server hosting the cached files.

Type:

string

Required:

true

Note

If you are using an username or password token, you can specify it in the URL. For example: https://user:password@server.com

cache_paths

A list of intermediate paths to the FIREWHEEL cache. For example in the URL http://example.com/files/firewheel/firewheel_repo_linux/linux.ubuntu/htop-1_0_2_debs.tgz then url="http://example.com", url_cache_path="files/firewheel/firewheel_repo_linux", and the source=linux.ubuntu/htop-1_0_2_debs.tgz. If no cache path is required, please use a list with empty string entry as the value.

file_servers:
  - url: "http://example.com"
    cache_paths:
      - ""
Type:

list

Required:

true

use_proxy

If false, it will not use a proxy, even if one is defined in an environment variable on the target hosts.

Type:

boolean

Required:

false

Default:

true

validate_certs

If false, SSL certificates will not be validated.

Type:

boolean

Required:

false

Default:

true

Script INSTALL File Requirements

Warning

This method is NOT recommended and will be eliminated in future releases of FIREWHEEL.

If the model component needs to use a single executable to install additional Model Component, users must create a single file called: INSTALL that should not have an extension and contains a shebang line (e.g., #!/bin/bash). Additionally, users must ensure that, upon successful installation, a new file is created in the model component directory with the following format: .<MC Name>.installed. For example, if the model component name is dns.dns_objects than the new file would be .dns.dns_objects.installed.

A Bash-based INSTALL template
Listing 7 This is an example INSTALL file using bash scripting. By replacing {{mc_name}} with the model component name, users can modify this example.
#!/bin/bash

#######################################################
# This is a sample install file for {{mc_name}}.
# This file can be used to perform one-time actions
# which help prepare the model component for use.
#
# Common uses of INSTALL files include downloading
# VM Resources from the Internet and installing new
# Python packages into FIREWHEEL's virtual environment.
#
# NOTE: When you are creating these files, it is
# imperative that specific versions of software are
# used. Without being as specific as possible,
# experimental results will **NOT** be repeatable.
# We strongly recommend that any changes to software
# versions are accompanied by a warning and new model
# component version.
#######################################################

# Create a flag for verifying installation
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
INSTALL_FLAG=$SCRIPT_DIR/.{{mc_name}}.installed

#######################################################
# Checking if there this script has already been complete.
#######################################################
function check_flag() {
    if [[ -f "$INSTALL_FLAG" ]]; then
        echo >&2 "{{mc_name}} is already installed!"
        exit 117;  # Structure needs cleaning
    fi
}


#######################################################
# Install python packages into the virtual environment
# used by FIREWHEEL. This takes in an array of packages.
#######################################################
function install_python_package() {
    pkgs=("$@")
    for i in "${pkgs[@]}";
    do
        python -m pip install "$i"
    done
}


#######################################################
# Download using wget and then checksum the downloaded files.
#
# It is important to verify that the downloaded files
# are the files are the same ones as expected.
# This function provides an outline of how to checksum files,
# but will need to be updated with the specific hashes/file names
# that have been downloaded.
#
# This function assumes that the passed in hashes are SHA-256
#######################################################
function wget_and_checksum() {
    downloads=("$@")
    # Uses 2D arrays in bash: https://stackoverflow.com/a/44831174
    declare -n d
    for d in "${downloads[@]}";
    do
        wget "${d[0]}"
        echo "${d[1]}  ${d[2]}" | shasum -a 256 --check || return 1
    done
}


#######################################################
# A function to help users clean up a partial installation
# in the event of an error.
#######################################################
function cleanup() {
    echo "Cleaning up {{mc_name}} install"
    # TODO: Cleanup any downloaded files
    # rm -rf file.tar
    rm -rf $INSTALL_FLAG
    exit 1
}
trap cleanup ERR

# Start to run the script

# Ensure we only complete the script once
check_flag

#######################################################
# Uncomment if there are Pip packages to install
# `pip_packages` should be space separated strings of
# the packages to install
#######################################################
# pip_packages=("requests" "pandas")
# install_python_package "${pip_packages[@]}"


#######################################################
# Uncomment if there is data/VM resources/images to download.
# `file1`, `file2`, etc. should be space separated strings of
# (URL SHASUM-256 FILENAME).
#
# We recommend that explicit versions are used for all Images/VMRs to prevent
# possible differences between instances of a given Model Component.
# Please be mindful of the software versions as it can have unintended
# consequences on your Emulytics experiment.
#
# We require checksums of the files to assist users in verifying
# that they have downloaded the same version.
#######################################################
# Be sure to use SHA-256 hashes for the checksums (e.g. shasum -a 256 <file>)
# file1=("url1" "e0287e6339a4e77232a32725bacc7846216a1638faba62618a524a6613823df5" "file1")
# file2=("url2" "53669e1ee7d8666f24f82cb4eb561352a228b1136a956386cd315c9291e59d59" "file2")
# files=(file1 file2)
# wget_and_checksum "${files[@]}"
# echo "Downloaded and checksummed all files!"


#######################################################
# Add any other desired configuration/packaging here
#######################################################
echo "The {{mc_name}} INSTALL file currently doesn't do anything!"

# Set the flag to notify of successful completion
touch $INSTALL_FLAG