Customizing the event prior

This code uses importance sampling to sample the event space. It draws samples from a non-prior distribution, called the importance distribution, and then weights each sample by the ratio of the prior density to the importance density, so that the weighted samples approximate the prior distribution.
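As a rough illustration of the weighting step, the sketch below works in one dimension only. The depth range, the decay rate, and all variable names are assumptions made for this example and are not taken from the package: the prior is uniform over depth, while the importance distribution is a truncated exponential that favors shallow depths.

```python
import numpy as np

rng = np.random.default_rng(0)
depth_max = 10.0   # hypothetical depth range [0, 10] km
rate = 0.3         # hypothetical decay rate of the importance density
n = 100_000

# Draw depths from the importance distribution by inverting its CDF.
u = rng.random(n)
trunc = 1.0 - np.exp(-rate * depth_max)
depth = -np.log(1.0 - u * trunc) / rate

# Densities of the prior (uniform) and of the importance distribution.
prior_pdf = np.full(n, 1.0 / depth_max)
importance_pdf = rate * np.exp(-rate * depth) / trunc

# Importance weights: prior density over importance density, normalized.
weights = prior_pdf / importance_pdf
weights /= weights.sum()

raw_mean = depth.mean()                  # biased toward shallow depths
weighted_mean = np.sum(weights * depth)  # approximates the prior mean
```

The unweighted mean reflects the shallow-biased importance distribution, whereas the weighted mean should land near the midpoint of the depth range, which is the mean under the uniform prior.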

Required functions and definitions

The sample generation file given in line 5 of the input file for eig_calc.py, or in line 11 of the input file for network_opt.py, must contain four functions with specific names and signatures. The functions must be defined exactly as below, with matching inputs and return types, for the eig_calc.py script to work with them properly:

def generate_theta_data(location_bounds, depth_range, mag_range, nsamp, skip):
    """
    Generates synthetic events by sampling from the importance
    distribution.
    
    Parameters
    ----------
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary 
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated
    nsamp : int
        Number of events to generate
    skip : int
        Seed variable that indicates how to start the quasi-random number
        generator so that events aren't generated more than once
        
    Returns
    -------
    theta : (nsamp, 4) ndarray
        Events sampled according to the importance distribution
    """
    pass


def sample_theta_space(location_bounds, depth_range, mag_range, nsamp, skip):
    """
    Discretizes the sample domain using samples generated according to
    the importance distribution.
    
    Parameters
    ----------
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary 
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated
    nsamp : int
        Number of events to generate
    skip : int
        Seed variable that indicates how to start the quasi-random number
        generator so that events aren't generated more than once
        
    Returns
    -------
    theta : (nsamp, 4) ndarray
        Events sampled according to the importance distribution
    """
    pass


def eval_theta_prior(thetas, location_bounds, depth_range, mag_range):
    """
    Evaluates the probability density function of the prior distribution
    on a sample.
    
    Parameters
    ----------
    thetas : (nsamp, 4) ndarray
        Events at which the prior density is evaluated
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated

    Returns
    -------
    ndarray
        Prior probability density evaluated at each event in thetas
    """
    pass


def eval_importance(theta, location_bounds, depth_range, mag_range):
    """
    Evaluates the probability density function of the importance
    distribution on a set of samples.
    
    Parameters
    ----------
    theta : (nsamp, 4) ndarray
        Events at which the importance density is evaluated
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated

    Returns
    -------
    ndarray
        Importance probability density evaluated at each event in theta
    """
    pass

For convenience, a uniform prior is already implemented and is available as the uniform_prior.py file in the GitHub repository.

Explanation of required functions

The two functions generate_theta_data and sample_theta_space serve very similar purposes, i.e. returning a set of events, so for many applications they can be the same. The distributions from which these events are drawn can be modified to serve a variety of purposes. An event corresponds to the theta vector, which contains the full set of parameters we consider for an event such as an earthquake or explosion. This vector is four-dimensional, holding latitude, longitude, depth, and event magnitude.

The generate_theta_data function should return a set of events drawn from the importance distribution over data-generating events. These events are used to generate the synthetic data and are called theta_data in the code. For computing EIG, the prior distribution over data-generating events should be the prior distribution over event hypotheses. However, for some applications it may make sense to bias this distribution, meaning that you care more about the EIG for a certain type of event. For example, you may only care about EIG for events below magnitude 2 or events within 1 km of the surface. This information could be used to bias the distribution.
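To make the idea of a biased distribution concrete, here is a hedged sketch of a matching pair of density functions. It assumes a uniform prior over a rectangular latitude/longitude box, an importance distribution that is uniform in latitude, longitude, and magnitude but follows a truncated exponential in depth, and that both functions return plain (not log) densities, one value per event. DEPTH_SCALE_KM and the helper _box are hypothetical names introduced for this example. A corresponding generate_theta_data would need to draw depths from the same truncated exponential.

```python
import numpy as np

DEPTH_SCALE_KM = 5.0  # hypothetical: smaller values favor shallower events

def _box(location_bounds, depth_range, mag_range):
    # ASSUMPTION: rows of location_bounds are (lat, lon); use their bounding box.
    b = np.asarray(location_bounds, dtype=float)
    lo = np.array([b[:, 0].min(), b[:, 1].min(),
                   np.min(depth_range), np.min(mag_range)])
    hi = np.array([b[:, 0].max(), b[:, 1].max(),
                   np.max(depth_range), np.max(mag_range)])
    return lo, hi

def eval_theta_prior(thetas, location_bounds, depth_range, mag_range):
    lo, hi = _box(location_bounds, depth_range, mag_range)
    thetas = np.atleast_2d(thetas)
    inside = np.all((thetas >= lo) & (thetas <= hi), axis=1)
    # Uniform density: 1 / volume inside the box, 0 outside.
    return inside / np.prod(hi - lo)

def eval_importance(theta, location_bounds, depth_range, mag_range):
    lo, hi = _box(location_bounds, depth_range, mag_range)
    theta = np.atleast_2d(theta)
    inside = np.all((theta >= lo) & (theta <= hi), axis=1)
    # Truncated-exponential density in depth, normalized over the depth range.
    span = hi[2] - lo[2]
    norm = DEPTH_SCALE_KM * (1.0 - np.exp(-span / DEPTH_SCALE_KM))
    depth_pdf = np.exp(-(theta[:, 2] - lo[2]) / DEPTH_SCALE_KM) / norm
    # Uniform in latitude, longitude, and magnitude.
    other = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[3] - lo[3])
    return inside * depth_pdf / other
```

Dividing the output of eval_theta_prior by that of eval_importance gives the importance weight of each event: shallow events, which are oversampled, receive weights below one, and deep events receive weights above one.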

The sample_theta_space function returns a set of events distributed according to the importance distribution, which are then used to approximate the prior over event hypotheses, i.e. our prior knowledge in Bayesian inference. These events define the space of candidate events whose likelihood we will infer from the synthetic data. In the code this is the variable theta_space. This finite set of events is in effect used to discretize the posterior distribution so that the Bayesian inference problem is easier to solve. This function is typically very similar to the generate_theta_data function.