Customizing the event prior#
This code uses importance sampling to sample the event space: it draws samples from a distribution other than the prior, called the importance distribution, and then weights each sample by the ratio of the prior density to the importance density so that the weighted samples approximate the prior distribution.
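The weighting step can be illustrated with a minimal one-dimensional sketch. It is not taken from eig_calc.py. The helper name importance_weights and the toy densities are assumptions chosen only to show how samples drawn from an importance distribution are reweighted to estimate a quantity under the prior.

```python
import numpy as np

def importance_weights(prior_pdf, importance_pdf):
    """Self-normalized weights proportional to prior(x_i) / importance(x_i).

    Hypothetical helper for illustration; not part of the package.
    """
    w = np.asarray(prior_pdf, dtype=float) / np.asarray(importance_pdf, dtype=float)
    return w / w.sum()

# Toy setup (assumed): the prior is Uniform(0, 1) and the importance
# density is q(x) = (1 + x) / 1.5 on (0, 1), which favors larger x.
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
x = -1.0 + np.sqrt(1.0 + 3.0 * u)   # inverse-CDF draw from q
q = (1.0 + x) / 1.5                 # importance density at each sample
p = np.ones_like(x)                 # prior density at each sample

w = importance_weights(p, q)

# The unweighted mean reflects q; the weighted mean estimates the prior mean.
unweighted_mean = x.mean()
prior_mean_estimate = np.sum(w * x)  # should be close to 0.5
```

The importance density here is bounded away from zero on the whole prior support, which keeps the weights bounded. An importance distribution that assigns almost no mass to a region the prior covers would produce very large weights there.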
Required functions and definitions#
The sample generation file that is given in line 5 of the input file for
eig_calc.py or in line 11 of the input file for network_opt.py should
contain four functions with specific names and signatures. The functions
must be defined exactly as below, with matching inputs and return types,
in order for the eig_calc.py script to work with them properly:
def generate_theta_data(location_bounds, depth_range, mag_range, nsamp, skip):
    """
    Generates synthetic events by sampling from the importance
    distribution.

    Parameters
    ----------
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated
    nsamp : int
        Number of events to generate
    skip : int
        Seed variable that indicates how to start the quasi-random number
        generator so that events aren't generated more than once

    Returns
    -------
    theta : (nsamp, 4) ndarray
        Events sampled according to the importance distribution
    """
    pass

def sample_theta_space(location_bounds, depth_range, mag_range, nsamp, skip):
    """
    Discretizes the sample domain using samples generated according to
    the importance distribution.

    Parameters
    ----------
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated
    nsamp : int
        Number of events to generate
    skip : int
        Seed variable that indicates how to start the quasi-random number
        generator so that events aren't generated more than once

    Returns
    -------
    theta : (nsamp, 4) ndarray
        Events sampled according to the importance distribution
    """
    pass

def eval_theta_prior(thetas, location_bounds, depth_range, mag_range):
    """
    Evaluates the probability density function of the prior distribution
    on a set of samples.

    Parameters
    ----------
    thetas : (nsamp, 4) ndarray
        Events at which the prior density is evaluated
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated

    Returns
    -------
    prior : ndarray
        Prior probability density evaluated at each event in thetas
    """
    pass

def eval_importance(theta, location_bounds, depth_range, mag_range):
    """
    Evaluates the probability density function of the importance
    distribution on a set of samples.

    Parameters
    ----------
    theta : (nsamp, 4) ndarray
        Events at which the importance density is evaluated
    location_bounds : ndarray
        List of coordinates that define the latitude/longitude boundary
        from which events may be sampled
    depth_range : (1, 2) ndarray
        Depth range in which events will be generated
    mag_range : (1, 2) ndarray
        Magnitude range in which events will be generated

    Returns
    -------
    importance : ndarray
        Importance probability density evaluated at each event in theta
    """
    pass

For convenience, a uniform prior file is already defined and is
available as the uniform_prior.py file in the GitHub repository.
Explanation of required functions#
The two functions generate_theta_data and sample_theta_space serve
very similar purposes, i.e. returning a set of events, so for many
applications they can be the same. The distributions according to which
these events are returned can be modified to serve a variety of
purposes. An event corresponds to the theta vector, which contains the
full description we are considering of an event such as an earthquake
or explosion. This vector is 4D, corresponding to latitude, longitude,
depth, and event magnitude.
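As an illustration of how such a set of 4D events might be produced, the following sketch draws events from a Halton quasi-random sequence and uses skip as the starting index, so that consecutive calls do not repeat events. It is not the implementation in uniform_prior.py. The function names are hypothetical, and the sketch assumes that location_bounds holds (latitude, longitude) rows and that sampling uniformly over their bounding box is acceptable.

```python
import numpy as np

def van_der_corput(indices, base):
    """Radical inverse of each integer index in the given base, in [0, 1)."""
    idx = np.array(indices, dtype=np.int64)  # copy, so the caller's array is untouched
    result = np.zeros(idx.shape, dtype=float)
    denom = 1.0
    while np.any(idx > 0):
        denom *= base
        result += (idx % base) / denom
        idx //= base
    return result

def halton_theta_sketch(location_bounds, depth_range, mag_range, nsamp, skip):
    """Hypothetical sampler with the same argument list as generate_theta_data.

    Assumption: events are uniform over the lat/lon bounding box of
    location_bounds, the depth range, and the magnitude range.
    """
    bounds = np.asarray(location_bounds, dtype=float)
    depth = np.asarray(depth_range, dtype=float).ravel()
    mag = np.asarray(mag_range, dtype=float).ravel()
    lows = np.array([bounds[:, 0].min(), bounds[:, 1].min(), depth[0], mag[0]])
    highs = np.array([bounds[:, 0].max(), bounds[:, 1].max(), depth[1], mag[1]])
    # Start after the first `skip` points; index 0 is dropped because it
    # maps to the corner of the domain in every dimension.
    indices = np.arange(skip + 1, skip + nsamp + 1)
    unit = np.column_stack([van_der_corput(indices, b) for b in (2, 3, 5, 7)])
    return lows + unit * (highs - lows)

# Example region (made-up coordinates, for illustration only)
example_bounds = np.array([[40.0, -112.0], [40.0, -110.0],
                           [42.0, -110.0], [42.0, -112.0]])
theta = halton_theta_sketch(example_bounds, [[0.0, 40.0]], [[0.5, 6.0]], nsamp=8, skip=0)
# Each row of theta is (latitude, longitude, depth, magnitude).
```

Calling the function again with skip=8 continues the sequence where the first call stopped, which is one way to avoid generating the same event twice.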
The generate_theta_data function should return a set of events
generated from the importance distribution over data-generating events.
These events are used to generate the synthetic data and are called
theta_data in the code. For computing EIG, the prior distribution over
data-generating events should be the prior distribution over event
hypotheses. However, for some applications it may make sense to bias
this distribution, meaning that you care more about the EIG for certain
types of events. For example, you may only care about EIG for events
below magnitude 2 or events within 1 km of the surface. This information
could be used to bias the distribution.
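One possible way to bias the magnitude component toward small events is sketched below. The piecewise-uniform density, the function names, and the cutoff and p_low parameters are all assumptions made for illustration; the package does not prescribe this form. The sampler maps uniform draws to magnitudes so that a fraction p_low of them fall below the cutoff, and the companion function evaluates the matching density.

```python
import numpy as np

def biased_mag_sample(u, mag_min, mag_max, cutoff=2.0, p_low=0.8):
    """Map uniform draws u in [0, 1) to magnitudes (hypothetical helper).

    A fraction p_low of the probability mass is placed uniformly on
    [mag_min, cutoff) and the rest uniformly on [cutoff, mag_max].
    """
    u = np.asarray(u, dtype=float)
    low = mag_min + (u / p_low) * (cutoff - mag_min)
    high = cutoff + ((u - p_low) / (1.0 - p_low)) * (mag_max - cutoff)
    return np.where(u < p_low, low, high)

def biased_mag_pdf(mag, mag_min, mag_max, cutoff=2.0, p_low=0.8):
    """Density of the piecewise-uniform magnitude distribution above."""
    mag = np.asarray(mag, dtype=float)
    dens_low = p_low / (cutoff - mag_min)
    dens_high = (1.0 - p_low) / (mag_max - cutoff)
    inside = (mag >= mag_min) & (mag <= mag_max)
    return np.where(inside, np.where(mag < cutoff, dens_low, dens_high), 0.0)

# Evenly spaced draws stand in for a quasi-random sequence here.
u = (np.arange(1000) + 0.5) / 1000
mags = biased_mag_sample(u, mag_min=0.5, mag_max=6.0)
density = biased_mag_pdf(mags, mag_min=0.5, mag_max=6.0)
fraction_below_cutoff = np.mean(mags < 2.0)
```

Whatever form the bias takes, the density must be evaluated with the same distribution that generated the samples, and it should stay positive wherever the unbiased distribution is positive.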
The sample_theta_space function returns a set of events distributed
according to the importance distribution, which are then used to
approximate the prior over event hypotheses, i.e. our prior knowledge in
Bayesian inference. These events define the space of candidate events
whose likelihood we will infer from the synthetic data. In the code this
is the variable theta_space. This finite set of events will in effect be
used to discretize the posterior distribution so that solving the
Bayesian inference problem is easier. This function is typically very
similar to the generate_theta_data function.
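The discretization described above can be sketched as follows. This is a generic illustration of a posterior evaluated on a finite set of importance-sampled candidates, not a description of what eig_calc.py computes internally. The function name discrete_posterior and its inputs are assumptions.

```python
import numpy as np

def discrete_posterior(log_like, prior_pdf, importance_pdf):
    """Posterior probabilities over a finite set of candidate events.

    Hypothetical helper: each candidate i gets a weight proportional to
    likelihood_i * prior_i / importance_i, computed in log space.
    """
    log_w = (np.asarray(log_like, dtype=float)
             + np.log(np.asarray(prior_pdf, dtype=float))
             - np.log(np.asarray(importance_pdf, dtype=float)))
    log_w -= log_w.max()   # subtract the maximum to avoid overflow in exp
    w = np.exp(log_w)
    return w / w.sum()

# Three candidate events with made-up log-likelihoods.
log_like = [-1.0, -2.0, -3.0]

# Candidates drawn from the prior itself: the likelihood alone decides.
post_same = discrete_posterior(log_like, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])

# First candidate oversampled fourfold by the importance distribution:
# its weight is reduced to compensate.
post_biased = discrete_posterior(log_like, [1.0, 1.0, 1.0], [4.0, 1.0, 1.0])
```

In terms of the four required functions, prior_pdf and importance_pdf would play the roles of the outputs of eval_theta_prior and eval_importance evaluated on theta_space.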