Sampling synthetic data
The module data_gen.py contains the functions needed to generate
synthetic data. The core function is generate_data. This function
takes three inputs:
- theta: an event description (e.g. lat, long, depth, mag).
- sensors: the network configuration (e.g. lat, long, noise std, num variables, and sensor type for each sensor).
- ndata: the number of synthetic data realizations to generate for each data generating event.
The function returns the synthetic data for each sensor, for each of the data realizations generated with this set of event characteristics.
Inside the generate_data function, the data generating functions are
very flexible and can be modified freely. However, it is
important that these data generating functions correspond to the
likelihood functions, i.e. that the data \(\mathcal{D}\) is in fact
distributed according to the likelihood,
\(\mathcal{D} \sim p(\mathcal{D}|\theta)\). Therefore, when constructing
these functions it is often helpful to call functions from the
like_models.py module, imported as lm.
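To illustrate the consistency requirement, here is a minimal sketch that is not taken from data_gen.py or like_models.py. It assumes a hypothetical Gaussian arrival-time model in which one helper, predicted_arrival, supplies the mean used by both the sampler and the log-likelihood. All function names, the column layout of theta and sensors, and the toy travel-time formula are assumptions made for illustration.

```python
import numpy as np

def predicted_arrival(theta, sensors):
    """Hypothetical mean arrival time: distance / constant speed (toy model)."""
    # Assumed layout: theta[:2] = event (lat, long); sensors[:, :2] = sensor (lat, long)
    dist = np.linalg.norm(sensors[:, :2] - theta[:2], axis=1)
    return dist / 5.0  # assumed wave speed, arbitrary units

def sample_arrivals(theta, sensors, ndata, rng):
    """Draw arrival times D ~ N(mean, std^2), one row per realization."""
    mean = predicted_arrival(theta, sensors)
    std = sensors[:, 2]  # assumed: column 2 holds the noise std
    return rng.normal(mean, std, size=(ndata, len(sensors)))

def arrival_loglike(data, theta, sensors):
    """Gaussian log p(D | theta), built from the same mean and std as the sampler."""
    mean = predicted_arrival(theta, sensors)
    std = sensors[:, 2]
    z = (data - mean) / std
    return np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi), axis=1)

rng = np.random.default_rng(0)
theta = np.array([0.0, 0.0, 10.0, 3.0])                  # lat, long, depth, mag
sensors = np.array([[3.0, 4.0, 0.1], [6.0, 8.0, 0.2]])   # lat, long, noise std
data = sample_arrivals(theta, sensors, ndata=1000, rng=rng)
loglike = arrival_loglike(data, theta, sensors)          # one value per realization
```

Because the sampler and the log-likelihood share predicted_arrival and the same noise std, any change to the physical model automatically propagates to both, which is the reason for calling into a shared likelihood module.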
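As currently written, the generate_data function looks like

    def generate_data(theta, sensors, ndata):
        # Detection probabilities for this event
        probs = lm.detection_probability(theta, sensors)
        # Repeat the probabilities for each of the ndata realizations
        fullprobs = np.outer(np.ones(ndata), probs)
        # Uniform draws; u < p occurs with probability p
        u_mat = np.random.uniform(size=fullprobs.shape)
        # Arrival times for each realization
        atimes = gen_arrival_normal(theta, sensors, ndata)
        # Arrival times first, then detection indicators
        data = np.concatenate((atimes, u_mat < fullprobs), axis=1)
        return data
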
but it can be modified to fit the models being used in the scenario.
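In this function, the data generated for the sensors has two parts.
The first is an indicator that registers 1 if the sensor detects the
event and 0 otherwise. The second is the time at which the sensor
registers the event. The functions lm.detection_probability and
gen_arrival_normal are used to generate this data:
lm.detection_probability gives the detection probabilities, which are
compared against uniform random draws to produce the indicators, and
gen_arrival_normal generates the arrival times. In the returned array,
the arrival-time columns come first, followed by the
detection-indicator columns.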
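The detection step can be tried in isolation with the sketch below. It reuses the uniform-comparison pattern from generate_data, but the probabilities and arrival times are made-up stand-ins for the outputs of lm.detection_probability and gen_arrival_normal, and the (ndata, nsensors) shape of the arrival times is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
ndata = 20000
probs = np.array([0.9, 0.5, 0.1])  # assumed detection probabilities, one per sensor

# Repeat the probabilities for every realization: shape (ndata, nsensors)
fullprobs = np.outer(np.ones(ndata), probs)

# A uniform draw falls below p with probability p, so u < p is a Bernoulli(p) sample
u_mat = rng.uniform(size=fullprobs.shape)
detected = u_mat < fullprobs

# Stand-in for the gen_arrival_normal output, assumed shape (ndata, nsensors)
atimes = rng.normal(loc=10.0, scale=0.5, size=fullprobs.shape)

# Same layout as generate_data: arrival times first, then indicators.
# Concatenating float and boolean arrays yields floats, so indicators become 0.0 / 1.0.
data = np.concatenate((atimes, detected), axis=1)

# Empirical detection rate per sensor, read from the indicator columns
empirical = data[:, len(probs):].mean(axis=0)
```

With enough realizations, empirical should be close to probs, which is a quick check that the indicators are distributed as intended.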