Sampling synthetic data

The module data_gen.py contains the functions needed to generate synthetic data. The core function is generate_data, which takes three inputs:

theta: an event description (e.g. lat, long, depth, mag).
sensors: the network configuration (e.g. lat, long, noise std, num variables, and sensor type for each sensor).
ndata: the number of synthetic data realizations to generate for each data generating event.

The function returns the synthetic data for every sensor, with one row per data realization, all generated from the same set of event characteristics theta.

The data generating functions called inside generate_data are flexible and can be replaced to suit the scenario. However, they must correspond to the likelihood functions, i.e. the data \(\mathcal{D}\) must in fact be distributed according to the likelihood, \(\mathcal{D} \sim p(\mathcal{D}|\theta)\). For this reason, it is often helpful to build them from functions in the like_models.py module, imported as lm.
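The correspondence between generator and likelihood can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from data_gen.py or like_models.py: the names predicted_arrival, sample_arrivals and arrival_loglike, the flat (x, y) geometry, the constant wave speed, and the sensor layout with the noise std in the third column. The point is only that the sampler and the log-likelihood share the same mean and standard deviation, so the sampled data really follow the density being evaluated.

```python
import numpy as np

def predicted_arrival(theta, sensors):
    """Hypothetical forward model: travel time = distance / constant speed."""
    # theta[:2] is the event (x, y); sensors[:, :2] are sensor (x, y) positions.
    dist = np.linalg.norm(sensors[:, :2] - theta[:2], axis=1)
    return dist / 5.0  # assumed constant speed of 5 distance units per second

def sample_arrivals(theta, sensors, ndata, rng):
    """Draw arrival times from Normal(predicted_arrival, sensor noise std)."""
    mean = predicted_arrival(theta, sensors)
    std = sensors[:, 2]  # assumed: column 2 holds each sensor's noise std
    return rng.normal(loc=mean, scale=std, size=(ndata, len(mean)))

def arrival_loglike(data, theta, sensors):
    """Log-density of the same Normal model, one value per realization."""
    mean = predicted_arrival(theta, sensors)
    std = sensors[:, 2]
    z = (data - mean) / std
    return np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi), axis=1)

rng = np.random.default_rng(0)
theta_true = np.array([1.0, 2.0])
sensors = np.array([[0.0, 0.0, 0.1], [10.0, 0.0, 0.1], [0.0, 10.0, 0.1]])
data = sample_arrivals(theta_true, sensors, ndata=1000, rng=rng)

# The generating event should score higher, on average, than a different event.
ll_true = arrival_loglike(data, theta_true, sensors).mean()
ll_other = arrival_loglike(data, np.array([5.0, 5.0]), sensors).mean()
```

Because both functions call predicted_arrival and read the same noise column, changing the forward model in one place keeps the sampler and the likelihood consistent.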

As currently written, the generate_data function looks like

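import numpy as np
import like_models as lm  # likelihood models, imported as lm

def generate_data(theta, sensors, ndata):
    # Detection probability of each sensor for this event
    probs = lm.detection_probability(theta, sensors)
    # Repeat the probabilities once per realization (one row per realization)
    fullprobs = np.outer(np.ones(ndata), probs)
    # A sensor detects the event when its uniform draw falls below its probability
    u_mat = np.random.uniform(size=fullprobs.shape)

    # Arrival times at each sensor for each realization
    atimes = gen_arrival_normal(theta, sensors, ndata)
    # Arrival-time columns first, then the detection indicators
    data = np.concatenate((atimes, u_mat < fullprobs), axis=1)

    return data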

but it can be modified to fit the models being used in the scenario. In this function, the data generated for each sensor has two parts. The first is an indicator that registers 1 if the sensor detects the event and 0 otherwise; it is drawn by comparing uniform random numbers against the probabilities from lm.detection_probability. The second is the time at which the station registers the event, generated by gen_arrival_normal. In the returned array, the arrival-time columns come first, followed by the detection indicators.
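To make the layout of the returned array concrete, here is a self-contained sketch under stated assumptions. The functions detection_probability and gen_arrival_normal below are hypothetical stand-ins, not the real implementations from like_models.py or data_gen.py: the logistic detection model, the constant wave speed, the theta layout (x, y, magnitude) and the sensor layout (x, y, noise std) are all invented for illustration. The sketch also passes a NumPy Generator explicitly instead of calling np.random.uniform, which is a substitution made here for reproducibility.

```python
import numpy as np

def detection_probability(theta, sensors):
    """Hypothetical stand-in for lm.detection_probability."""
    # Assumed logistic form: likelier with magnitude, less likely with distance.
    dist = np.linalg.norm(sensors[:, :2] - theta[:2], axis=1)
    return 1.0 / (1.0 + np.exp(-(theta[2] - 0.5 * dist)))

def gen_arrival_normal(theta, sensors, ndata, rng):
    """Hypothetical stand-in: Normal arrival times around distance / speed."""
    dist = np.linalg.norm(sensors[:, :2] - theta[:2], axis=1)
    return rng.normal(loc=dist / 5.0, scale=sensors[:, 2], size=(ndata, len(dist)))

def generate_data(theta, sensors, ndata, rng):
    probs = detection_probability(theta, sensors)
    # Broadcasting compares every row of uniform draws against the same probabilities.
    detected = rng.uniform(size=(ndata, len(probs))) < probs
    atimes = gen_arrival_normal(theta, sensors, ndata, rng)
    # Booleans are upcast to floats (1.0 / 0.0) when joined with the arrival times.
    return np.concatenate((atimes, detected), axis=1)

rng = np.random.default_rng(1)
theta = np.array([1.0, 2.0, 3.0])  # assumed layout: x, y, magnitude
sensors = np.array([[0.0, 0.0, 0.1], [10.0, 0.0, 0.1], [0.0, 10.0, 0.1]])
data = generate_data(theta, sensors, ndata=5000, rng=rng)

# Split the array back into its two parts: arrival times, then indicators.
nsens = len(sensors)
atimes, detected = data[:, :nsens], data[:, nsens:]
```

With these stand-ins, data has one row per realization and two columns per sensor, and the fraction of ones in each indicator column approaches that sensor's detection probability as ndata grows.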