Epic: Experiment Design Document
Context and scope
The Earthquake Forecast Experiment for Italy (hereafter Experiment) is designed to test the performance of state-of-the-art short-term forecasting models in a prospective fashion. The Experiment's general objectives are:
- Improve our understanding of the physics and statistics of earthquake occurrence.
- Validate the models used in Operational Earthquake Forecasting and their components.
To perform this Experiment, a new software system will be designed here, which will run models, evaluate forecasts and display the results of the Experiment time-dependently, but on demand. Its architecture will satisfy current open-source scientific standards and the CSEP philosophy. An Experiment can be hosted in an official repository (GitLab & Zenodo), cloned to a local machine, run, and the partial results committed/pushed back on demand.
However, the system on its own will not be sufficient to achieve the Experiment's objectives, although it must be closely related to them through the system requirements. It is worth mentioning that it should be designed in the stages specified below.
Goals and non-goals
Goals
- The System will be able to manifest the definitions from the Experiment's Rules (e.g. regions, authoritative datasets, testing time windows and intervals, forecast formats, etc.)
- Perform the fundamental tasks of a testing experiment, which could be executed on any Unix-based(?) machine, where the only limitations should be computational time and storage. These tasks are:
  - Access, retrieve, deploy, execute and store (on-demand) the competing forecast models and their results.
  - Access, retrieve and record the authoritative datasets (e.g. the catalog from the Bollettino Sismico Italiano).
  - Define a Test battery, implemented within pyCSEP, and record its specifications (e.g. seeds, iterations, confidence intervals, etc.).
  - Evaluate the models' forecasts, store and register the results, and create a set of summarizing figures.
- The System interface should allow running an Experiment from scratch or from a previously executed state on the machine (e.g. running every three months without needing to re-run the experiment from the beginning).
- A framework (e.g. a wrapper or decorators) will be created to handle and record the complex bookkeeping due to time-dependency. It will contain the information required to initialize an experiment from an intermediate step, record the information needed for reproducibility, and should be easily serialized/imported. It need not be particularly readable itself, but should at least be exportable to a human-readable format. It could be split into two classes: (i) File management and (ii) Reproducibility Registry.
- The Experiment results should, at a minimum, be displayed in a README.md file, which can be updated on every run and committed/pushed to an official git repository of the experiment (by an authorized user), along with exported human-readable tables for external users (see the sketch after this list). This could be extended to a static webserver/service (e.g. using Dash to visualize tests, forecasts, catalogs, etc.).
Non-Goals
- Design an API to access forecasts or results.
- Schedule runs or continuous run-time.
- Automatic update of datasets or models
- Implement test routines external to pycsep
- Intra-model parallelization
- Continuous Integration (yet)
Design
- An important aspect is clarity and reusability of the code; therefore, it should keep a modular structure. We aim to create experiments that can be deployed without requiring developer-level knowledge of Python. Complexity should be hidden in lower layers (e.g. registries, pyCSEP functions, exporters, readers, etc.).
- Readability and runnability are preferred over performance. Parallelization, compression, serialization, and the use of databases should only be implemented on top of the core system, and optionally, as an extension of the existing standards (e.g. serial runs of test methods should be the default option, as well as CSEP-format catalogs, etc.).
System diagram
Preliminary Diagram
Interfaces
- Registry: A class that will wrap all the elements of an Experiment. It should operate as a silent decorator or wrapper through which all the Experiment functions pass. It allows run-time storage of metadata, artifact tracking and identification of data lineage (e.g. to which catalog and last run date the data are related). It should be continually serialized to the local machine so that its state persists once run time is over. An Experiment can be initialized from an intermediate step by reading a dumped Registry, which will parse and checksum all the existing forecast, catalog and result files on the local machine. It will locate the files pertinent to the last runs and populate a dictionary that maps a forecast_date to a forecast and result file. (A sketch of this interface is given after this list.)
- Catalog Accessor: A function that will access the Authoritative Catalog Source's API, store the catalog and convert it to the pyCSEP catalog format, recording the access date and version.
- Model: A class that wraps a model: it can access the model's repository, download the source code, build its Docker environment and create the forecasts for the input dates. An Experiment will contain a list of Models. The Experiment will be able to (i) download and build all the Models' environments and perform the build tests, (ii) place the required files (e.g. catalogs) into the Model path, (iii) run all the models, which will dump the resulting forecasts into files on the local machine, and (iv) retrieve the forecasts by parsing the files into pycsep.Forecast objects.
- Experiment Post-process: Once the Experiment has been successfully run, post-processing functions will allow the creation of human-readable tables, figures and reports, placing them into a top-level folder of the repository. In this way, a simple commit-push would make it possible to visualize a report containing the updated results of the Experiment. From these functions, a static website could be implemented using Dash, from which results can be easily visualized.
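A minimal sketch of the Registry and Model interfaces described above. All class names, paths and the forecast-file naming convention are illustrative assumptions; the pyCSEP reader shown (csep.load_gridded_forecast) is just one possible loader, and the actual reader may differ per model format.

```python
import hashlib
import json
import pathlib
import subprocess

import csep  # pyCSEP, used here only to parse a gridded forecast file


class Registry:
    """Wraps the Experiment's file tracking, lineage and metadata bookkeeping."""

    def __init__(self, path="registry.json"):
        self.path = pathlib.Path(path)
        self.runs = {}  # forecast_date -> {"forecast": ..., "result": ...}

    @staticmethod
    def checksum(filename):
        """Hash a tracked artifact so a reloaded Registry can verify it."""
        return hashlib.sha256(pathlib.Path(filename).read_bytes()).hexdigest()

    def register(self, forecast_date, forecast_file, result_file=None):
        """Record the artifacts of one run and serialize immediately."""
        self.runs[forecast_date] = {
            "forecast": str(forecast_file),
            "forecast_sha256": self.checksum(forecast_file),
            "result": str(result_file) if result_file else None,
        }
        self.dump()

    def dump(self):
        self.path.write_text(json.dumps(self.runs, indent=2))

    @classmethod
    def load(cls, path="registry.json"):
        """Re-create the Registry to initialize an Experiment from a middle step."""
        registry = cls(path)
        registry.runs = json.loads(pathlib.Path(path).read_text())
        return registry


class Model:
    """Wraps a dockerized model: source retrieval, build, and forecast runs."""

    def __init__(self, name, git_url, workdir="models"):
        self.name = name
        self.git_url = git_url
        self.path = pathlib.Path(workdir) / name

    def get(self):
        subprocess.run(["git", "clone", self.git_url, str(self.path)], check=True)

    def build(self):
        subprocess.run(["docker", "build", "-t", self.name, str(self.path)],
                       check=True)

    def run(self, start, end):
        """Run the container; the model is expected to write its forecast file."""
        subprocess.run(["docker", "run", "--rm",
                        "-v", f"{self.path.resolve()}:/usr/src/model",
                        self.name, str(start), str(end)], check=True)
        return self.path / f"forecast_{start}.csv"  # assumed output convention

    def get_forecast(self, filename):
        """Parse a stored forecast file into a pyCSEP forecast object."""
        return csep.load_gridded_forecast(str(filename))
```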
Data storage
- A particular Experiment (i.e. Italy for this project) will be stored in the form of an online repository and cloned onto a local machine.
- In the local repository, all models' source code will be downloaded and stored in a Models root folder.
- Forecasts will be stored in the Model root folder once created, with a human-readable name. There is the possibility of allocating forecasts into a database after their execution (should we log each run?).
- Evaluation results should be stored in an individual local folder corresponding to the run date.
- A summary of the results (e.g. figures, tables, summary statistics) should be stored in each Evaluation-Run folder. It can then be accessed from a top-folder README.md or website (see the layout sketch after this list).
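A small sketch of the path tree implied above; folder and file names are illustrative, not final.

```python
import pathlib


def create_path_tree(root=".", run_date="run_2022-07-01",
                     models=("model_a", "model_b")):
    """Create the per-model and per-run folders used by the Experiment (illustrative)."""
    root = pathlib.Path(root)
    for model in models:
        # model source code and its forecast files live under a Models root folder
        (root / "models" / model / "forecasts").mkdir(parents=True, exist_ok=True)
    # each evaluation run gets its own folder, holding results and their summaries
    for sub in ("results", "figures", "tables"):
        (root / run_date / sub).mkdir(parents=True, exist_ok=True)
    return root
```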
Requirements
An upper-level definition of the requirements of the system. We define the system requirements based on current standards of scientific code development.
Reproducible
- Must allow any independent user to obtain the exact Experiment results
Re-runnable
- The system can execute the Experiment during the testing period and after its termination
- It can be run on-demand by any user during its duration
Accessible
- The system allows the Experiment results to be easily obtained and visualized
- The Experiment can be deployed by any user with minimum (tbd) programming knowledge
Reusable
- The software system can be applied to other case studies with minimal modifications
- New features (e.g. methods) can be easily added, or existing ones modified
Replicable
- The system must allow replicability of the Experiment, i.e. using different datasets
- Unambiguously defined, such that it could be replicated by a different developer
Specifications
How the requirements will be satisfied.
Reproducibility
Results of the experiment can be exactly obtained by any independent user
- Bookkeeping of authoritative data using a 'Registry' (see Interfaces)
- Versioning management > e.g. the DOI of the Bollettino Sismico Italiano (BSI) from each catalogue release
- Storage > every run of the experiment should store the authoritative data used
- Model source code and parameters using a 'Registry'
- Models should be standalone code containers
- Models' codes should be identifiable and citable
- Models' seeds must be stored
- The Experiment system must be decoupled from the Model code, but record/store the details of the Model's execution.
- Experiment Source code
- It should be obtained from an official source (e.g. Zenodo) as is, where versioning/labeling should correspond to each results publication (an example of such a provenance record is sketched below).
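The following is a hypothetical example of the provenance a Registry could store per run to satisfy the points above; all identifiers, DOIs and values are placeholders.

```python
# Placeholder values only: catalog release, model code versions/seeds and the
# experiment code version that would make a run reproducible.
run_provenance = {
    "run_date": "2022-07-01",
    "catalog": {
        "source": "Bollettino Sismico Italiano",
        "doi": "10.xxxx/bsi-release",   # placeholder DOI of the catalogue release
        "access_date": "2022-07-01",
    },
    "models": {
        "model_a": {"git_commit": "abc1234", "docker_tag": "model_a:0.1", "seed": 42},
    },
    "experiment_code": {"zenodo_doi": "10.xxxx/experiment", "version": "v0.1"},
}
```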
Re-runnable
Experiment can be run on-demand by any user during its duration
- Experiment can be deployed and run on any machine (Linux-based only?), as long as it satisfies the computational requirements. We will take advantage of the following:
- All models will be standalone dockerized code, determined from a Dockerfile (shared responsibility between modeler/tester)
- All Evaluation codes and formats will be handled by pyCSEP (see the example after this list)
- How much of the experiment design can be implemented in pycsep? THE MORE THE BETTER (for reusability)
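A hedged example of delegating an evaluation to pyCSEP: the file path, magnitude threshold and query arguments below are illustrative, and the appropriate forecast loader may differ per model format.

```python
from datetime import datetime

import csep
from csep.core import poisson_evaluations

# Load a gridded forecast file and evaluate it against the authoritative catalog;
# query_bsi fetches the Bollettino Sismico Italiano catalog through pyCSEP.
forecast = csep.load_gridded_forecast("models/model_a/forecasts/forecast_2022-07-01.dat")
catalog = csep.query_bsi(start_time=datetime(2022, 7, 1),
                         end_time=datetime(2022, 10, 1),
                         min_magnitude=3.5)
catalog = catalog.filter_spatial(forecast.region)
result = poisson_evaluations.number_test(forecast, catalog)
print(result.quantile)
```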
It will be possible to run the experiment in the future, even after its termination
- Don't step too deep into dependency hell: clear definition of system requirements for the models and the experiment:
- The dockerization of the models should aim to reduce the possibility of dependency conflicts in the future
- Use tag (+push_id) versioning to create the Experiment from supported base docker-images (e.g. FROM python:3.8, FROM r-base:4.1.3)
- Pin system library versions (e.g. apt install libgdal=1.20)
- Install Python/R packages with fully specified versions through pip or R's install.packages()
- git installations should be done by specifying commit/tags
Accessible
Experiment results are easy to obtain, execute or visualize
- Experiment results distribution should be decoupled from the Experiment code architecture (e.g. results could be generated anywhere, but officially published on demand by an authorized user)
- The experiment code should be able to generate a simple report of the results (or multiple features, with different levels of detail)
- As the model is run on demand, a branch/tag could be created in a git repository to display the results of each run (similar to GEFE), where results are shown in a README.md. Maybe also GitLab/GitHub Pages?
- Experiment should also provide results in human-readable format (e.g. csv sheet with evaluation values for interested users)
- MAYBE: The Experiment could create a static webpage using Dash, which could be hosted by any webserver (e.g. GFZ, etc.) in the future (or the cloud?); a minimal sketch follows this list.
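A minimal sketch of how such a results page could be served with Dash; this only illustrates the optional extension above, and the page contents and figure files are assumed to have been exported by the post-processing step.

```python
# Minimal Dash sketch for browsing results (optional extension, not core).
# Assumes evaluation figures were already exported as static images.
from dash import Dash, html

app = Dash(__name__)
app.layout = html.Div([
    html.H1("Earthquake Forecast Experiment for Italy - results"),
    html.Img(src="assets/consistency_tests.png"),  # figures from post-processing
    html.Img(src="assets/comparison_tests.png"),
])

if __name__ == "__main__":
    app.run_server(debug=True)  # newer Dash versions use app.run()
```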
The Experiment can be deployed by any user without a developer-level programming knowledge
- The Experiment architecture can be completely accessed and downloaded from an official source (e.g. Zenodo).
- The Experiment itself will be a docker-compose container, able to acquire authoritative data, run the models, create the forecasts and evaluate them with a couple of lines of code (see the sketch below).
- The code should be designed in favor of clarity and simplicity, rather than efficiency.
- Documentation is key
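A hypothetical sketch of what "a couple of lines of code" could look like for an end user; the module and method names are assumptions, not an existing API.

```python
# Hypothetical end-user entry point; 'experiment' is an assumed module name.
from experiment import Experiment

exp = Experiment.from_config("config.yml")  # region, catalog source, models, tests
exp.run("2022-07-01")                       # get catalog, run models, evaluate
exp.make_report()                           # figures, tables and README summary
```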
Reusable
Code can be applied to other case studies with minimal modifications
- Decoupling the Experiment System as much as possible from the Experiment itself. The Italy experiment should be implemented in the most generic way, with most of the code implemented in pyCSEP.
- For example, a 'Model' class that interfaces with a docker container (code, virtual environment, etc.) and generates forecast files, agnostic of the actual model within; an 'Experiment' class that handles the Models, authoritative data, Evaluations, etc.; a 'Registry' class that serves as a wrapper to handle all the file tracking, lineage and registry abstractions.
New methods can be easily added or modified
- This goes in line with pyCSEP, regarding new evaluation methods that can be implemented there.
Replicable
Experiment can be replicated with different datasets
- For instance, a different catalog source (e.g. new version of BSI, or from EMEC)
Unambiguously defined, such it could be replicated by a different developer
- Documenting both the code and clearly stating the steps in the manuscript.
- Perhaps much of the experiment creation can be part of the pyCSEP wiki/documentation.
Workflows
- Fresh run three months after the kick-start (a sketch of this workflow is given after this list)
- Initialize 'Experiment' object with its parameters: name, git_url, zenodo_url, author, model_nsim, time_interval, etc.
- At initialization, a 'Registry' associated with the 'Experiment' will be created and serialized at run time or after significant operations. Simultaneously, a path tree will be created.
- Set 'Region' to Experiment
- Set 'CatalogAccessor' to Experiment. Should define filtering args (mw, depth, region)
- Set 'Evaluations': Consists simply of a tuple with a csep.Evaluation, params (e.g. num_sims, seeds, alpha) and plot_params (e.g. plot_func)
- Set 'Models': Define the Author and accessing mode (e.g. git or Zenodo) from a URL. 'Model.get()' will retrieve the models from the source and place them in a Models folder. 'Model.build()' would then build the Docker images from the model's Dockerfile, install the model requirements, run the model tests and initialize parameters (e.g. from learning-catalog calibration).
- Create forecasts:
- A simple input of datetime intervals would be given. The 'Experiment' would then initialize each model's Docker container and execute 'Model.run()' through a 'Registry'. Forecast files would be placed inside the Model's local repository, whereas file tracking and lineage will be stored in the Registry object (and serialized after a successful run). Models could be run in parallel by simply assigning CPU usage to each Model container. Care must be taken with parallelization and seed management. If database usage is selected, forecasts could be placed into a database after a model run, followed by a cleanup.
- Run evaluations:
- Accessing forecasts through 'Model.get_forecasts(datetime(s))'. This would parse the stored forecasts through pyCSEP.
- Call CatalogAccessor and get a Catalog filtered by each datetime(s).
- Execute Tests and store results via serialization. They could also be stored in a database.
- Execute post-processing routines (report, figures, tables).
- Create a results branch for the run date (git branch results_<rundate>), git add the post-processed outputs, then git checkout, commit and push.
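A sketch of the fresh-run workflow above, using the hypothetical classes from the Interfaces section; every module, name, URL and argument here is illustrative only.

```python
from datetime import datetime

# Hypothetical imports: Experiment/Model as sketched under Interfaces.
from experiment import Experiment, Model

# 1. Initialize the Experiment; a Registry and the path tree are created here.
exp = Experiment(name="italy_experiment",
                 git_url="https://git.example/italy_experiment",  # placeholder
                 author="author_name", time_interval="3-months")

# 2. Set region, catalog accessor, evaluations and models.
exp.set_region("italy_csep_region")
exp.set_catalog_accessor(source="BSI", min_mw=3.5, max_depth=30.0)
exp.set_evaluations([("number_test", {"alpha": 0.05})])
exp.set_models([Model("model_a", "https://git.example/model_a")])  # placeholder

# 3. Retrieve and build the models (docker images, build tests).
for model in exp.models:
    model.get()
    model.build()

# 4. Create forecasts and run the evaluations for the current testing window.
start, end = datetime(2022, 7, 1), datetime(2022, 10, 1)
exp.create_forecasts(start, end)  # runs each Model container through the Registry
exp.run_evaluations(start, end)   # fetch catalog, execute tests, store results

# 5. Post-process and publish: report/figures/tables, then commit and push.
exp.post_process()
```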
Tasks
- Implementation will be in two stages:
- The first stage (end of June) will allow a Catalog to be accessed and the Models to be downloaded, built, tested and run. We aim to provide preliminary or dummy results, without concern for a formal Registry; in its place, quick-and-dirty functions will be created to keep track of the existing files.
- The second stage (end of December) will ensure the complete tracking and registry of the Experiment, as well as aiming for simple operability (e.g. the experiment can be run with a few commands in the console), and the design of exception catching and correct warning display.