Epic: Experiment System Requirements
Scope
The Earthquake Forecast Experiment for Italy (hereafter Experiment) is designed to test the performance of state-of-the-art short-term forecasting models in a prospective fashion. The general objectives are (i) to improve our understanding of the physics and statistics of earthquake occurrence and (ii) to validate the models and their components used in Operational Earthquake Forecasting. To perform the Experiment, a new software system should be designed, whose architecture satisfies current open-source scientific standards and the CSEP philosophy. However, the design of the software architecture is not sufficient on its own to achieve the objectives of the Experiment; the architecture must be tied closely to those objectives through the system requirements.
Requirements
A high-level definition of the system requirements. We define the system requirements based on current standards of scientific code development.
Reproducible
- Must allow any independent user to obtain the exact Experiment results
Re-runnable
- The system can execute the Experiment during the testing period and after its termination
- It can be run on-demand by any user during its duration
Accessible
- The system allows the Experiment results to be easily obtained and visualized
- The Experiment can be deployed by any user with minimum (tbd) programming knowledge
Reusable
- The software system can be applied to other case studies with minimal modifications
- New features (e.g. methods) can be easily added, or existing ones modified
Replicable
- The system must allow replicability of the Experiment, i.e. using different datasets
- Unambiguously defined, such that it could be replicated by a different developer
Specifications
How the requirements will be satisfied.
Reproducibility
Results of the Experiment can be obtained exactly by any independent user
- Bookkeeping of authoritative data
- Versioning management > e.g. the DOI of the Bollettino Sismico Italiano (BSI) for each catalogue release
- Storage > every run of the experiment should store the data used (a possible run manifest is sketched after this list)
- Model source code and parameters
- Models should be standalone code containers
- Models' codes should be identifiable and citable
- The Experiment system must be decoupled from the Model code, but must record/store the details of the Model's execution.
- Experiment Source code
- Experiment source code can be obtained from an official source (e.g. Zenodo) as-is, with versioning/labeling corresponding to each publication of results
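A minimal sketch of what such a run bookkeeping record could look like, assuming a hypothetical `write_run_manifest` helper; all field names are illustrative and the actual schema is still to be defined:

```python
import json
from datetime import datetime, timezone

def write_run_manifest(path, catalog_doi, catalog_version, model_images, experiment_version):
    """Store the provenance of a single Experiment run as a JSON manifest.

    All field names are illustrative; the actual schema is to be defined.
    """
    manifest = {
        "run_timestamp": datetime.now(timezone.utc).isoformat(),
        "authoritative_data": {
            "catalog_doi": catalog_doi,          # e.g. the DOI of the BSI release used
            "catalog_version": catalog_version,
        },
        "models": model_images,                  # e.g. {"model_a": "model_a:v1.2@sha256:..."}
        "experiment_code_version": experiment_version,  # e.g. a git tag / Zenodo DOI
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```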
Re-runnable
Experiment can be run on-demand by any user during its duration
- Experiment can be deployed and run on any machine (Linux-based only?), as long as it satisfies the computational requirements. We will take advantage of the following:
- All models will be standalone dockerized code, defined by a Dockerfile (shared responsibility between modeler/tester); a minimal sketch of invoking such a container from the orchestration layer is given after this list
- All Evaluation codes will be handled by pyCSEP
- The experiment architecture (filepath management, forecast generation, pyCSEP interfacing, evaluations, results generation and visualization) will be orchestrated with docker-compose
- How much of the experiment design can be implemented in pycsep? THE MORE THE BETTER (for reusability)
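Purely as an illustration of how the orchestration layer could call a standalone, dockerized model, here is a minimal sketch; the image name, mount points and arguments are assumptions, not an agreed container interface:

```python
import subprocess
from pathlib import Path

def run_model_container(image, input_dir, output_dir, args=()):
    """Run a standalone, dockerized model and collect its forecast files.

    The image, the mount points and the extra arguments are placeholders:
    the actual container interface is part of the Experiment design.
    """
    input_dir, output_dir = Path(input_dir).resolve(), Path(output_dir).resolve()
    output_dir.mkdir(parents=True, exist_ok=True)
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/input:ro",    # authoritative data, read-only
        "-v", f"{output_dir}:/forecasts",  # forecast files written here
        image, *args,
    ]
    subprocess.run(cmd, check=True)
    return sorted(output_dir.glob("*"))
```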
It will be possible to run the experiment in the future, even after its termination
- Don't step too deep into dependency hell: clearly define the system requirements for the models and the experiment:
- The dockerization of the models should aim to reduce the possibility of dependency conflicts in the future
- Use tag (+push_id) versioning to create the Experiment from supported base docker-images (e.g. FROM python:3.8, FROM r-base:4.1.3)
- Identify and pin system library versions (e.g. apt install libgdal=1.20)
- Install Python/R packages with fully specified versions through pip or R's install.packages()
- Installations from git should specify a commit/tag
Accessible
Experiment results are easy to obtain, reproduce or visualize
- Experiment results distribution should be decoupled from the Experiment code architecture (e.g. results could be generated anywhere, but officially published on demand by an authorized user)
- The experiment code should be able to generate a simple report of the results (or several variants, with different levels of detail)
- As the Experiment is run on demand, a branch/tag could be created in a git repo to display the results of each run (similar to GEFE), with results shown in README.md. Maybe also GitHub Pages?
- The Experiment should also provide results in a human-readable format (e.g. a CSV sheet with evaluation values for interested users); a rough sketch of such output is given after this list
- MAYBE: the Experiment can create HTML/Flask pages, which could be hosted by any webserver (e.g. GFZ, etc.) in the future (or in the cloud?)
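As a rough illustration of the human-readable output mentioned above, evaluation values could be dumped both as a CSV sheet and a simple Markdown report; the `write_results` helper and its column names are placeholders:

```python
import csv

def write_results(rows, csv_path, md_path):
    """Write evaluation results as a CSV sheet and a minimal Markdown report.

    `rows` is assumed to be a list of dicts such as
    {"model": "model_a", "test": "N-test", "score": 0.42}; the real schema
    will follow the Experiment's evaluation outputs.
    """
    fields = ["model", "test", "score"]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)

    with open(md_path, "w") as f:
        f.write("# Experiment results\n\n")
        f.write("| Model | Test | Score |\n|---|---|---|\n")
        for r in rows:
            f.write(f"| {r['model']} | {r['test']} | {r['score']} |\n")
```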
The Experiment can be deployed by any user with minimum (tbd) programming knowledge
- The Experiment architecture can be downloaded in full from an official source (e.g. Zenodo).
- The Experiment itself will be a docker-compose application, able to acquire the authoritative data, run the models, create the forecasts and evaluate them with a couple of lines of code.
- The code should be designed in favor of clarity and simplicity, rather than efficiency.
- Documentation is key
Reusable
Code can be applied to other case studies with minimal modifications
- Decoupling the Experiment System from the Experiment itself: for this reason, I am in favor of designing the Italy experiment in a generic way, with most of the code implemented in pyCSEP >> For example, a Model class that interfaces with a docker container (code, virtual environment, etc.) and generates the required forecast files, and an Experiment class that handles the Models, authoritative data, evaluations, bookkeeping, etc. An Experiment object can interface with a Model by specifying the Model's format (i.e. model type, filepath structure/storage), in the same way as a Catalog format. This probably implies that the system must be designed independently of the datasets themselves. A minimal class sketch is given below.
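A minimal sketch of the decoupling proposed above, assuming hypothetical `Model` and `Experiment` classes; names and interfaces are placeholders and would ultimately live in pyCSEP:

```python
class Model:
    """Wraps a single forecasting model, whatever its packaging (docker, venv, local code)."""

    def __init__(self, name, fmt, workdir):
        self.name = name          # e.g. "etas_italy" (illustrative)
        self.fmt = fmt            # Model format: type, filepath structure, storage
        self.workdir = workdir

    def create_forecast(self, start, end):
        """Generate the forecast files for a time window (delegates to the container/env)."""
        raise NotImplementedError


class Experiment:
    """Handles the Models, authoritative data, evaluations and bookkeeping."""

    def __init__(self, catalog_source, models, evaluations):
        self.catalog_source = catalog_source
        self.models = models              # list of Model objects
        self.evaluations = evaluations    # evaluation callables, e.g. from pyCSEP

    def run(self, start, end):
        results = {}
        for model in self.models:
            forecast = model.create_forecast(start, end)
            results[model.name] = [ev(forecast, self.catalog_source) for ev in self.evaluations]
        return results
```

This keeps the Experiment agnostic of how each Model is packaged, which is what the Reusable requirement asks for.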
New methods can be easily added or modified
- This goes hand in hand with pyCSEP, where new evaluation methods can be implemented. If we choose to include a Model class, it should be able to handle different types of Model abstractions (virtual environments, local-machine models, Docker containers, etc.)
Replicable
The Experiment can be replicated with a different dataset
- For instance, a different catalog
Unambiguously defined, such that it could be replicated by a different developer
- Documenting both the code and clearly stating the steps in the manuscript.
- Perhaps much of the experiment creation can be part of the pyCSEP wiki/documentation.
Tasks
There are at least three main areas that need to be developed for this experiment
Experiment Design
- Devise documenting strategy
- Design of architecture (e.g. identifying components and relations) > to be specified in this issue
- Design of features > to be specified in this issue
- Glossary?
- Code scaffolding
- Feature implementations. So far:
  - Model interfacing and management
  - Bookkeeping and Serialization: clear inventory of the information to be stored and how it will be stored.
  - Evaluation: most heavy-lifting already done in GEFE. Redesign the workflow for evaluations (e.g. tracking Model status (are forecasts already created? are evaluations already calculated?), with emphasis on time-dependence, etc.)
  - Results visualization: re-design GEFE MarkdownReports
Models preparation
- Dockerize all models, specify libraries, test code execution.
- Devise a template/instructions for modelers to handle their repos, and then Dockerize their models
- Design interface with Experiment
pyCSEP improving
- Determine which features will be included in pycsep nightly builds
- Implement ISIDE catalog web API accessing (a possible access sketch is shown below)
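As a starting point for the ISIDE access task (assuming the ISIDe catalog remains reachable through INGV's FDSN event web service; the endpoint and parameters should be double-checked), a sketch of the query could look like:

```python
import requests

INGV_FDSN_EVENT = "https://webservices.ingv.it/fdsnws/event/1/query"  # assumed endpoint

def query_iside(start, end, min_magnitude=2.5):
    """Fetch events from the ISIDe/INGV FDSN event service as plain text rows.

    Parameters follow the FDSN event standard; parsing into a pyCSEP
    catalog object is left to the (future) pyCSEP reader.
    """
    params = {
        "starttime": start,          # e.g. "2022-01-01T00:00:00"
        "endtime": end,
        "minmagnitude": min_magnitude,
        "format": "text",            # pipe-separated rows
    }
    response = requests.get(INGV_FDSN_EVENT, params=params, timeout=60)
    response.raise_for_status()
    return response.text.splitlines()
```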
Use cases
Examples of expected uses of the system
- A. Visualize results of the experiment during its testing time
Overview: A researcher working in OEF wants to find out the performance of the competing models one year after the experiment started. Pre-reqs: (i) The Experiment must have been run on demand at least once. (ii) Results' figures must have been published/uploaded to the web. Success: The researcher accesses the Zenodo repo of the Experiment, whose latest version points to a tagged commit on GitHub. From the README.md or the files within, the researcher obtains a high-quality, citable figure and the corresponding DOI, which can either be used internally in their organization or shown wherever it is required. The figure could also be obtained through the INGV or CSEP website, but that is out of the scope of the project.
- B. Obtain results of the experiment during its testing time
Overview: A researcher working in short-term hazard wants to retrieve the results' values of model performance (e.g. log-likelihoods) one year after the experiment started, in order to weight branches of logic trees or perform Bayesian updating. Pre-reqs: Same as before + results published in a human-readable format. Success: The researcher accesses the git repo tag for the last run and obtains there a CSV file containing the up-to-date experiment results, e.g. time series of evaluation metrics for the competing models, with all the information needed for citation.
- C. Reuse the code to perform a different experiment
Overview: A researcher working in Chile wants to design an experiment to evaluate their OEF models. In their seismological service, modeling time-dependent seismicity is relatively new, and they have no experience in performing experiments. Pre-reqs: (i) Source code completely available. (ii) Documentation for experiment implementation is available, at both the scientific and programming levels. (iii) An example mockup model is available. Success: (i) Feasibility check: the researcher downloads the source code from Zenodo, replaces the catalog and region data and the experiment specifications (e.g. time window), and tests a mockup model in the region. (ii) The researcher modifies a model of their own (e.g. a simple ETAS) to match our Model Template and runs the experiment pseudo-prospectively. (iii) The researcher builds a prospective experiment for Chile, following most of this experiment's guidelines, and uploads the source code to Zenodo.
- D. Reproduce results
Overview: Someone wants to reproduce the exact results of the experiment to date. It could be to assess the quality of the experiment, re-assess evaluation metrics, or test their new models pseudo-prospectively against the gold-standard models and the metrics from this experiment. Pre-reqs: (i) Source code available. (ii) Results available online, along with the authoritative dataset required to reproduce them. (iii) Experiment completely defined at each versioned run (e.g. RUN_2022_3, RUN_2023_3, etc.), such that the results can be reproduced by running the source code. Success: The researcher is able to run the Experiment in its totality. If there is a difference in results, it can be easily isolated (ambiguous model, catalog re-evaluation, etc.).