CLEV2ER Land Ice and Inland Waters GPP Project

Documentation for the CLEV2ER Land Ice and Inland Waters GPP project, hosted on GitHub at github.com/mssl-softeng/clev2er_liiw.

The GPP runs within a framework designed for (but not restricted to) Level-1b to Level-2 processing of ESA radar altimetry mission data. The key features of the framework are dynamically loaded algorithm classes (from XML or YML lists of algorithms), in-built support for multi-processing, and a consistent automated development and testing workflow. The chain controller command line tool provides many run-time options.

The diagram below shows a simplified representation of the framework and its components.

![CL-SII Framework](https://www.homepages.ucl.ac.uk/~ucasamu/cl_liiw_framework.png)

Main Features

  • Command line chain controller tool : src/clev2er/tools/run_chain.py
  • input L1b file selection (single file, multiple files or dynamic algorithm selection)
  • dynamic algorithm loading from XML or YML list(s)
    • algorithms are classes of type Algorithm with configurable .init(), .process(), .finalize() functions.
    • Algorithm.init() is called before any L1b file processing.
    • Algorithm.process() is called on every L1b file.
    • Algorithm.finalize() is called after all files have been processed.
    • Each algorithm has access to: L1b Dataset, shared working dict, config dict.
    • Algorithm/chain configuration by XML or YAML configuration files.
    • A shared python dictionary is used to pass algorithm outputs between algorithms in the chain.
  • logging with standard warning, info, debug, error levels (+ multi-processing logging support)
  • optional multi-processing built in, configurable maximum number of processes used.
  • optional use of shared memory (for example for large DEMs and Masks) when using multi-processing. This experimental feature must be used with great care, as it can result in memory leaks (requiring a server reboot to free) if the shared memory is not correctly closed; see the sketch after this list.
  • algorithm timing (with MP support)
  • chain timing
  • support for breakpoint files (saved as NetCDF4 files)
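As flagged in the shared-memory bullet above, the following is a minimal, hypothetical sketch (not the framework's actual implementation) of sharing a large array between processes with Python's multiprocessing.shared_memory. Every process must close its handle, and the creating process must also unlink the block, otherwise the memory leaks:

```python
import numpy as np
from multiprocessing import shared_memory

# Creator: place a large array (e.g. a DEM) into a shared memory block
dem = np.zeros((1000, 1000), dtype=np.float32)
shm = shared_memory.SharedMemory(create=True, size=dem.nbytes)
shared_dem = np.ndarray(dem.shape, dtype=dem.dtype, buffer=shm.buf)
shared_dem[:] = dem[:]  # copy the data into the shared block

try:
    # worker processes would attach via shared_memory.SharedMemory(name=shm.name)
    # and must call .close() on their own handles when finished
    pass
finally:
    shm.close()   # close this process's handle
    shm.unlink()  # creator frees the block; skipping this leaks until reboot
```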

Other processing chains developed within the framework:

  • CLEV2ER Sea Ice & Icebergs: https://github.com/mssl-softeng/clev2er_sii
  • CryoTEMPO Land Ice: https://github.com/mssl-softeng/clev2er_cryotempo
  • CPOM Sea Ice: https://github.com/CPOM-Altimetry/cpom_seaice
  • Generic Framework: https://github.com/mssl-softeng/clev2er

Change Log

This section details major changes to the framework (not individual chains):

| Date | Change |
| --------- | ------ |
| 14-Mar-24 | Documentation deployment workflow moved to gh_pages branch |
| 13-Mar-24 | Target Python version for project updated to 3.11 |
| 10-Mar-24 | Allows any_name.xml or .XML, .yml or .YML files for config or algorithm list |
| 09-Mar-24 | Removed baseline and version support. Will use git branching instead |
| 15-Nov-23 | algorithm_lists file directory structure changed to add directory /chainname/ |
| 10-Nov-23 | Breakpoint support added. See section on breakpoints below. |

Installation of the Framework

Note that the framework installation has been tested on Linux and macOS systems. Use on other operating systems is possible but may require additional install steps, and is not directly supported.

Make sure you have git installed on your target system.

Clone the public git repository into a suitable directory on your system. This will create a directory called clev2er_liiw in your current directory.

with https: git clone https://github.com/mssl-softeng/clev2er_liiw.git

or with ssh: git clone git@github.com:mssl-softeng/clev2er_liiw.git

or with the GitHub CLI: gh repo clone mssl-softeng/clev2er_liiw

Shell Environment Setup

The following shell environment variables need to be set to support framework operations.

In a bash shell this might be done by adding export lines to your $HOME/.bashrc file.

  • Set the CLEV2ER_BASE_DIR environment variable to the root of the clev2er package.
  • Add $CLEV2ER_BASE_DIR/src to PYTHONPATH.
  • Add ${CLEV2ER_BASE_DIR}/src/clev2er/tools to the PATH.
  • Set the shell's ulimit -n to allow enough file descriptors to be available for multi-processing.

An example environment setup is shown below (the path in the first line should be adapted for your specific directory path):

export CLEV2ER_BASE_DIR=/Users/someuser/software/clev2er_liiw
export PYTHONPATH=$PYTHONPATH:$CLEV2ER_BASE_DIR/src
export PATH=${CLEV2ER_BASE_DIR}/src/clev2er/tools:${PATH}
# for multi-processing/shared mem support set ulimit
# to make sure you have enough file descriptors available
ulimit -n 8192

Environment Setup for Specific Chains

Additional environment setup may be required for specific chains. This is not necessary unless you intend to use these chains.

CLEV2ER Sea Ice Chain

The following is an example of potential additional environment variables required by the CLEV2ER sea ice chain. Actual values are currently TBD.

# Specific environment for the CLEV2ER sea ice chain
export CLEV2ER_DATA_DIR=/some/dir/somewhere
export CLEV2ER_LOG_DIR=/some/logdir/somewhere

Python Requirement

Python v3.11 must be installed or available before proceeding. A recommended minimal method of installing Python 3.11 is to use Miniconda.

To install Python 3.11 using Miniconda, select the appropriate link for your operating system from:

https://docs.anaconda.com/free/miniconda/miniconda-other-installer-links/

For example, for Linux (select a different installer for other operating systems), download the installer and install a minimal Python 3.11 environment using:

wget https://repo.anaconda.com/miniconda/Miniconda3-py311_24.1.2-0-Linux-x86_64.sh
chmod +x Miniconda3-py311_24.1.2-0-Linux-x86_64.sh
./Miniconda3-py311_24.1.2-0-Linux-x86_64.sh

When the installer asks "Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]", answer yes.

You may need to start a new shell to refresh your environment before checking that python 3.11 is in your path.

Check that python v3.11 is now available, by typing:

python -V

Virtual Environment and Package Requirements

This project uses poetry (a dependency manager, see: https://python-poetry.org/) to manage package dependencies and virtual envs.

First, you need to install poetry on your system using instructions from https://python-poetry.org/docs/#installation. Normally this just requires running:

curl -sSL https://install.python-poetry.org | python3 -

You should also then ensure that poetry is in your path, such that the command

poetry --version

returns the poetry version number. You may need to modify your PATH variable in order to achieve this.

To make sure poetry is set up to use a Python 3.11 virtual environment when in the CLEV2ER base directory, run:

cd $CLEV2ER_BASE_DIR
poetry env use $(which python3.11)

Install Required Python packages using Poetry

Run the following commands to install the Python dependencies for this project (poetry uses the settings in pyproject.toml to determine what to install):

cd $CLEV2ER_BASE_DIR
poetry install

Load the Virtual Environment

Now you are all set up. Whenever you want to run any CLEV2ER chains you must first load the virtual environment using the poetry shell or poetry run commands.

cd $CLEV2ER_BASE_DIR
poetry shell

You should now be setup to run processing chains, etc.

Run a simple chain test example

The following command will run a simple example test chain which dynamically loads two template algorithms and runs them on a set of CryoSat L1b files in a test data directory. The algorithms do not perform any actual processing as they are just template examples. Make sure you have already loaded the virtual environment using poetry shell before running this command.

run_chain.py -n testchain -d $CLEV2ER_BASE_DIR/testdata/cs2/l1bfiles

There should be no errors. Note that run_chain.py is set up as an executable, so it is not necessary to use python run_chain.py, although this will also work.

Note that the algorithms that are dynamically run are located in $CLEV2ER_BASE_DIR/src/clev2er/algorithms/testchain/ (alg_template1.py, alg_template2.py).

The list of algorithms (and their order) for testchain are defined in $CLEV2ER_BASE_DIR/config/algorithm_lists/testchain/testchain_alglist.xml

Chain configuration settings are defined in $CLEV2ER_BASE_DIR/config/main_config.xml, and algorithm configuration settings are defined in $CLEV2ER_BASE_DIR/config/chain_configs/testchain/testchain_config.xml.

To find all the command line options for run_chain.py, type:

run_chain.py -h

For further info, please see clev2er.tools

Developer Requirements

This section details additional installation requirements for developers who will develop/adapt new chains or algorithms.

Install pre-commit hooks

pre-commit hooks are static code analysis scripts which are run (and must pass) before each git commit. For this project they include pylint, ruff, mypy, black, and isort.

To install the pre-commit hooks, do the following (the second line is not necessary if you have already loaded the virtual environment using poetry shell):

cd $CLEV2ER_BASE_DIR
poetry shell
pre-commit install
pre-commit run --all-files

Now, whenever you make changes to your code, it is recommended to run the following in your current code directory.

pre-commit run --all-files

This will check that your code passes all static code tests prior to running git commit. Note that these same tests are also run when you make a new commit, i.e. using git commit -a -m "commit message". If the tests fail you must correct the errors before proceeding, and then rerun the git commit.

Developer Workflow

This section describes the method that should be used to contribute to the project code. The basic method to develop a new feature is:

On your local repository:

  1. Make sure your local 'master' branch is checked out and up-to-date (some steps may not be necessary).

    cd $CLEV2ER_BASE_DIR
    git checkout master
    git pull
    
  2. Create a new branch, named xxx_featurename, where xxx is your initials

    git checkout -b xxx_featurename

  3. Develop and test your new feature within this branch, making git additions and commits as necessary. You should have at least one commit (probably several).

    git commit -a -m "description of change"

  4. If you are developing a new module, then you must also write a pytest test for that module in a tests directory located in the same directory as the module. Note the section on pytest markers at the end of this document.

  5. Static analysis tests will be run on your changes using pre-commit, either automatically during a git commit or by running in the directory of the code change or in the repository base directory (for a more complete check):

    pre-commit run --all-files

  6. Once tested, push the new feature branch to GitHub

    git push -u origin xxx_featurename [first time], or just git push

  7. Go to GitHub at https://github.com/mssl-softeng/clev2er_liiw, or go directly to the pull request URL shown in the output of your git push command.

  8. Create a Pull Request on GitHub for your feature branch. This will automatically start a CI workflow that tests your branch for code issues and runs pytest tests. If it fails you should correct the errors on your local branch and repeat (steps 3 onwards) until it passes all tests.

  9. Finally, your pull request will be reviewed and, if accepted, merged into the 'master' branch.

  10. You can then delete your local branch and the remote branch on GitHub.

    git branch -d xxx_featurename
    git push origin --delete xxx_featurename
    
    
  11. Repeat the whole process to add your next feature.

Framework and Chain Configuration

The framework (run controller) and individual named algorithm chains each have separate configuration files. Configuration options can be categorized as:

  • run controller (or main framework) default configuration
  • per chain default configuration (to configure individual algorithms and resources)
  • command line options (for input selection and modifying any default configuration options)

Chains can be configured using XML or YAML configuration files and optional command line options in the following order of increasing precedence:

  • main config file: $CLEV2ER_BASE_DIR/config/main_config.xml [Must be XML]
  • chain specific config file: $CLEV2ER_BASE_DIR/config/chain_configs/<chain_name>/<config_file_name> (.xml, .XML, or .yml)
  • command line options
  • additional command line config options using --conf_opts

The configurations are passed to the chain's algorithms and finder classes via a merged Python dictionary, available to the Algorithm classes as self.config.

Run Control Configuration

The default run control configuration file is $CLEV2ER_BASE_DIR/config/main_config.xml

This contains general default settings for the chain controller. Each of these can be overridden by the relevant command line options.

| Setting | Options | Description |
| ------- | ------- | ----------- |
| use_multi_processing | true or false | if true, multi-processing is used |
| max_processes_for_multiprocessing | int | max number of processes to use for multi-processing |
| use_shared_memory | true or false | if true, allow use of shared memory (experimental feature) |
| stop_on_error | true or false | stop chain on first error found, or log error and skip |

Chain Specific Configuration

The default configuration for your chain's algorithms and finder classes should be placed in the chain specific config file:

$CLEV2ER_BASE_DIR/config/chain_configs/<chain_name>/<anyname>[.xml, .XML, or .yml]

Configuration files may be either XML (.xml) or YAML (.yml) format.

Formatting Rules for Chain Configuration Files

YAML or XML files can contain multi-level settings for key value pairs of boolean, int, float or str.

  • boolean values must be set to the string true or false (case insensitive)
  • environment variables are allowed within strings as $ENV_NAME or ${ENV_NAME} and will be evaluated (see the sketch after this list)
  • YAML or XML files may have multiple levels (or sections)
  • XML files must have a top root level named configuration wrapping the lower levels. This is removed from the Python config dictionary before being passed to the algorithms.
  • chain configuration files must have a
    • log_files section to provide locations of the log files (see below)
    • breakpoint_files section to provide locations of the breakpoint files (see below)
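As a minimal illustration of the environment variable evaluation (assuming behaviour equivalent to Python's os.path.expandvars; the framework's internal mechanism may differ):

```python
import os

os.environ["MYDATA"] = "/data/dems"
print(os.path.expandvars("$MYDATA/dem.nc"))    # -> /data/dems/dem.nc
print(os.path.expandvars("${MYDATA}/dem.nc"))  # -> /data/dems/dem.nc
```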

Example of sections from a 2 level config file in YML:

# some_key: str:  description
some_key: a string

section1:
    key1: 1
    key2: 1.5
    some_data_location: $MYDATA/dem.nc

section2:
    key: false

Example of sections from a 2 level config file in XML:

<?xml version="1.0"?>

<!-- configuration xml level required, but removed in python dict -->
<configuration>

<!--some_key: str:  description-->
<some_key>a string</some_key>

<section1>
   <key1>1</key1>
   <key2>1.5</key2>
   <some_data_location>$MYDATA/dem.nc</some_data_location>
</section1>

<section2>
   <key>false</key>
</section2>

</configuration>

These settings are available within Algorithm classes as a python dictionary called self.config as in the following examples:

self.config['section1']['key1']
self.config['section1']['some_data_location']
self.config['some_key']

The config file will also be merged with the main run control dictionary. Settings in the chain configuration file will take precedence over the main run control dictionary (if they are identical), so you can override any main config settings in the named chain config if you want.
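As an illustration only (this is not the framework's actual merge code), a recursive merge in which chain settings take precedence over main settings could look like:

```python
def merge_configs(main: dict, chain: dict) -> dict:
    """Recursively merge two config dicts; chain values take precedence."""
    merged = dict(main)
    for key, value in chain.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged

main_cfg = {"chain": {"stop_on_error": False, "use_multi_processing": True}}
chain_cfg = {"chain": {"stop_on_error": True}}
print(merge_configs(main_cfg, chain_cfg))
# -> {'chain': {'stop_on_error': True, 'use_multi_processing': True}}
```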

Required Chain Configuration Settings

Each chain configuration file should contain sections to configure logging and breakpoints. See the section on logging below for an explanation of the settings.

Here is a minimal configuration file (XML format)

<?xml version="1.0"?>
<!--chain: mychain configuration file-->

<configuration> <!-- note this level is removed in python dict -->

<!--Setup default locations to store breakpoint files-->
<breakpoint_files>
    <!-- set the default directory where breakpoint files are stored -->
    <default_dir>/tmp</default_dir>
</breakpoint_files>

<log_files>
    <!-- default directory to store log files -->
    <default_dir>/tmp</default_dir>
    <!-- info_name : str : file name base str for info files -->
    <info_name>info</info_name>
    <!-- error_name : str : file name base str for errorfiles -->
    <error_name>error</error_name>
    <!-- debug_name : str : file name base str for debug files -->
    <debug_name>debug</debug_name>
    <!-- logname_str : str : additional string to add to end of log filename, before .log
    Leave empty if not required
    -->
    <logname_str></logname_str>

    <!-- append_date_selection : true or false, if year and month are specified on
    command line append _MMYYYY to log file base name (before .log) -->
    <append_date_selection>true</append_date_selection>
    <append_process_id>false</append_process_id>
    <append_start_time>true</append_start_time>
</log_files>

<!-- add more levels and settings below here -->

<resources>
        <physical_constants>

                <directory>$CLEV2ER_BASE_DIR/testdata/adf/common</directory>
                <filename>
CR__AX_GR_CST__AX_00000000T000000_99999999T999999_20240201T000000__________________CPOM_SIR__V01.NC
                </filename>
                <mandatory>True</mandatory>
        </physical_constants>
</resources>

</configuration>

The requirements for specific settings are set by the chain and its algorithms. An example of a chain configuration file can be found at:

$CLEV2ER_BASE_DIR/config/chain_configs/testchain/testchain_config.xml

For testing purposes it is sometimes useful to modify configuration settings directly from the command line. This can be done using the command line option --conf_opts, which can contain a comma-separated list of section:key:value pairs.

An example of changing the value of the setting above would be:

--conf_opts resources:mydata:${MYDATA_DIR}/somedata2.nc
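For illustration only (the real option handling lives in run_chain.py and may differ in detail), overrides of this form could be applied to the merged config dictionary like this:

```python
def apply_conf_opts(config: dict, conf_opts: str) -> None:
    """Apply comma-separated section:key:value overrides in place."""
    for opt in conf_opts.split(","):
        section, key, value = opt.split(":", 2)
        config.setdefault(section, {})[key] = value

config = {"resources": {"mydata": "/old/path.nc"}}
apply_conf_opts(config, "resources:mydata:/data/somedata2.nc")
print(config["resources"]["mydata"])  # -> /data/somedata2.nc
```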

Developing New Chains

  1. Decide on a chain name. For example newchain
  2. Create $CLEV2ER_BASE_DIR/algorithms/newchain/ directory to store the new chain's algorithms.
  3. Create $CLEV2ER_BASE_DIR/algorithms/newchain/tests to store the new chain's algorithm unit tests (using tests formatted for pytest). At least one algorithm test file should be created per algorithm, which should contain suitable test functions.
  4. Create your algorithms by copying and renaming the algorithm class template $CLEV2ER_BASE_DIR/algorithms/testchain/alg_template1.py into your algorithm directory. Each algorithm should have a different file name of your choice, for example alg_retrack.py, alg_geolocate.py. You need to fill in the appropriate sections of the init(), process() and finalize() functions for each algorithm (see the section below for more details on using algorithm classes).
  5. You must also create a test for each algorithm in $CLEV2ER_BASE_DIR/algorithms/newchain/tests. You should copy/adapt the test template $CLEV2ER_BASE_DIR/algorithms/testchain/tests/test_alg_template1.py for your new test.
  6. Each algorithm and their unit tests must pass the static code checks (pylint, mypy, etc) which are automatically run as git pre-commit hooks.
  7. Create a first XML or YML configuration file for the chain in $CLEV2ER_BASE_DIR/config/chain_configs/newchain/anyname.yml or .xml. The configuration file contains any settings or resource locations that are required by your algorithms, and may include environment variables.
  8. If required create one or more finder class files. These allow fine control of L1b file selection from the command line (see section below for more details).
  9. Create an algorithm list file in $CLEV2ER_BASE_DIR/config/algorithm_lists/newchain/anyname.xml (or .yml). You can copy the template in $CLEV2ER_BASE_DIR/config/algorithm_lists/testchain/testchain_alglist.xml.
  10. To test your chain on a single L1b file, you can use run_chain.py --name newchain -f /path/to/a/l1b_file. There are many other options for running chains (see run_chain.py -h).

Algorithm and Finder Classes

This section discusses how to develop algorithms for your chain. There are two types of algorithms, both of which are dynamically loaded at chain run-time.

  • Main algorithms : standard chain algorithm classes
  • Finder algorithms : optional classes to manage input L1b file selection

Algorithm Lists

Algorithms are dynamically loaded in a chain when (and in the order) they are named in the chain's algorithm list YAML or XML file: $CLEV2ER_BASE_DIR/config/algorithm_lists/chainname/chainname.yml (or .xml). This has two sections (l1b_file_selectors and algorithms) as shown in the example below:

YML version:

# List of L1b selector classes to call in order
l1b_file_selectors:
  - find_lrm  # find LRM mode files that match command line options
  - find_sin  # find SIN mode files that match command line options
# List of main algorithms to call in order
algorithms:
  - alg_identify_file # find and store basic l1b parameters
  - alg_skip_on_mode  # finds the instrument mode of L1b, skip SAR files
  #- alg_...

XML version:

The XML version requires an additional top-level <algorithm_list> element that wraps the other sections. It also allows you to enable or disable individual algorithms within the list by setting the values Enable or Disable, and to set breakpoints by setting the value to BreakpointAfter.

<?xml version="1.0"?>

<algorithm_list>
    <algorithms>
        <alg_identify_file>Enable</alg_identify_file>
        <alg_skip_on_mode>Enable</alg_skip_on_mode>
        <!-- ... more algorithms -->
        <alg_retrack>BreakpointAfter</alg_retrack>
    </algorithms>

    <l1b_file_selectors>
        <find_lrm>Enable</find_lrm>
        <find_sin>Enable</find_sin>
    </l1b_file_selectors>
</algorithm_list>

Main Algorithms

Each algorithm is implemented in a separate module located in

$CLEV2ER_BASE_DIR/src/clev2er/algorithms/<chainname>/<alg_name>.py

Each algorithm module should contain an Algorithm class, as per the algorithm template in:

$CLEV2ER_BASE_DIR/src/clev2er/algorithms/testchain/alg_template1.py

Please copy this template for all algorithms.

Algorithm class modules have three main functions:

  • init() : used for initializing/loading resources. Called once at the start of processing.
  • process(l1b: Dataset, shared_dict: dict) : called for every L1b file. The results of the processing may be saved in the shared_dict so that they can be accessed by algorithms further down the chain. The L1b data for the current file being processed is passed to this function as the netCDF4 Dataset argument l1b.
  • finalize() : called at the end of all processing to free resources.

All of the functions have access to the merged chain configuration dictionary self.config.

All logging must be done using self.log.info(), self.log.error(), self.log.debug().
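A minimal sketch of the shape of such a class follows (illustrative only: the constructor wiring, config keys, and the "time" dimension are assumptions here; copy the real template rather than this sketch):

```python
import logging

from netCDF4 import Dataset  # pylint: disable=no-name-in-module


class Algorithm:
    """Illustrative algorithm skeleton (constructor signature is assumed)."""

    def __init__(self, config: dict, log: logging.Logger) -> None:
        self.config = config  # merged chain configuration dict
        self.log = log        # chain logger (multi-processing aware)

    def init(self) -> None:
        # called once before any L1b file is processed: load resources
        self.constants_dir = self.config["resources"]["physical_constants"]["directory"]

    def process(self, l1b: Dataset, shared_dict: dict) -> tuple[bool, str]:
        # called for every L1b file: store results for algorithms further down the chain
        num_records = l1b.dimensions["time"].size  # "time" dimension is hypothetical
        self.log.info("processing %d records", num_records)
        shared_dict["num_records"] = num_records
        return (True, "")

    def finalize(self) -> None:
        # called once after all files have been processed: free resources
        self.log.info("finalize complete")
```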

Algorithm.process() return values

It is important to note that Algorithm.process() return values affect how the chain operates. The .process() function returns (bool, str).

Return values must be set as follows:

  • (True, "") when the processing has completed without errors and continuation to the next algorithm in the chain (if available) is expected.
  • (False, "SKIP_OK any reason message") when the processing has found a valid reason for the chain to skip any further processing of the L1b file, for example if it does not measure over the target area. This is logged as a DEBUG message but is not an error. The chain will move on to the next L1b file.
  • (False, "some error message") : the error message will be logged to the error log and the file will be skipped. If config["chain"]["stop_on_error"] is False the chain will continue to the next L1b file; if it is True, the chain will stop.
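For example, a process() implementation might use all three conventions as follows (a sketch only: the sir_op_mode attribute and compute_elevations helper are hypothetical):

```python
def process(self, l1b, shared_dict: dict) -> tuple[bool, str]:
    # a valid reason to skip this file: not an error
    if getattr(l1b, "sir_op_mode", "") == "SAR":
        return (False, "SKIP_OK file is in SAR mode, not processed by this chain")
    try:
        shared_dict["elevations"] = compute_elevations(l1b)  # hypothetical helper
    except KeyError as exc:
        # a real error: logged, and the file is skipped (or the chain stops
        # if stop_on_error is true)
        return (False, f"required L1b variable missing: {exc}")
    return (True, "")
```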

FileFinder Classes

FileFinder class modules provide more complex and tailored L1b input file selection than would be possible with the standard run_chain.py command line options of:

  • (--file path) : choose single L1b file
  • (--dir dir) : choose all L1b files in a flat directory

FileFinder classes are only used as the file selection method if the --file and --dir command line options are not used.

For example you may wish to select files using a specific search pattern, or from multiple directories.

FileFinder classes are automatically initialized with:

  • self.config : the merged chain config dict; any of its settings can be used for file selection
  • self.months (from command line option --month, if used)
  • self.years (from command line option --year, if used)

FileFinder classes return a list of file paths through their .find_files() function. Code needs to be added to the .find_files() function to generate the file list.

Any number of differently named FileFinder class modules can be specified in the algorithm list file, under the l1b_file_selectors: section. File lists are concatenated if more than one finder class is used.
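A minimal, hypothetical sketch of such a class (the base-class wiring that supplies self.config, self.months and self.years is omitted, and the directory layout shown is invented):

```python
import glob
import os


class FileFinder:
    """Illustrative finder: select LRM L1b files for the requested years."""

    def find_files(self) -> list[str]:
        files: list[str] = []
        base_dir = os.environ["CLEV2ER_DATA_DIR"]  # hypothetical data location
        for year in self.years:  # populated from the --year command line option
            pattern = os.path.join(base_dir, str(year), "*LRM*.nc")
            files.extend(sorted(glob.glob(pattern)))
        return files
```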

An example of a FileFinder class module can be found in:

clev2er.algorithms.cryotempo.find_lrm.py

Logging

Logging within the chain is performed using the standard Python logging.Logger mechanism, with some minor adaptation to support multi-processing.

Within algorithm modules, logging should be performed using the in-class Logger instance accessed via self.log:

  • self.log.info('message') : to log informational messages
  • self.log.error('message') : to log error messages
  • self.log.debug('message') : to log messages for debugging

Debugging messages are only produced/saved if the chain is run in debug mode (use the run_chain.py --debug command line option).

Log file Locations

Info, error, and debug logs are stored in separate log files. The locations of the log files are set in the chain configuration file in a section called log_files. You can use environment variables in your log file paths.

# Default locations for log files
log_files:
  append_year_month_to_logname: true
  errors: ${CT_LOG_DIR}/errors.log
  info:   ${CT_LOG_DIR}/info.log
  debug:  ${CT_LOG_DIR}/debug.log

The append_year_month_to_logname setting is used if the chain is run with the --year (and/or) --month command line args. Note that these command line options are passed to the optional finder classes to generate a list of L1b input files.

If these are used and the append_year_month_to_logname setting is true, then the year and month are appended to the log file names as follows:

  • logname_MMYYYY.log : if both month and year are specified
  • logname_YYYY.log : if only year is used
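As an illustration only (not the framework's actual code), the naming rule corresponds to:

```python
def log_filename(base: str, year: int | None = None, month: int | None = None) -> str:
    """Append _MMYYYY or _YYYY to a log file base name, per the rules above."""
    if year and month:
        return f"{base}_{month:02d}{year}.log"
    if year:
        return f"{base}_{year}.log"
    return f"{base}.log"

print(log_filename("info", year=2024, month=3))  # -> info_032024.log
print(log_filename("info", year=2024))           # -> info_2024.log
```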

Logging when using Multi-Processing

When multi-processing mode is selected then logged messages are automatically passed through a pipe to a temporary file (logfilename.mp). This will contain an unordered list of messages from all processes, which is difficult to read directly.

At the end of the chain run, the multi-processing log outputs are automatically sorted so that messages relating to each L1b file's processing are collected together in order. This is then merged into the main log file.

Breakpoint Files

Breakpoints can be set after any Algorithm by:
  • setting the BreakpointAfter value in the chain's Algorithm list, or
  • using the run_chain.py command line argument --breakpoint_after algorithm_name
When a breakpoint is set:
  • the chain will stop after the specified algorithm has completed for each input file.
  • the contents of the chain's shared_dict will be saved as a NetCDF4 file in the <breakpoint_dir>, as specified by the breakpoint_files:default_dir setting in the chain configuration file.
  • the NetCDF4 file will be named as <breakpoint_dir>/<l1b_file_name>_bkp.nc
  • if multiple L1b files are being processed through the chain, a breakpoint file will be created for each.
  • single values or strings in the shared_dict will be included as global or group NetCDF attributes.
  • if there are multiple levels in the shared_dict then a NetCDF group will be created for each level.
  • multi-dimensional arrays (or numpy arrays) are supported up to dimension 3.
  • NetCDF dimension variables will not be given physically meaningful names (e.g. time), as this information cannot be derived generically. Instead, dimensions will be named dim1, dim2, etc.
  • all variables with the same dimension will share a common NetCDF dimension (dim1, etc.)
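Breakpoint files can be inspected with any NetCDF4 reader, for example (the file path shown is illustrative):

```python
from netCDF4 import Dataset  # pylint: disable=no-name-in-module

with Dataset("/tmp/some_l1b_file_bkp.nc") as nc:
    print(nc.ncattrs())      # single values/strings stored as global attributes
    print(nc.groups.keys())  # one group per level of the shared_dict
    for name, var in nc.variables.items():
        print(name, var.dimensions, var.shape)
```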

Developer Notes

Code checks before committing

It is recommended to run pre-commit before a git commit. This runs the static code analysis tests (isort, pylint, ruff, mypy, ...) on your code and shows you any failures before you commit. The same tests are also run when you commit (and must pass).

pre-commit run --all-files

Pytest Markers

Pytest markers are set up in $CLEV2ER_BASE_DIR/pytest.ini.

It is important to use the correct pytest marker due to the use of GitHub CI workflows that run pytest on the whole repository source code. Some pytest tests are not suitable for GitHub CI workflow runs due to their large external data dependencies. These need to be marked with pytest.mark.requires_external_data so that they are skipped. These tests can be run locally where access to the data is available.

The following Pytest markers should be used in front of relevant pytest functions:

  • requires_external_data: testable on local systems with access to all external data/ADF (outside repo)
  • non_core: used to label non-core function tests such as area plotting functions

Example:

@pytest.mark.requires_external_data  # not testable on GitHub due to external data
def test_alg_lig_process_large_dem(l1b_file) -> None:

or placed at the top of a module:

pytestmark = pytest.mark.non_core

GitHub Pages Documentation from in-code Docstrings

This user manual is hosted on GitHub pages (https://mssl-softeng.github.io/clev2er_liiw)

Content is created from docstrings (optionally containing Markdown: https://www.markdownguide.org/basic-syntax/#code) in the code, using the pdoc package: https://pdoc.dev

Diagrams can be implemented using mermaid: https://mermaid.js.org

The site is locally built in $CLEV2ER_BASE_DIR/docs, using a pre-commit hook (hook id: pdocs_build). Hooks are configured in $CLEV2ER_BASE_DIR/.pre-commit-config.yaml

The hook calls the script $CLEV2ER_BASE_DIR/pdocs_build.sh to build the site whenever a git commit is run in branch gh_pages.

When a git push is run, GitHub automatically extracts the site from the docs directory and publishes it.

The front page of the site (i.e. this page) is located in the docstring within $CLEV2ER_BASE_DIR/src/clev2er/__init__.py.

The docstring within __init__.py of each package directory should provide markdown to describe the directories beneath it.
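For example, a package's __init__.py might begin with a docstring like this (content purely illustrative):

```python
"""
# Algorithms for the newchain chain

Markdown describing the modules in this directory, rendered by pdoc.
"""
```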

Process to Update Docs

One method of updating the GitHub Pages documentation from the code (i.e. processing the docstrings into HTML in the /docs folder) is:

  • Edit docstrings in master branch code (or by pull request from other branch)
  • git commit -a -m "docs update"
  • git checkout gh_pages
  • git merge master
  • pre-commit run --all (runs pdocs to update the html in docs folder)
  • git commit -a -m "docs update"
  • git push
  • git checkout master (return to master branch)
  • git merge gh_pages
  • git push

Why isn't this run automatically from the master branch or in a GitHub workflow? Because pdoc (run as part of the pre-commit hooks) requires all code dependencies, including external data, to be in place when parsing/importing the code. External data is not available on GitHub, nor on some minimal installations of the master branch. So, to avoid pre-commit failing due to pdoc on other branches (or GitHub workflows doing the same), the docs are only updated on the controlled 'gh_pages' branch (which has all external data installed).

  1"""
  2# CLEV2ER Land Ice and Inland Waters GPP Project
  3
  4Documentation for the CLEV2ER Land Ice and Inland Waters GPP project, hosted on GitHub at
  5[github.com/mssl-softeng/clev2er_liiw](https://github.com/mssl-softeng/clev2er_liiw).
  6
  7The GPP runs within a framework designed for (but not
  8restricted to) Level-1b to Level-2 processing of ESA radar altimetry mission data. The key features
  9of the framework are dynamically loaded algorithm classes (from XML or YML lists of algorithms) and
 10in-built support for multi-processing and a consistent automated development and testing workflow.
 11There are many run-time options in the chain controller command line tool.
 12
 13The diagram below shows a simplified representation of the framework and its components.
 14
 15![CL-SII Framework](https://www.homepages.ucl.ac.uk/~ucasamu/cl_liiw_framework.png)
 16
 17## Main Features
 18
 19* Command line chain controller tool : src/clev2er/tools/run_chain.py
 20* input L1b file selection (single file, multiple files or dynamic algorithm selection)
 21* dynamic algorithm loading from XML or YML list(s)
 22  * algorithms are classes of type Algorithm with configurable .init(), .process(), .finalize()
 23    functions.
 24  * Algorithm.init() is called before any L1b file processing.
 25  * Algorithm.process() is called on every L1b file,
 26  * Algorithm.finalize() is called after all files have been processed.
 27  * Each algorithm has access to: L1b Dataset, shared working dict, config dict.
 28  * Algorithm/chain configuration by XML or YAML configuration files.
 29  * A shared python dictionary is used to pass algorithm outputs between algorithms in the chain.
 30* logging with standard warning, info, debug, error levels (+ multi-processing logging support)
 31* optional multi-processing built in, configurable maximum number of processes used.
 32* optional use of shared memory (for example for large DEMs and Masks) when using multi-processing.
 33This is an optional experimental feature that must be used with great care as it can result in
 34memory leaks (requiring a server reboot to free) if shared memory is not correctly closed.
 35* algorithm timing (with MP support)
 36* chain timing
 37* support for breakpoint files (saved as NetCDF4 files)
 38
 39##Other processing chains developed within framework:
 40
 41-   [CLEV2ER Sea Ice & Icebergs](https://github.com/mssl-softeng/clev2er_sii)
 42-   [CryoTEMPO Land Ice](https://github.com/mssl-softeng/clev2er_cryotempo)
 43-   [CPOM Sea Ice](https://github.com/CPOM-Altimetry/cpom_seaice)
 44-   [Generic Framework](https://github.com/mssl-softeng/clev2er)
 45
 46## Change Log
 47
 48This section details major changes to the framework (not individual chains):
 49
 50| Date | Change |
 51| ------- | ------- |
 52| 14-Mar-24 | Documentation deployment workflow moved to gh_pages branch|
 53| 13-Mar-24 | Target Python version for project updated to 3.11|
 54| 10-Mar-24 | Allows any_name.xml or .XML, .yml or .YML files for config or algorithm list|
 55| 09-Mar-24 | removed baseline and version support. Will use git branching instead|
 56| 15-Nov-23 | algorithm_lists file directory structure changed to now add directory /*chainname*/|
 57| 10-Nov-23 | breakpoint support added. See section on breakpoints below. |
 58
 59## Installation of the Framework
 60
 61Note that the framework installation has been tested on Linux and MacOS systems. Use on
 62other operating systems is possible but may require additional install steps, and is not
 63directly supported.
 64
 65Make sure you have *git* installed on your target system.
 66
 67Clone the git public repository in to a suitable directory on your system.
 68This will create a directory called **/clev2er_liiw** in your current directory.
 69
 70with https:
 71`git clone https://github.com/mssl-softeng/clev2er_liiw.git`
 72
 73or with ssh:
 74`git clone git@github.com:mssl-softeng/clev2er_liiw.git`
 75
 76or with the GitHub CLI:
 77`gh repo clone mssl-softeng/clev2er_liiw`
 78
 79## Shell Environment Setup
 80
 81The following shell environment variables need to be set to support framework
 82operations.
 83
 84In a bash shell this might be done by adding export lines to your $HOME/.bashrc file.
 85
 86- Set the *CLEV2ER_BASE_DIR* environment variable to the root of the clev2er package.
 87- Add $CLEV2ER_BASE_DIR/src to *PYTHONPATH*.
 88- Add ${CLEV2ER_BASE_DIR}/src/clev2er/tools to the *PATH*.
 89- Set the shell's *ulimit -n* to allow enough file descriptors to be available for
 90    multi-processing.
 91
 92An example environment setup is shown below (the path in the first line should be
 93adapted for your specific directory path):
 94
 95```script
 96export CLEV2ER_BASE_DIR=/Users/someuser/software/clev2er_liiw
 97export PYTHONPATH=$PYTHONPATH:$CLEV2ER_BASE_DIR/src
 98export PATH=${CLEV2ER_BASE_DIR}/src/clev2er/tools:${PATH}
 99# for multi-processing/shared mem support set ulimit
100# to make sure you have enough file descriptors available
101ulimit -n 8192
102```
103
104### Environment Setup for Specific Chains
105
106Additional environment setup maybe required for specific chains. This is not
107necessary unless you intend to use these chains.
108
109#### CLEV2ER Sea Ice Chain
110
111The following is an example of potential additional environment variables
112required by the CLEV2ER **seaice**
113chain. Actual values currently TBD.
114
115```script
116# Specific Environment for CLEV2ER:landice chain
117export CLEV2ER_DATA_DIR=/some/dir/somewhere
118export CLEV2ER_LOG_DIR=/some/logdir/somewhere
119```
120
121## Python Requirement
122
123python v3.11 must be installed or available before proceeding.
124A recommended minimal method of installation of python 3.11 is using Miniconda.
125
126To install Python 3.11 using Miniconda, select the appropriate link for your operating system from:
127
128https://docs.anaconda.com/free/miniconda/miniconda-other-installer-links/
129
130For example, for **Linux** (select different installer for other operating systems),
131download the installer and install a minimal python 3.11 installation using:
132
133```script
134wget https://repo.anaconda.com/miniconda/Miniconda3-py311_24.1.2-0-Linux-x86_64.sh
135chmod +x Miniconda3-py311_24.1.2-0-Linux-x86_64.sh
136./Miniconda3-py311_24.1.2-0-Linux-x86_64.sh
137
138Do you wish the installer to initialize Miniconda3
139by running conda init? [yes|no] yes
140```
141You may need to start a new shell to refresh your environment before
142checking that python 3.11 is in your path.
143
144Check that python v3.11 is now available, by typing:
145
146```
147python -V
148```
149
150## Virtual Environment and Package Requirements
151
152This project uses *poetry* (a dependency manager, see: https://python-poetry.org/) to manage
153package dependencies and virtual envs.
154
155First, you need to install *poetry* on your system using instructions from
156https://python-poetry.org/docs/#installation. Normally this just requires running:
157
158`curl -sSL https://install.python-poetry.org | python3 -`
159
160You should also then ensure that poetry is in your path, such that the command
161
162`poetry --version`
163
164returns the poetry version number. You may need to modify your
165PATH variable in order to achieve this.
166
167To make sure poetry is setup to use Python 3.11 virtual env when in the CLEV2ER base directory
168
169```
170cd $CLEV2ER_BASE_DIR
171poetry env use $(which python3.11)
172```
173
174### Install Required Python packages using Poetry
175
176Run the following command to install python dependencies for this project
177(for info, it uses settings in pyproject.toml to know what to install)
178
179```
180cd $CLEV2ER_BASE_DIR
181poetry install
182```
183
184### Load the Virtual Environment
185
186Now you are all setup to go. Whenever you want to run any CLEV2ER chains you
187must first load the virtual environment using the `poetry shell` or `poetry run` commands.
188
189```
190cd $CLEV2ER_BASE_DIR
191poetry shell
192```
193
194You should now be setup to run processing chains, etc.
195
196## Run a simple chain test example
197
198The following command will run a simple example test chain which dynamically loads
1992 template algorithms and runs them on a set of CryoSat L1b files in a test data directory.
200The algorithms do not perform any actual processing as they are just template examples.
201Make sure you have the virtual environment already loaded using `poetry shell` before
202running this command.
203
204`run_chain.py -n testchain -d $CLEV2ER_BASE_DIR/testdata/cs2/l1bfiles`
205
206There should be no errors. Note that run_chain.py is setup as an executable, so it is not
207necessary to use `python run_chain.py`, although this will also work.
208
209Note that the algorithms that are dynamically run are located in
210$CLEV2ER_BASE_DIR/src/clev2er/algorithms/testchain/alg_template1.py, alg_template2.py
211
212The list of algorithms (and their order) for *testchain* are defined in
213$CLEV2ER_BASE_DIR/config/algorithm_lists/testchain/testchain_alglist.xml
214
215Chain configuration settings are defined in
216
217$CLEV2ER_BASE_DIR/config/main_config.xml and
218
219Algorithm configuration settings are defined in
220
221$CLEV2ER_BASE_DIR/config/chain_configs/testchain/testchain_config.xml
222
223To find all the command line options for *run_chain.py*, type:
224
225`run_chain.py -h`
226
227For further info, please see `clev2er.tools`
228
229## Developer Requirements
230
231This section details additional installation requirements for developers who will develop/adapt
232new chains or algorithms.
233
234### Install pre-commit hooks
235
236pre-commit hooks are static code analysis scripts which are run (and must be passed) before
237each git commit. For this project they include pylint, ruff, mypy, black, isort.
238
239To install pre-commit hooks, do the following: (note that the second line is not necessary if
240you have already loaded the virtual environment using `poetry shell`)
241
242```
243cd $CLEV2ER_BASE_DIR
244poetry shell
245pre-commit install
246pre-commit run --all-files
247```
248
249Now, whenever you make changes to your code, it is recommended to run the following
250in your current code directory.
251
252```pre-commit run --all-files```
253
254This will check that your code passes all static code
255tests prior to running git commit. Note that these same tests are also run when
256you do a new commit, ie using `git commit -a -m "commit message"`. If the tests fail
257you must correct the errors before proceeding, and then rerun the git commit.
258
259## Developer Workflow
260
261This section describes the method that should be used to contribute to the project code.
262The basic method to develop a new feature is:
263
264On your local repository:
265
2661. Make sure your local 'master' branch is checked out and up-to-date
267   (some steps may not be necessary).
268
269    ```
270    cd $CLEV2ER_BASE_DIR
271    git checkout master
272    git pull
273    ```
274
2752. Create a new branch, named xxx_featurename, where xxx is your initials
276
277    `git checkout -b xxx_featurename`
278
2793. Develop and test your new feature within this branch, making git additions and commits
280   as necessary.
281   You should have at least one commit (probably several).
282
283   `git commit -a -m "description of change"`
284
2854. If you are developing a new module, then you must also write a pytest test
286   for that module in a tests directory located in the same directory as the module.
287   Note the section on pytest markers at the end of this document.
288
2895. Static analysis tests will be run on your changes using pre-commit, either
290   automatically during a git commit or by running in the directory of the code
291   change or in the repository base directory (for a more complete check):
292
293   `pre-commit run --all`
294
2956. Once tested, push the new feature branch to GitHub
296
297    `git push -u origin xxx_featurename`  [first time], or just `git push`
298
2997. Go to GitHub: [github.com/mssl-softeng/clev2er_liiw]
300   (https://github.com/mssl-softeng/clev2er_liiw)
301   or direct to the pull request URL shown in your git pull command.
302
3038. Create a Pull Request on GitHub for your feature branch. This will automatically start a CI
304   workflow that tests your branch for code issues and runs pytest tests. If it fails you
305   should correct the errors on your local branch and repeat (steps 3 onwards) until it passes
306   all tests.
307
3089. Finally your pull request will be reviewed and if accepted merged into the 'master' branch.
309
31010. You can then delete your local branch and the remote branch on Github.
311
312   ```
313   git branch -d xxx_featurename
314   git push origin --delete xxx_featurename
315
316   ```
317
31811. Repeat the whole process to add your next feature.
319
320
321## Framework and Chain Configuration
322
323The framework (run controller) and individual named algorithm chains each have
324separate configuration files. Configuration options can be categorized as:
325
326- run controller (or main framework ) default configuration
327- per chain default configuration (to configure individual algorithms and resources)
328- command line options (for input selection and modifying any default configuration
329  options)
330
331Chains can be configured using XML or YAML configuration files and optional command line
332options in the following order of increasing precedence:
333
334- main config file: $CLEV2ER_BASE_DIR/config/main_config.xml [Must be XML]
335- chain specific config file:
336  $CLEV2ER_BASE_DIR/config/chain_configs/*chain_name*/*config_file_name*.xml,
337  XML or .yml
338- command line options
339- command line additional config options using the --conf_opts
340
341The configurations are passed to
342the chain's algorithms and finder classes, via a merged python dictionary, available
343to the Algorithm classes as self.config.
344
345### Run Control Configuration
346
347The default run control configuration file is `$CLEV2ER_BASE_DIR/config/main_config.xml`
348
349This contains general default settings for the chain controller. Each of these can
350be overridden by the relevant command line options.
351
352| Setting | Options | Description |
353| ------- | ------- | ----------- |
354| use_multi_processing | true or false | if true multi-processing is used |
355| max_processes_for_multiprocessing | int | max number of processes to use for multi-processing |
356| use_shared_memory | true or false | if true allow use of shared memory. Experimental feature |
357| stop_on_error | true or false | stop chain on first error found, or log error and skip |
358
359### Chain Specific Configuration
360
361The default configuration for your chain's algorithms and finder classes should be placed in
362the chain specific config file:
363
364`$CLEV2ER_BASE_DIR/config/chain_configs/<chain_name>/<anyname>[.xml,.XML,or .yml]`
365
366Configuration files may be either XML(.xml) or YAML (.yml) format.
367
368#### Formatting Rules for Chain Configuration Files
369
370YAML or XML files can contain multi-level settings for key value pairs of boolean,
371int, float or str.
372
373- boolean values must be set to the string **true** or **false** (case insensitive)
374- environment variables are allowed within strings as $ENV_NAME or ${ENV_NAME} (and will be
375  evaluated)
376- YAML or XML files may have multiple levels (or sections)
377- XML files must have a top root level named *configuration*  wrapping the lower levels.
378  This is removed from the python config dictionary before being passed to the algorithms.
379- chain configuration files must have a
380    - **log_files** section to provide locations of the log files (see below)
381    - **breakpoint_files** section to provide locations of the log files (see below)
382
383Example of sections from a 2 level config file in YML:
384
385```
386# some_key: str:  description
387some_key: a string
388
389section1:
390    key1: 1
391    key2: 1.5
392    some_data_location: $MYDATA/dem.nc
393
394section2:
395    key: false
396```
397
398Example of sections from a 2 level config file in XML:
399
400```
401<?xml version="1.0"?>
402
403<!-- configuration xml level required, but removed in python dict -->
404<configuration>
405
406<!--some_key: str:  description-->
407<some_key>a string</some_key>
408
409<section1>
410   <key1>1</key1>
411   <key2>1.5</key2>
412   <some_data_location>$MYDATA/dem.nc</some_data_location>
413</section1>
414
415<section2>
416   <key>false</key>
417</section2>
418
419</configuration>
420
421```
422
423These settings are available within Algorithm classes as a python dictionary called
424**self.config** as in the following examples:
425
426```
427self.config['section1']['key1']
428self.config['section1']['some_data_location']
429self.config['some_key']
430```
431
432The config file will also be
433merged with the main run control dictionary. Settings in the chain configuration
434file will take precedence over the main run control dictionary (if they are identical), so
435you can override any main config settings in the named chain config if you want.
436
437### Required Chain Configuration Settings
438
439Each chain configuration file should contain sections to configure logging and breakpoints.
440See the section on logging below for an explanation of the settings.
441
442Here is a minimal configuration file (XML format)
443
444```
445<?xml version="1.0"?>
446<!--chain: mychain configuration file-->
447
448<configuration> <!-- note this level is removed in python dict -->
449
450<!--Setup default locations to store breakpoint files-->
451<breakpoint_files>
452    <!-- set the default directory where breakpoint files are stored -->
453    <default_dir>/tmp</default_dir>
454</breakpoint_files>
455
456<log_files>
457    <!-- default directory to store log files -->
458    <default_dir>/tmp</default_dir>
459    <!-- info_name : str : file name base str for info files -->
460    <info_name>info</info_name>
461    <!-- error_name : str : file name base str for errorfiles -->
462    <error_name>error</error_name>
463    <!-- debug_name : str : file name base str for debug files -->
464    <debug_name>debug</debug_name>
465    <!-- logname_str : str : additional string to add to end of log filename, before .log
466    Leave empty if mot required
467    -->
468    <logname_str></logname_str>
469
470    <!-- append_date_selection : true or false, if year and month are specified on
471    command line append _MMYYYY to log file base name (before .log) -->
472    <append_date_selection>true</append_date_selection>
473    <append_process_id>false</append_process_id>
474    <append_start_time>true</append_start_time>
475</log_files>
476
477<!-- add more levels and settings below here -->
478
479<resources>
480        <physical_constants>
481
482                <directory>$CLEV2ER_BASE_DIR/testdata/adf/common</directory>
483                <filename>
484CR__AX_GR_CST__AX_00000000T000000_99999999T999999_20240201T000000__________________CPOM_SIR__V01.NC
485                </filename>
486                <mandatory>True</mandatory>
487        </physical_constants>
488</resources>
489
490</configuration>
491
492```
493
494The requirement for specific settings are set by the chain and it's algorithms.
495An example of a chain configuration file can be found at:
496
497`$CLEV2ER_BASE_DIR/config/chain_configs/testchain/testchain_config.xml`
498
499For testing purposes it is sometimes useful to modify configuration settings directly
500from the command line. This can be done using the command line option --conf_opts which
501can contain a comma separated list of section:key:value pairs.
502
503An example of changing the value of the setting above would be:
504
505--conf_opts resources:mydata:${MYDATA_DIR}/somedata2.nc
506
507## Developing New Chains
508
5091. Decide on a chain name. For example **newchain**
5102. Create $CLEV2ER_BASE_DIR/algorithms/**newchain**/ directory to store the new chain's algorithms.
5113. Create $CLEV2ER_BASE_DIR/algorithms/**newchain**/tests to store the new chain's
512   algorithm unit tests (using tests formatted for pytest). At least one algorithm test file
513   should be created per algorithm, which should contain suitable test functions.
5144. Create your algorithms by copying and renaming the algorithm class template
515   $CLEV2ER_BASE_DIR/algorithms/testchain/alg_template1.py in to your algorithm directory. Each
516   algorithm
517   should have a different file name of your choice. For example: alg_retrack.py, alg_geolocate.py.
518   You need to fill in the appropriate sections of the init(), process() and finalize() functions
519   for each algorithm (see section below for more details on using algorithm classes).
5205. You must also create a test for each algorithm in
521   $CLEV2ER_BASE_DIR/algorithms/**newchain**/tests.
522   You should copy/adapt the test template
523   $CLEV2ER_BASE_DIR/algorithms/testchain/tests/test_alg_template1.py
524   for your new test.
5256. Each algorithm and their unit tests must pass the static code checks (pylint, mypy, etc) which
526   are automatically run as git pre-commit hooks.
5277. Create a first XML or YML configuration file for the chain in
528   $CLEV2ER_BASE_DIR/config/chain_configs/**newchain**/**anyname**.yml or .xml.
529   The configuration file contains any settings or resource locations that are required
530   by your algorithms, and may include environment variables.
5318. If required create one or more finder class files. These allow fine control of L1b file
532   selection from the command line (see section below for more details).
5339. Create an algorithm list YML file in
534   $CLEV2ER_BASE_DIR/config/algorithm_lists/**newchain**/**anyname**.xml (or .yml)
535   You can copy the template
536   in `$CLEV2ER_BASE_DIR/config/algorithm_lists/testchain/testchain_config.xml`
53710. To test your chain on a single L1b file, you can use
538   `run_chain.py --name newchain -f /path/to/a/l1b_file`. There are many other options for
539    running chains (see `run_chain.py -h`).
540
541## Algorithm and Finder Classes
542
543This section discusses how to develop algorithms for your chain. There are two types
544of algorithms, both of which are dynamically loaded at chain run-time.
545
546- Main algorithms : standard chain algorithm classes
547- Finder algorithms : optional classes to manage input L1b file selection
548
549### Algorithm Lists
550
551Algorithms are dynamically loaded in a chain when (and in the order ) they are named in the chain's
552algorithm list YAML or XML file:
553$CLEV2ER_BASE_DIR/config/algorithm_lists/**chainname**/**chainname**.yml,.xml.
554This has two sections (l1b_file_selectors, and algorithms) as shown in the example below:
555
556YML version:
557
558```
559# List of L1b selector classes to call in order
560l1b_file_selectors:
561  - find_lrm  # find LRM mode files that match command line options
562  - find_sin  # find SIN mode files that match command line options
563# List of main algorithms to call in order
564algorithms:
565  - alg_identify_file # find and store basic l1b parameters
566  - alg_skip_on_mode  # finds the instrument mode of L1b, skip SAR files
567  #- alg_...
568```
569
570XML version:
571
572The xml version requires an additional toplevel `<algorithm_list>` that wraps the other sections.
573It also allows you to enable or disable individual algorithms within the list by setting the
574values *Enable* or *Disable*, and to set breakpoints by setting the value to *BreakpointAfter*.
575
576```
577<?xml version="1.0"?>
578
579<algorithm_list>
580    <algorithms>
581        <alg_identify_file>Enable</alg_identify_file>
582        <alg_skip_on_mode>Enable</alg_skip_on_mode>
583        <!-- ... more algorithms -->
584        <alg_retrack>BreakpointAfter</alg_retrack>
585    </algorithms>
586
587    <l1b_file_selectors>
588        <find_lrm>Enable</find_lrm>
589        <find_sin>Enable</find_sin>
590    </l1b_file_selectors>
591</algorithm_list>
592
593```
594
595### Main Algorithms
596
597Each algorithm is implemented in a separate module located in
598
599`$CLEV2ER_BASE_DIR/src/clev2er/algorithms/<chainname>/<alg_name>.py`
600
601Each algorithm module should contain an Algorithm class, as per the algorithm
602template in:
603
604`$CLEV2ER_BASE_DIR/src/clev2er/algorithms/testchain/alg_template1.py`
605
606Please copy this template for all algorithms.
607
608Algorithm class modules have three main functions:
609
610- **init()** :  used for initializing/loading resources. Called once at the start of processing.
611- **process**(l1b:Dataset,shared_dict:dict) : called for every L1b file. The results of the
612  processing may be saved in the shared_dict, so that it can be accessed by algorithms called
613  further down the chain. The L1b data for the current file being processed is passed to this
614  function in a netcdf4 Dataset as argument l1b.
615- **finalize**() : called at the end of all processing to free resouces.
616
617All of the functions have access to the merged chain configuration dictionary **self.config**.
618
619All logging must be done using **self.log**.info(), **self.log**.error(), **self.log**.debug().
620
#### Algorithm.process() return values

It is important to note that the Algorithm.**process()** return values affect how the
chain operates. The .process() function returns a tuple of (bool, str).

Return values must be set as follows:

- (**True**, "") when the processing has completed without errors and continuation to the
  next algorithm in the chain (if available) is expected.
- (**False**, "**SKIP_OK** any reason message") when the processing has found a valid reason
  for the chain to skip any further processing of the L1b file; for example, if it does not
  measure over the target area. This is logged as a DEBUG message but is not an error. The
  chain will move on to processing the next L1b file.
- (**False**, "some error message") : in this case the error message will be logged to the
  error log and the file will be skipped. If **config**["chain"]["**stop_on_error**"] is
  False, the chain will continue to the next L1b file. If **config**["chain"]["**stop_on_error**"]
  is True, the chain will stop.

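As an illustration of the three return patterns (the mode value, variable name, and
exception handling below are invented for the example), the body of an
Algorithm.process() method might look like:

```python
    def process(self, l1b: Dataset, shared_dict: dict):
        """Sketch of the three return patterns described above."""
        # hypothetical instrument-mode value stored by an earlier algorithm
        if shared_dict.get("instr_mode") == "SAR":
            # a valid reason to skip this file: logged as DEBUG, not an error
            return (False, "SKIP_OK file is SAR mode, not used by this chain")
        try:
            # read a (hypothetical) L1b variable needed by later algorithms
            shared_dict["latitudes"] = l1b["lat_20_ku"][:]
        except (KeyError, IndexError) as exc:
            # a genuine error: logged to the error log; whether the chain stops
            # depends on config["chain"]["stop_on_error"]
            return (False, f"required L1b variable missing: {exc}")
        # success: continue to the next algorithm in the chain
        return (True, "")
```
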
### FileFinder Classes

FileFinder class modules provide more complex and tailored L1b input file selection
than is possible with the standard **run_chain.py** command line options of:

- (**--file path**) : choose a single L1b file
- (**--dir dir**) : choose all L1b files in a flat directory

FileFinder classes are only used as the file selection method if the --file and --dir
command line options are **not** used.

For example, you may wish to select files using a specific search pattern, or from
multiple directories.

FileFinder classes are automatically initialized with:

- **self.config** : the merged chain configuration dict; any of its settings can be used
  for file selection
- **self.months** (from the command line option --month, if used)
- **self.years** (from the command line option --year, if used)

FileFinder classes return a list of file paths through their .find_files() function.
Code needs to be added to the .find_files() function to generate the file list.

Any number of differently named FileFinder class modules can be specified in the algorithm
list file, under the **l1b_file_selectors:** section. File lists are concatenated if more
than one Finder class is used.

An example of a FileFinder class module can be found in:

`clev2er.algorithms.cryotempo.find_lrm.py`
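
As a minimal sketch only (follow the find_lrm example above for real chains; the base
directory config key and the year/month directory layout below are invented), a finder
module's .find_files() might look like:

```python
"""Sketch of a FileFinder class module: see find_lrm.py for a real example."""
import glob
import os


class FileFinder:
    """Example finder: selects L1b files by year and month from a base directory."""

    # self.config, self.years and self.months are set up by the framework
    # (the latter two from the --year/--month command line options)

    def find_files(self) -> list[str]:
        """Return the list of L1b file paths for the chain to process."""
        base_dir = self.config["l1b_base_dir"]  # hypothetical config key
        files: list[str] = []
        for year in self.years:
            for month in self.months:
                pattern = os.path.join(base_dir, f"{year:04d}", f"{month:02d}", "*.nc")
                files.extend(sorted(glob.glob(pattern)))
        return files
```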

## Logging

Logging within the chain is performed using the Python standard logging.Logger mechanism,
with some minor adaptation to support multi-processing.

Within algorithm modules, logging should be performed using the in-class Logger
instance, accessed via **self.log** :

- self.log.**info**('message') : to log informational messages
- self.log.**error**('message') : to log error messages
- self.log.**debug**('message') : to log messages for debugging

Debugging messages are only produced/saved if the chain is run in debug mode (use the
run_chain.py **--debug** command line option).

### Log File Locations

Info, error, and debug logs are stored in separate log files. The locations
of the log files are set in the chain configuration file, in a section called
**log_files**. You can use environment variables in your log file paths.

```yaml
# Default locations for log files
log_files:
  append_year_month_to_logname: true
  errors: ${CT_LOG_DIR}/errors.log
  info:   ${CT_LOG_DIR}/info.log
  debug:  ${CT_LOG_DIR}/debug.log
```

The **append_year_month_to_logname** setting is used if the chain is
run with the --year and/or --month command line arguments. Note that these
command line options are passed to the optional finder classes to generate the
list of L1b input files.

If these are used and the append_year_month_to_logname setting is **true**,
then the year and month are appended to the log file names as follows:

- *logname*_*MMYYYY*.log : if both month and year are specified
- *logname*_*YYYY*.log : if only the year is used

### Logging when using Multi-Processing

When multi-processing mode is selected, logged messages are automatically passed
through a pipe to a temporary file (*logfilename*.mp). This file
contains an unordered list of messages from all processes, which is difficult
to read directly.

At the end of the chain run, the multi-processing log outputs are automatically sorted
so that the messages relating to each L1b file's processing are collected together
in order. This is then merged into the main log file.

## Breakpoint Files

Breakpoints can be set after any Algorithm by:

- setting the *BreakpointAfter* value in the chain's Algorithm list, or
- using the run_chain.py command line argument **--breakpoint_after** *algorithm_name*

When a breakpoint is set:

- the chain will stop after the specified algorithm has completed for each input file.
- the contents of the chain's *shared_dict* will be saved as a NetCDF4 file in the
  `<breakpoint_dir>`, as specified in the *breakpoints:default_dir* section of the chain
  configuration file.
- the NetCDF4 file will be named `<breakpoint_dir>/<l1b_file_name>_bkp.nc`.
- if multiple L1b files are being processed through the chain, a breakpoint file
  will be created for each.
- single values or strings in the *shared_dict* will be included as global or group
  NetCDF attributes.
- if there are multiple levels in the *shared_dict*, a NetCDF group will be
  created for each level.
- multi-dimensional arrays (or numpy arrays) are supported up to dimension 3.
- NetCDF dimension variables will not be named with physical meaning (i.e. time),
  as this information cannot be generically derived. Instead, dimensions will be
  named dim1, dim2, etc.
- all variables with the same dimension will share a common NetCDF dimension (i.e. dim1, etc.)

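For example (the chain name, file paths, and breakpoint directory below are illustrative),
a breakpoint run and a quick inspection of the resulting file might look like:

```
run_chain.py --name newchain --breakpoint_after alg_retrack -f /path/to/a/l1b_file
ncdump -h /path/to/breakpoint_dir/l1b_file_bkp.nc   # list the saved shared_dict contents
```
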
## Developer Notes

### Code checks before committing

It is recommended to run pre-commit before a `git commit`. This runs the static
code analysis tests (isort, pylint, ruff, mypy, etc.) on your code and shows you any
failures before you commit. The same tests are also run when you commit (and must pass).

`pre-commit run --all-files`

### Pytest Markers

Pytest markers are set up in `$CLEV2ER_BASE_DIR/pytest.ini`.

It is important to use the correct pytest marker because GitHub CI
workflows run pytest on the whole repository source code. Some pytest
tests are not suitable for GitHub CI workflow runs due to their large external
data dependencies. These need to be marked with `pytest.mark.requires_external_data`
so that they are skipped. Such tests can be run locally, where access to the data
is available.

The following Pytest markers should be used in front of relevant pytest functions:

- **requires_external_data** :
  testable on local systems with access to all external data/ADF (outside the repo)
- **non_core** :
  used to label non-core function tests, such as area plotting functions

Example:

```python
@pytest.mark.requires_external_data  # not testable on GitHub due to external data
def test_alg_lig_process_large_dem(l1b_file) -> None:
    ...  # test body
```

or placed at the top of a module:

```python
pytestmark = pytest.mark.non_core
```


### GitHub Pages Documentation from in-code Docstrings

This user manual is hosted on GitHub Pages (https://mssl-softeng.github.io/clev2er_liiw).

Content is created from docstrings in the code
(optionally containing Markdown: https://www.markdownguide.org/basic-syntax/#code),
using the *pdoc* package: https://pdoc.dev

Diagrams can be implemented using Mermaid: https://mermaid.js.org

The site is built locally in `$CLEV2ER_BASE_DIR/docs`, using a pre-commit hook
(hook id: pdocs_build).
Hooks are configured in `$CLEV2ER_BASE_DIR/.pre-commit-config.yaml`.

The hook calls the script `$CLEV2ER_BASE_DIR/pdocs_build.sh` to build the site
whenever a `git commit` is run **in the gh_pages branch**.

When a `git push` is run, GitHub automatically extracts the site from the
docs directory and publishes it.

The front page of the site (i.e. this page) is located in the docstring within
`$CLEV2ER_BASE_DIR/src/clev2er/__init__.py`.

The docstring within the `__init__.py` of each package directory should provide
Markdown to describe the directories beneath it.

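As a purely illustrative sketch (the package name and table contents below are invented),
such an `__init__.py` might contain nothing but a Markdown docstring:

```python
"""
# Algorithms for *newchain*

| Module               | Purpose                                       |
|----------------------|-----------------------------------------------|
| alg_identify_file.py | reads and stores basic L1b parameters         |
| alg_skip_on_mode.py  | skips files from unsupported instrument modes |
"""
```
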
#### Process to Update Docs

One method of updating the GitHub Pages documentation from the code (i.e. of
processing the docstrings into HTML in the /docs folder) is:

- edit docstrings in the master branch code (or via a pull request from another branch)
- `git commit -a -m "docs update"`
- `git checkout gh_pages`
- `git merge master`
- `pre-commit run --all-files` (runs pdoc to update the HTML in the docs folder)
- `git commit -a -m "docs update"`
- `git push`
- `git checkout master` (return to the master branch)
- `git merge gh_pages`
- `git push`

Why isn't this run automatically from the master branch, or in a GitHub workflow?
Because pdoc (run as part of the pre-commit hooks) requires all code dependencies
to be in place, including external data, when parsing/importing the code.
External data is not available on GitHub, nor on some minimal installations
of the master branch. So, to avoid pre-commit failing due to pdoc on other
branches, or GitHub workflows doing the same, the docs are only updated on the
controlled gh_pages branch (which has all external data installed).