Tutorial

Here we provide a complete example of how to run the framework, including how to implement a custom Exploration Strategy (ES) and how to generate and interpret analyses.

Installation

First, fork the QMLA codebase from [QMLA] to a Github user account (referred to as username in the following code snippets). Next, download the codebase and ensure it runs properly; these instructions are carried out via the command line. Notes:

  1. these instructions are tested on Linux and presumed to work on macOS, but are untested on Windows. It is likely that some of the underlying software (the redis server) cannot be installed natively on Windows, so running within the Windows Subsystem for Linux is advised.

  2. Python development tools are required by some packages: if the pip install -r requirements.txt commands fail, missing development tools are a likely cause (they are installed in the snippet below via the python3.6-dev package).

  3. We recommend using a virtual environment to manage the QMLA ecosystem; a useful resource for managing virtual environments is virtualenvwrapper. If using virtualenvwrapper, generate and activate a virtual environment and disregard step 2 below.

  4. In the following installation steps, ensure you replace python3.6 with your preferred Python version; Python 3.6 (or above) is recommended.

The steps to prepare the codebase are:

  1. install redis

  2. create a virtual Python environment for installing QMLA dependencies without damaging other parts of the user’s environment

  3. download the [QMLA] codebase from the forked Github repository

  4. install packages upon which QMLA depends.

# Install redis (database broker)
sudo apt update
sudo apt install redis-server

# Ensure access to python dev tools
sudo apt-get install python3.6-dev

# make directory for QMLA
cd
mkdir qmla_test
cd qmla_test

# make Python virtual environment for QMLA
# note: change Python3.6 to desired version
sudo apt-get install python3.6-venv
python3.6 -m venv qmla-env
source qmla-env/bin/activate

# Download QMLA (!! REPLACE username !!)
git clone --depth 1 https://github.com/username/QMLA.git

# Install dependencies
# Note: some packages require others to be installed first,
# so they are listed in a separate file.
cd QMLA
pip install -r requirements.txt
pip install -r requirements_further.txt

Note there may be a problem with some packages arising from the attempt to install them all through a single call to pip install. Ensure these are all installed before proceeding. When all of the requirements are installed, test that the framework runs. QMLA uses redis databases to store intermittent data, so we must manually initialise the database server. Run the following (note: here we list redis-4.0.8, but this path must be corrected to reflect the version and location installed on the user's machine; if redis was installed via apt as above, the server can instead be started simply with redis-server):

~/redis-4.0.8/src/redis-server

which should give something like Fig. 1.


Fig. 1 Terminal running redis-server.
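To confirm the database is reachable before proceeding, one can ping it from Python using the redis package (installed among QMLA's requirements). This is a minimal sketch, assuming the server runs locally on the default port 6379:

import redis

# Connect to the local redis server on its default port and ping it;
# True indicates the database broker is up and reachable.
db = redis.StrictRedis(host='localhost', port=6379)
print(db.ping())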

In a text editor, open QMLA/launch/local_launch.sh, the script used to run the codebase; here we will ensure that we are running the algorithm, with 2 experiments and 10 particles, on the ES named TestInstall. Ensure the first few lines of the script read:

#!/bin/bash

##### -------------------------------------------------- #####
# QMLA run configuration
##### -------------------------------------------------- #####
num_instances=2 # number of instances in run
run_qhl=0 # perform QHL on known (true) model
run_qhl_multi_model=0 # perform QHL for defined list of models
experiments=2 # number of experiments
particles=10 # number of particles
plot_level=5


##### -------------------------------------------------- #####
# Choose an exploration strategy
# This will determine how QMLA proceeds.
##### -------------------------------------------------- #####
exploration_strategy="TestInstall"

Ensure the terminal running redis is kept active, and open a separate terminal window. We must activate the Python virtual environment configured for QMLA, which we set up above. Then, navigate to the QMLA directory, and launch:

# activate the QMLA Python virtual environment
source qmla_test/qmla-env/bin/activate

# move to the QMLA directory
cd qmla_test/QMLA
# Run QMLA
cd launch
./local_launch.sh

There may be numerous warnings, but they should not affect whether QMLA has succeeded; QMLA will raise any significant errors. Assuming the run has completed successfully, QMLA stores the run's results in a subdirectory named by the date and time it was started. For example, if the run was initialised on January \(1^{st}\) at 01:23, navigate to the corresponding directory by

cd results/Jan_01/01_23

For now it is sufficient to notice that the code has run successfully: it should have generated (in Jan_01/01_23) files like storage_001.p and results_001.p.
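These outputs are Python pickles, so they can be loaded and inspected as a quick sanity check. A minimal sketch, assuming the files unpickle to dictionary-like objects (the QMLA virtual environment should be active so any custom classes can be unpickled):

import pickle

# Load one instance's results file; adjust the path to your run's
# date/time directory. The available keys depend on the QMLA version.
with open('results/Jan_01/01_23/results_001.p', 'rb') as f:
    results = pickle.load(f)

print(type(results))
if hasattr(results, 'keys'):
    print(sorted(results.keys()))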

Custom exploration strategy

Next, we design a basic ES for the purpose of demonstrating how to run the algorithm. Exploration strategies are placed in the directory qmla/exploration_strategies. To make a new one, navigate to the exploration strategies directory, make a new subdirectory, and copy the template file into it.

cd ~/qmla_test/QMLA/qmla/exploration_strategies/
mkdir custom_es

# Copy template file into example
cp template.py custom_es/example.py
cd custom_es

Ensure QMLA will know where to find the ES by importing everything from the custom ES directory into the main exploration_strategies module. To do so, inside the new directory, make a file called __init__.py which imports the new ES from example.py. To add any further exploration strategies inside the directory custom_es, include them in the custom __init__.py, and they will automatically be available to QMLA.

# inside qmla/exploration_strategies/custom_es
#  __init__.py
from qmla.exploration_strategies.custom_es.example import *

# inside qmla/exploration_strategies, add to the existing
# __init__.py
from qmla.exploration_strategies.custom_es import *

Now, change the structure (and name) of the ES inside custom_es/example.py. Say we wish to target the true model

(1)
\[
\begin{aligned}
\vec{\alpha} &= \left( \alpha_{1,2} \quad \alpha_{2,3} \quad \alpha_{3,4} \right) \\
\vec{T} &= \left( \hat{\sigma}_{z}^{1} \otimes \hat{\sigma}_{z}^{2} \quad \hat{\sigma}_{z}^{2} \otimes \hat{\sigma}_{z}^{3} \quad \hat{\sigma}_{z}^{3} \otimes \hat{\sigma}_{z}^{4} \right) \\
\Longrightarrow \hat{H}_{0} &= \alpha_{1,2} \, \hat{\sigma}_{z}^{(1,2)} + \alpha_{2,3} \, \hat{\sigma}_{z}^{(2,3)} + \alpha_{3,4} \, \hat{\sigma}_{z}^{(3,4)}
\end{aligned}
\]

QMLA interprets models as strings, where terms are separated by +, and parameters are implicit. So the target model in (1) will be given by

pauliSet_1J2_zJz_d4+pauliSet_2J3_zJz_d4+pauliSet_3J4_zJz_d4
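To make the naming convention concrete, such a string can be unpacked by hand. The following is illustrative only (QMLA has its own parsing machinery); it simply shows how each pauliSet term encodes its sites, operators and dimension:

model = 'pauliSet_1J2_zJz_d4+pauliSet_2J3_zJz_d4+pauliSet_3J4_zJz_d4'

# Terms are joined by '+'; each term's fields are joined by '_',
# with 'J' separating the sites acted upon and the operators applied.
for term in model.split('+'):
    prefix, sites, operators, dimension = term.split('_')
    print(
        prefix,
        'on sites', sites.split('J'),             # e.g. ['1', '2']
        'with operators', operators.split('J'),   # e.g. ['z', 'z']
        'in a', dimension.lstrip('d') + '-qubit system',
    )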

Adapting the template ES slightly, we can define a model generation strategy with a small number of hard-coded candidate models introduced at the first branch of the exploration tree. We will also set the parameters of the terms which are present in \(\hat{H}_{0}\), as well as the range in which to search for parameters. Keeping the import statements at the top of example.py, rewrite the ES as:

class ExampleBasic(
    exploration_strategy.ExplorationStrategy
):

    def __init__(
        self,
        exploration_rules,
        true_model=None,
        **kwargs
    ):
        self.true_model = 'pauliSet_1J2_zJz_d4+pauliSet_2J3_zJz_d4+pauliSet_3J4_zJz_d4'
        super().__init__(
            exploration_rules=exploration_rules,
            true_model=self.true_model,
            **kwargs
        )

        self.initial_models = None
        self.true_model_terms_params = {
            'pauliSet_1J2_zJz_d4' : 2.5,
            'pauliSet_2J3_zJz_d4' : 7.5,
            'pauliSet_3J4_zJz_d4' : 3.5,
        }
        self.tree_completed_initially = True
        self.min_param = 0
        self.max_param = 10

    def generate_models(self, **kwargs):

        self.log_print(["Generating models; spawn step {}".format(self.spawn_step)])
        new_models = []  # no new models on later spawn steps
        if self.spawn_step == 0:
            # chains up to 4 sites
            new_models = [
                'pauliSet_1J2_zJz_d4',
                'pauliSet_1J2_zJz_d4+pauliSet_2J3_zJz_d4',
                'pauliSet_1J2_zJz_d4+pauliSet_2J3_zJz_d4+pauliSet_3J4_zJz_d4',
            ]
            self.spawn_stage.append('Complete')

        return new_models
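As a sanity check on equation (1), the true Hamiltonian can be built numerically from the values set in true_model_terms_params. This standalone numpy sketch is illustrative and not part of QMLA:

import numpy as np

sigma_z = np.array([[1, 0], [0, -1]])
identity = np.eye(2)

def zz_coupling(i, j, num_qubits=4):
    # Tensor product placing sigma_z on (1-indexed) qubits i and j,
    # and identity on every other qubit.
    ops = [sigma_z if k in (i, j) else identity
           for k in range(1, num_qubits + 1)]
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

true_params = {(1, 2): 2.5, (2, 3): 7.5, (3, 4): 3.5}
H0 = sum(alpha * zz_coupling(i, j) for (i, j), alpha in true_params.items())
print(H0.shape)  # (16, 16) for the 4-qubit chain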

To run the example ES for a meaningful test, return to the local_launch.sh script above, but change some of the settings:

particles=2000
experiments=500
run_qhl=1
exploration_strategy=ExampleBasic

Run locally again, then move to the results directory as in Installation. Note this will take up to 15 minutes to run; the run time can be reduced by lowering the values of particles and experiments, which is sufficient for testing, though the outcomes will be less effective than those presented in the figures of this section.

Analysis

QMLA stores results and generates plots at every level of the algorithm, i.e. the run, instance and models. The depth of analysis performed automatically is set by the user control plot_level in local_launch.sh: for plot_level=1, only the most crucial figures are generated, while plot_level=5 generates plots for every individual model considered. For model searches across large model spaces and/or considering many candidates, excessive plotting can cause considerable slow-down, so users should be careful to generate plots only to the degree they will be useful. Next we show some examples of the available plots.

Model analysis

We have just run QHL for the model in (1) for a single instance, using a reasonable number of particles and experiments, so we expect to have trained the model well. Instance-level results are stored (e.g. for the instance with qmla_id=1) in Jan_01/01_23/instances/qmla_1. Individual models' insights can be found there, e.g. the model's learning_summary (Fig. 2) and its dynamics (Fig. 3).


Fig. 2 The outcome of QHL for the given model. Subfigures (a)-(c) show the estimates of the parameters. (d) shows the total parameterisation volume against experiments trained upon, along with the evolution times used for those experiments.

_images/dynamics_1.png

Fig. 3 The model’s attempt at reproducing dynamics from \(\hat{H}_0\).

Instance analysis

Now we can run the full QMLA algorithm, i.e. train several models and determine the most suitable. QMLA will call the generate_models method of the ES set in the launch script, which tells QMLA to construct three models on the first branch and then terminate the search. Here we need to train and compare all of the models, so it takes considerably longer to run: for the purpose of testing, we reduce the resources so that the entire algorithm runs in about 15 minutes. Some applications will require significantly more resources to learn effectively. In realistic cases, these processes are run in parallel, as we will cover in Parallel implementation.

Reconfigure a subset of the settings in the local_launch.sh script and run it again:

experiments=250
particles=1000
run_qhl=0
exploration_strategy=ExampleBasic

In the corresponding results directory, navigate to instances/qmla_1, where instance-level analyses are available.

cd results/Jan_01/01_23/instances/qmla_1

Figures of interest here show the composition of the models (Fig. 4), as well as the Bayes factors (BF) between candidates (Fig. 5). Individual model comparisons, i.e. pairwise BF, are shown in Fig. 6, with the dynamics of all candidates shown in Fig. 7. The probes used during the training of all candidates are also plotted (Fig. 8).

_images/composition_of_models.png

Fig. 4 composition_of_models: constituent terms of all considered models, indexed by their model IDs. Here model 3 is \(\hat{H}_0\).

_images/bayes_factors.png

Fig. 5 bayes_factors: comparisons between all models are read as \(B_{i,j}\), where \(i\) is the model ID on the y-axis and \(j\) on the x-axis. Thus \(B_{i,j} > 0 \ (< 0)\) indicates \(\hat{H}_i\) (\(\hat{H}_j\)), i.e. the model on the y-axis (x-axis), is the stronger model.

_images/BF_1_3.png

Fig. 6 comparisons/BF_1_3: direct comparison between models with IDs 1 and 3, showing their reproduction of the system dynamics (red dots, \(Q\)), as well as the times (experiments) against which the BF was calculated.

_images/dynamics_branch_1.png

Fig. 7 branches/dynamics_branch_1: dynamics of all models considered on the branch, compared with system dynamics (red dots, \(Q\)).

_images/probes_bloch_sphere.png

Fig. 8 probes_bloch_sphere: probes used for training models in this instance (only showing 1-qubit versions).

Run analysis

A number of instances considered together constitutes a run. In general, this is the level of analysis of most interest: an individual instance is liable to errors due to the probabilistic nature of the model training and generation subroutines. On average, however, we expect those elements to perform well, so across a significant number of instances, we expect the average outcomes to be meaningful.
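For instance, win rates could be tallied by hand from the per-instance results files. This is a hedged sketch: it assumes each results_XXX.p unpickles to a dict, and the champion-model key name used here is hypothetical; the automatic analysis script below does this properly.

import glob
import pickle
from collections import Counter

wins = Counter()
for path in glob.glob('results/Jan_01/01_23/results_*.p'):
    with open(path, 'rb') as f:
        result = pickle.load(f)
    # 'NameAlphabetical' is a hypothetical key for the instance's
    # champion model; inspect the pickle for the actual field name.
    wins[result.get('NameAlphabetical', 'unknown')] += 1

for model, count in wins.most_common():
    print(count, model)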

Each results directory has an analyse.sh script to generate plots at the run level.

cd results/Jan_01/01_23
./analyse.sh

Run-level analyses are held in the main results directory and in several sub-directories created by the script. For testing, we recommend running a number of instances with very few resources so that the test finishes quickly (about ten minutes). The results will therefore be meaningless, but they allow us to illustrate the resultant plots. First, reconfigure some settings of local_launch.sh and launch again.

num_instances=10
experiments=20
particles=100
run_qhl=0
exploration_strategy=ExampleBasic

Some of the generated analyses are shown in the following figures. The number of instances for which each model was deemed champion, i.e. their win rates, are given in Fig. 9. The top models, i.e. those with the highest win rates, are analysed further: the average parameter estimation progression for \(\hat{H}_{0}\), including only the instances where \(\hat{H}_{0}\) was deemed champion, is shown in Fig. 10. Irrespective of the champion models, the rate at which each term is found in the champion model (\(\hat{t} \in \hat{H}^{\prime}\)) indicates the likelihood that the term is really present; these rates, along with the parameter values learned, are shown in Fig. 11. The champion model from each instance can attempt to reproduce system dynamics: we group these reproductions together for each model in Fig. 12.

_images/model_wins.png

Fig. 9 performance/model_wins: number of instance wins achieved by each model.

_images/params_pauliSet_1J2_zJz_d4+pauliSet_2J3_zJz_d4+pauliSet_3J4_zJz_d4.png

Fig. 10 champion_models/params_pauliSet_1J2_zJz_d4+pauliSet_2J3_zJz_d4+pauliSet_3J4_zJz_d4: parameter estimation progression for the true model, only for the instances where it was deemed champion.

_images/terms_and_params.png

Fig. 11 champion_models/terms_and_params: histogram of parameter values found for each term which appears in any champion model, with the true parameter (\(\alpha_0\)) in red and the median learned parameter (\(\bar{\alpha}^{\prime}\)) in blue.

_images/dynamics.png

Fig. 12 performance/dynamics: median dynamics of the champion models. The models which won the most instances are shown together in the top panel, and individually in the lower panels. For each model, the median dynamics from its learning in winning instances are shown, with the shaded region indicating the 66% confidence region.

Parallel implementation

We provide utilities to run QMLA across parallel processes. Individual models' training can run in parallel, as can the calculation of BF between models. The provided script is designed for a PBS job scheduler running on a compute cluster; it will require a few adjustments to match the system being used, but overall it has a similar structure to the script used above.

QMLA must be downloaded on the compute cluster as in Installation. This can be a new fork of the repository, though it is sensible to test the installation locally as described in this chapter so far, then push that version, including the new ES, to Github, and clone the latest version on the cluster. It is again advisable to create a Python virtual environment in order to isolate QMLA and its dependencies (indeed this is sensible for any Python development project). Open the parallel launch script, QMLA/launch/parallel_launch.sh, and prepare the first few lines as

#!/bin/bash

##### -------------------------------------------------- #####
# QMLA run configuration
##### -------------------------------------------------- #####
num_instances=10 # number of instances in run
run_qhl=0 # perform QHL on known (true) model
run_qhl_multi_model=0 # perform QHL for defined list of models
experiments=250
particles=1000
plot_level=5


##### -------------------------------------------------- #####
# Choose an exploration strategy
# This will determine how QMLA proceeds.
##### -------------------------------------------------- #####
exploration_strategy="ExampleBasic"

When submitting jobs to schedulers like PBS, we must specify the time required, so that the scheduler can determine a fair distribution of resources among users. We must therefore estimate the time it will take for an instance to complete: clearly this is strongly dependent on the number of experiments (\(N_e\)) and particles (\(N_p\)), and on the number of models which must be trained. QMLA attempts to determine a reasonable time to request based on the max_num_models_by_shape attribute of the ES, by calling QMLA/scripts/time_required_calculation.py. In practice, this can be difficult to set perfectly, so an attribute of the ES can be used to correct for heavily over- or under-estimated time requests. Instances are run in parallel, and each instance trains and compares models in parallel. The number of processes to request for each instance, \(N_c\), is set in the ES. Then, if there are \(N_r\) instances in the run, we request that the job scheduler admit \(N_r\) distinct jobs, each requiring \(N_c\) processes, for the time specified.
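As a back-of-envelope illustration of the total request, using the notation of the text (the numbers below are hypothetical):

# N_r jobs, each with N_c processes, are submitted to the scheduler.
num_instances = 10           # N_r: instances in the run (one job each)
processes_per_instance = 16  # N_c: processes per instance, set in the ES

total_processes = num_instances * processes_per_instance
print("{} jobs x {} processes = {} processes in total".format(
    num_instances, processes_per_instance, total_processes))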

The parallel_launch script works together with QMLA/launch/run_single_qmla_instance.sh, though note that a number of steps in the latter are configured for the authors' cluster and may need to be adapted. In particular, the first command below is used to load the redis utility, and the later lines are used to initialise a redis server. These commands will probably not work on most machines, so they must be adapted to achieve those steps.

module load tools/redis-4.0.8

...

SERVER_HOST=$(head -1 "$PBS_NODEFILE")
let REDIS_PORT="6300 + $QMLA_ID"

cd $LIBRARY_DIR
redis-server RedisDatabaseConfig.conf --protected-mode no --port $REDIS_PORT &
redis-cli -p $REDIS_PORT flushall

When the modifications are finished, QMLA can be launched in parallel similarly to the local version:

source qmla_test/qmla-env/bin/activate

cd qmla_test/QMLA/launch
./parallel_launch.sh

Jobs are likely to queue for some time, depending on the demands on the job scheduler. When all jobs have finished, results are stored as in the local case, e.g. in QMLA/launch/results/Jan_01/01_23, where analyse.sh can be used to generate a series of automatic analyses.

Customising exploration strategies

User interaction with the QMLA codebase should be achievable primarily through the exploration strategy framework. Throughout the available algorithms, QMLA calls upon the ES before determining how to proceed. The usual mechanism through which the actions of QMLA are directed is to set attributes of the ES class: the complete set of influential attributes is available at ExplorationStrategy.

QMLA directly uses several methods of the ES class, all of which can be overwritten in the course of customising an ES. Most such methods need not be replaced, however, with the exception of generate_models, which is the most important aspect of any ES: it determines which models are built and tested by QMLA. This method allows the user to impose any logic desired in constructing models; it is called after the completion of every branch of the ES's exploration tree.
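For illustration, here is a hedged sketch of a custom generate_models which grows the chain greedily, one new coupling per call, instead of hard-coding all candidates as ExampleBasic does. It assumes the favoured models from the previous branch arrive via a model_list keyword (an assumption about the call signature, not confirmed QMLA API), and presumably tree_completed_initially would need to be False so that QMLA continues spawning.

    def generate_models(self, **kwargs):
        # Favoured models from the previous branch; the 'model_list'
        # keyword name is an assumption, not confirmed QMLA API.
        previous_models = kwargs.get('model_list', ['pauliSet_1J2_zJz_d4'])
        champion = previous_models[0]

        num_terms = champion.count('+') + 1
        if num_terms >= 3:
            # 4-site chain complete: signal QMLA to stop spawning
            self.spawn_stage.append('Complete')
            return previous_models

        # Extend the chain by coupling the next pair of neighbouring sites.
        new_term = 'pauliSet_{}J{}_zJz_d4'.format(num_terms + 1, num_terms + 2)
        return ['{}+{}'.format(champion, new_term)]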