
Setting up a single-session LFADS run

Assuming you have finished adapting the LFADS Run Manager classes to your dataset, you should be all set to generate LFADS runs and start training. We'll set up a drive script that does the work of creating the appropriate instances, pointing at the datasets, creating the runs, and telling LFADS Run Manager to generate the files needed for LFADS. Below, we'll refer to the package name as LorenzExperiment, but you should substitute your own package name.

Follow along with LorenzExperiment.drive_script

A complete drive script is available as a starting point in +LorenzExperiment/drive_script.m for you to copy/paste from.

Lorenz attractor example

For this demo, we'll create a few datasets of synthetic spiking data driven by a Lorenz attractor using the following code:

datasetPath = '~/lorenz_example/datasets';
LFADS.Utils.generateDemoDatasets(datasetPath, 'nDatasets', 3);

This will simulate a chaotic 3-dimensional Lorenz attractor as the underlying dynamical system, initialized from 65 initial conditions. Here is a subset of 10 conditions' trajectories:

Lorenz trajectories for 10 conditions

The initial conditions (which define the conditions) and the subsequent dynamical trajectories are the same across datasets. Each dataset contains a variable number of neurons (between 25 and 35). The firing rates of these neurons are constructed by projecting the 3-d Lorenz trajectory through a dataset-specific random readout matrix, adding a constant bias term, and exponentiating. We then draw spikes from an inhomogeneous Poisson process with these rates, for 20-30 trials per condition.
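The rate construction described above can be sketched as follows. This is illustrative only, not code from the package; the variable names are placeholders, and `poissrnd` requires the Statistics and Machine Learning Toolbox:

```matlab
T = 1000;
x = randn(3, T);                   % stand-in for one trial's 3-d Lorenz trajectory

nNeurons = 30;                     % dataset-specific neuron count (25-35 in the demo)
C = randn(nNeurons, 3) / sqrt(3);  % dataset-specific random readout matrix
b = log(5) * ones(nNeurons, 1);    % constant bias, ~5 spikes/sec baseline

rates = exp(C * x + b);            % nNeurons x T firing rates (spikes/sec)

% draw spikes from an inhomogeneous Poisson process at 1 ms resolution
dtMs = 1;
spikes = poissrnd(rates * dtMs / 1000);
```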

Here are a few examples of single trial spike rasters. The units have been sorted according to their loading onto the first dimension of the attractor:

Example single trial spike rasters

Building a dataset collection and adding datasets

First, create a dataset collection that points to a folder on disk where datasets are stored:

dataPath = '~/lorenz_example/datasets';
dc = LorenzExperiment.DatasetCollection(dataPath);
dc.name = 'lorenz_example';

Then, we can add individual datasets to the collection based on their paths. Note that when a new dataset instance is created, it is automatically added to the DatasetCollection and will replace any existing dataset with the same name.

LorenzExperiment.Dataset(dc, 'dataset001.mat');

You can verify that the datasets have been added to the collection:

>> dc
LorenzExperiment.DatasetCollection "lorenz_example"
  1 datasets in ~/lorenz_example/datasets
  [ 1] LorenzExperiment.Dataset "dataset001"

         name: 'lorenz_example'
      comment: ''
         path: '~/lorenz_example/datasets'
     datasets: [1x1 LorenzExperiment.Dataset]
    nDatasets: 1

You can access individual datasets using dc.datasets(1) or by name with dc.matchDatasetsByName('dataset001').

You can then load all of the metadata for the datasets using:

dc.loadInfo();

How this metadata is determined for each dataset may be customized as described in Interfacing with your Datasets. You can view a summary of the metadata using:

>> dc.getDatasetInfoTable          

                  subject                  date             saveTags    nTrials    nChannels
              ________________    ______________________    ________    _______    _________

dataset001    'lorenz_example'    [31-Jan-2018 00:00:00]    '1'         1820       35

Create a RunCollection

We'll now set up a RunCollection that will contain all of the LFADS runs we'll be training. All of the processed data and LFADS output will be stored inside this folder, nicely organized within subfolders.

runRoot = '~/lorenz_example/runs';
rc = LorenzExperiment.RunCollection(runRoot, 'exampleSingleSession', dc);

% replace with the approximate date the script was authored, as YYYYMMDD,
% to ensure backwards compatibility
rc.version = 20180131;

Versioning and backwards compatibility

You can optionally set rc.version just after creating the RunCollection. Version should be set to the date the script was first used to generate the LFADS files on disk, in the format YYYYMMDD. Specifying this here allows for backwards compatibility in case we need to change aspects of where LFADS Run Manager organizes files on disk or how the RunParams hashes are generated. The default rc.version will be updated if significant changes are made in the code, so manually specifying it in the drive script can be useful to “freeze” the LFADS Run Manager logic for this specific collection of runs.

Specify the hyperparameters in RunParams

We’ll next specify a single set of hyperparameters to begin with. Since this is a simple dataset, we’ll reduce the size of the generator network to 64 and reduce the number of factors to 8.

par = LorenzExperiment.RunParams;
par.name = 'first_attempt'; % completely optional
par.spikeBinMs = 2; % rebin the data at 2 ms
par.c_co_dim = 0; % no controller --> no inputs to generator
par.c_batch_size = 150; % must be at most 1/5 of the min trial count
par.c_factors_dim = 8; % number of latent factors
par.c_gen_dim = 64; % number of units in generator RNN
par.c_ic_enc_dim = 64; % number of units in encoder RNN
par.c_learning_rate_stop = 1e-3; % we can stop training early for the demo

Setting batch size

The number of trials in your smallest dataset determines the largest batch size you can pick. If trainToTestRatio is 4 (the default), then every dataset needs at least 4+1 = 5 times as many trials as c_batch_size. If you choose a batch size that is too large, LFADS Run Manager will generate an error to alert you.
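As a sanity check, you can compute the largest admissible batch size yourself before setting c_batch_size. This sketch assumes each Dataset exposes an nTrials property once dc.loadInfo() has been called, as suggested by the dataset info table above:

```matlab
trainToTestRatio = 4;                    % the default
minTrials = min([dc.datasets.nTrials]);  % smallest dataset in the collection
maxBatchSize = floor(minTrials / (trainToTestRatio + 1));
assert(par.c_batch_size <= maxBatchSize, ...
    'c_batch_size is too large for the smallest dataset');
```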

We then add this RunParams to the RunCollection:

rc.addParams(par);

You can access the parameter settings added to rc using rc.params, which will be an array of RunParams instances. The RunParams class will display all of the settings in an organized manner, as well as a summary of those values that differ from their defaults at the top:

>> par

par =

LorenzExperiment.RunParams param_YOs74u data_4MaTKO "first_attempt"
c_learning_rate_stop=0.001 c_batch_size=150 c_co_dim=0 c_ic_enc_dim=64 c_gen_dim=64 c_factors_dim=8

   Computed hashes
                          paramHash: 'YOs74u'
                    paramHashString: 'param_YOs74u'
                           dataHash: '4MaTKO'
                     dataHashString: 'data_4MaTKO'

   Run Manager logistics and data processing
                               name: 'first_attempt'
                            version: 20171107
                         spikeBinMs: 2

   TensorFlow Logistics
                 c_allow_gpu_growth: 1
                 c_max_ckpt_to_keep: 5
             c_max_ckpt_to_keep_lve: 5
                           c_device: '/gpu:0'

   Optimization
               c_learning_rate_init: 0.0100
       c_learning_rate_decay_factor: 0.9800
       c_learning_rate_n_to_compare: 6
               c_learning_rate_stop: 1.0000e-03
                    c_max_grad_norm: 200
                   trainToTestRatio: 4
                       c_batch_size: 150
                  c_cell_clip_value: 5

   Overfitting
      c_temporal_spike_jitter_width: 0
                        c_keep_prob: 0.9500
                     c_l2_gen_scale: 500
                     c_l2_con_scale: 500
               c_co_mean_corr_scale: 0

   Underfitting
                     c_kl_ic_weight: 1
                     c_kl_co_weight: 1
                    c_kl_start_step: 0
                c_kl_increase_steps: 900
                    c_l2_start_step: 0
                c_l2_increase_steps: 900
     scaleIncreaseStepsWithDatasets: 1

   External inputs
                    c_ext_input_dim: 0
          c_inject_ext_input_to_gen: 0

   Controller and inferred inputs
                           c_co_dim: 0
                    c_prior_ar_atau: 10
           c_do_train_prior_ar_atau: 1
                    c_prior_ar_nvar: 0.1000
           c_do_train_prior_ar_nvar: 1
             c_do_causal_controller: 0
    c_do_feed_factors_to_controller: 1
        c_feedback_factors_or_rates: 'factors'
             c_controller_input_lag: 1
                       c_ci_enc_dim: 128
                          c_con_dim: 128
               c_co_prior_var_scale: 0.1000

   Encoder and initial conditions for generator
             c_num_steps_for_gen_ic: 4294967295
                           c_ic_dim: 64
                       c_ic_enc_dim: 64
                 c_ic_prior_var_min: 0.1000
               c_ic_prior_var_scale: 0.1000
                 c_ic_prior_var_max: 0.1000
                  c_ic_post_var_min: 1.0000e-04

   Generator network, factors, rates
                c_cell_weight_scale: 1
                          c_gen_dim: 64
      c_gen_cell_input_weight_scale: 1
        c_gen_cell_rec_weight_scale: 1
                      c_factors_dim: 8
                      c_output_dist: 'poisson'

   Stitching multi-session models
                  c_do_train_readin: 1
                 useAlignmentMatrix: 0
    useSingleDatasetAlignmentMatrix: 0

   Posterior sampling
                posterior_mean_kind: 'posterior_sample_and_average'
              num_samples_posterior: 512

RunParams data and param hashes

If we look at the printed representation of the RunParams instance, we see two hash values:

>> par

par =

LorenzExperiment.RunParams param_YOs74u data_4MaTKO
c_factors_dim=8 c_ic_enc_dim=64 c_gen_dim=64 c_co_dim=0 c_batch_size=150 c_learning_rate_stop=0.001
...

These six-character alphanumeric hash values uniquely and concisely identify the runs so that they can be conveniently located on disk in a predictable fashion. The first, prefixed with param_, is a hash of the full set of parameter settings that differ from their defaults. The second, prefixed with data_, is a hash of only those parameter settings that affect the input data used by LFADS. We use two separate hashes to save space on disk: many parameters like c_co_dim only affect LFADS internally and leave the input data unchanged, so generating a large sweep over such parameters would otherwise require many identical copies of the data to be saved on disk. Instead, we store the data in folders named by the data_ hash and symlink it into each run. If you add additional parameters that do not affect the data used by LFADS, you should specify them in your RunParams class as described here.

Below the hash values are the properties whose values differ from their defaults (as declared next to each property in the class definition). Properties which are equal to their default values are not included in the hash calculation. This allows you to add new properties to your RunParams class without altering the computed hashes for older runs. See this warning note for more details.

RunParams is a value class

Unlike the other classes, RunParams is a value class rather than a handle class; like a struct, it is passed by value. This means that after adding the RunParams instance par to the RunCollection, we can modify par and add it again to define a second set of parameters, like this:

par.c_gen_dim = 96;
rc.addParams(par);
par.c_gen_dim = 128;
rc.addParams(par);
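Because each call to addParams stores a snapshot of par at that moment, the two entries remain independent of later changes to par. Assuming the two addParams calls above have been made, you can confirm this through rc.params:

```matlab
% each addParams call stored an independent copy of par
rc.params(end-1).c_gen_dim   % 96
rc.params(end).c_gen_dim     % 128
```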

Generating hyperparameter value sweeps

If you wish to sweep a specific property or set of properties, you can create a RunParams instance, set the other properties as needed, and then call generateSweep to build an array of RunParams instances:

parSet = par.generateSweep('c_gen_dim', [32 64 96 128]);
rc.addParams(parSet);

Or along multiple parameters in a grid:

parSet = par.generateSweep('c_gen_dim', [32 64 96 128], 'c_co_dim', 0:2:4);
rc.addParams(parSet);

Specify the RunSpec

Recall that RunSpec instances specify which datasets are included in a specific run. For this example, we've only included a single dataset, so there are no choices to make. We'll run LFADS on the first dataset by itself:

ds_index = 1;
runSpecName = dc.datasets(ds_index).getSingleRunName(); % generates a simple run name from this dataset's name
runSpec = LorenzExperiment.RunSpec(runSpecName, dc, ds_index);
rc.addRunSpec(runSpec);

You can adjust the arguments to the constructor of LorenzExperiment.RunSpec, but in the example provided the inputs define:

  • the unique name of the run. Here we use getSingleRunName, a convenience method of Dataset that generates a name like single_datasetName.
  • the DatasetCollection from which datasets will be retrieved
  • the indices or names of datasets (as a string or cell array of strings) to include
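Since the third argument also accepts dataset names, an equivalent run specification could be constructed by name rather than by index (a sketch based on the constructor signature described above):

```matlab
% equivalent to the index-based form used earlier
runSpec = LorenzExperiment.RunSpec('single_dataset001', dc, 'dataset001');
```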

Check the RunCollection

The RunCollection will now display information about the parameter settings and run specifications that have been added. Each run corresponds to one parameter setting applied to one run specification, so with one of each we're performing only 1 run total.

>> rc

LorenzExperiment.RunCollection "exampleSingleSession" (1 runs total)
  Dataset Collection "lorenz_example" (1 datasets) in ~/lorenz_example/datasets
  Path: ~/lorenz_example/runs/exampleSingleSession

  1 parameter settings
  [1 param_YOs74u data_4MaTKO] LorenzExperiment.RunParams "first_attempt" c_factors_dim=8 c_ic_enc_dim=64 c_gen_dim=64 c_co_dim=0 c_batch_size=150 c_learning_rate_stop=0.001

  1 run specifications
  [ 1] LorenzExperiment.RunSpec "single_dataset001" (1 datasets)

                          name: 'exampleSingleSession'
                       comment: ''
                      rootPath: '~/lorenz_example/runs'
                       version: 20180131
             datasetCollection: [1x1 LorenzExperiment.DatasetCollection]
                          runs: [1x1 LorenzExperiment.Run]
                        params: [1x1 LorenzExperiment.RunParams]
                      runSpecs: [1x1 LorenzExperiment.RunSpec]
                       nParams: 1
                     nRunSpecs: 1
                    nRunsTotal: 1
                     nDatasets: 1
                  datasetNames: {'dataset001'}
                          path: '~/lorenz_example/runs/exampleSingleSession'
      pathsCommonDataForParams: {'~/lorenz_example/runs/exampleSingleSession/data_4MaTKO'}
                pathsForParams: {'~/lorenz_example/runs/exampleSingleSession/param_YOs74u'}
    fileShellScriptTensorboard: '~/lorenz_example/runs/exampleSingleSession/launch_tensorboard.sh'
               fileSummaryText: '~/lorenz_example/runs/exampleSingleSession/summary.txt'
       fileShellScriptRunQueue: '~/lorenz_example/runs/exampleSingleSession/run_lfadsqueue.py'

Prepare for LFADS

Now that you’ve set up your run collection with all of your runs, you can run the following to generate the files needed for running LFADS.

rc.prepareForLFADS();

This will generate files for all runs. If you later add new runs by adding additional run specifications or parameter settings, you can simply call prepareForLFADS again. Existing files won't be overwritten unless you call rc.prepareForLFADS(true).

After running prepareForLFADS, the run manager will create the following files on disk under rc.path:

~/lorenz_example/runs/exampleSingleSession
├── data_4MaTKO
│   └── single_dataset001
│       ├── inputInfo_dataset001.mat
│       └── lfads_dataset001.h5
├── param_YOs74u
│   └── single_dataset001
│       └── lfadsInput
│           ├── inputInfo_dataset001.mat -> ../../../data_4MaTKO/single_dataset001/inputInfo_dataset001.mat
│           └── lfads_dataset001.h5 -> ../../../data_4MaTKO/single_dataset001/lfads_dataset001.h5
└── summary.txt

The organization of these files on disk is discussed in more detail here. Also, a summary.txt file will be generated which can be useful for identifying all of the runs and their locations on disk. You can also generate this text from within Matlab by calling rc.generateSummaryText().

LorenzExperiment.RunCollection "exampleSingleSession" (1 runs total)
  Path: ~/lorenz_example/runs/exampleSingleSession
  Dataset Collection "lorenz_example" (1 datasets) in ~/lorenz_example/datasets

  ------------------------

  1 Run Specifications:

    [runSpec 1] LorenzExperiment.RunSpec "single_dataset001" (1 datasets)
      [ds 1] LorenzExperiment.Dataset "dataset001"

  ------------------------

  1 Parameter Settings:

    [1 param_YOs74u data_4MaTKO] LorenzExperiment.RunParams "first_attempt" c_learning_rate_stop=0.001 c_batch_size=150 c_co_dim=0 c_ic_enc_dim=64 c_gen_dim=64 c_factors_dim=8

         spikeBinMs: 2
         c_allow_gpu_growth: true
         c_max_ckpt_to_keep: 5
         c_max_ckpt_to_keep_lve: 5
         c_device: /gpu:0
         c_learning_rate_init: 0.01
         c_learning_rate_decay_factor: 0.98
         c_learning_rate_n_to_compare: 6
         c_learning_rate_stop: 0.001
         c_max_grad_norm: 200
         trainToTestRatio: 4
         c_batch_size: 150
         c_cell_clip_value: 5
         c_temporal_spike_jitter_width: 0
         c_keep_prob: 0.95
         c_l2_gen_scale: 500
         c_l2_con_scale: 500
         c_co_mean_corr_scale: 0
         c_kl_ic_weight: 1
         c_kl_co_weight: 1
         c_kl_start_step: 0
         c_kl_increase_steps: 900
         c_l2_start_step: 0
         c_l2_increase_steps: 900
         scaleIncreaseStepsWithDatasets: true
         c_ext_input_dim: 0
         c_inject_ext_input_to_gen: false
         c_co_dim: 0
         c_prior_ar_atau: 10
         c_do_train_prior_ar_atau: true
         c_prior_ar_nvar: 0.1
         c_do_train_prior_ar_nvar: true
         c_do_causal_controller: false
         c_do_feed_factors_to_controller: true
         c_feedback_factors_or_rates: factors
         c_controller_input_lag: 1
         c_ci_enc_dim: 128
         c_con_dim: 128
         c_co_prior_var_scale: 0.1
         c_num_steps_for_gen_ic: 4294967295
         c_ic_dim: 64
         c_ic_enc_dim: 64
         c_ic_prior_var_min: 0.1
         c_ic_prior_var_scale: 0.1
         c_ic_prior_var_max: 0.1
         c_ic_post_var_min: 0.0001
         c_cell_weight_scale: 1
         c_gen_dim: 64
         c_gen_cell_input_weight_scale: 1
         c_gen_cell_rec_weight_scale: 1
         c_factors_dim: 8
         c_output_dist: poisson
         c_do_train_readin: true
         useAlignmentMatrix: false
         useSingleDatasetAlignmentMatrix: false
         posterior_mean_kind: posterior_sample_and_average
         num_samples_posterior: 512