

Running LFADS¶

To train the LFADS model using Python+Tensorflow, you need to generate shell scripts that will actually call run_lfads.py and do the work of training the model. lfads-run-manager provides two ways to go about this.

Add the run_lfads.py folder to your shell PATH

Be sure that the LFADS python source folder is on your shell path, such that running which run_lfads.py prints the directory where the Python+Tensorflow code LFADS is located. If not, you’ll need to run something like export PATH=$PATH:/path/to/models/research/lfads and consider adding this to your .bashrc file.
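The check described above can be sketched in a few lines of shell. This is only an illustration: LFADS_DIR is a placeholder for wherever your copy of models/research/lfads lives, and the snippet assumes run_lfads.py is marked executable so that the shell lookup can find it.

```shell
# Placeholder location: replace with your own checkout of models/research/lfads.
LFADS_DIR="/path/to/models/research/lfads"

# Append the folder to PATH only if it is not already listed.
case ":$PATH:" in
    *":$LFADS_DIR:"*) ;;
    *) export PATH="$PATH:$LFADS_DIR" ;;
esac

# Report whether the shell can now resolve run_lfads.py.
# Note: the file must be executable (chmod +x) for the lookup to succeed.
if command -v run_lfads.py >/dev/null 2>&1; then
    echo "run_lfads.py found at: $(command -v run_lfads.py)"
else
    echo "run_lfads.py is not on PATH; check LFADS_DIR"
fi
```

Placing the first two statements in your .bashrc makes the change persistent for new shells.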

If Matlab is able to determine the location of run_lfads.py (meaning that its own inherited PATH was set correctly), it will prepend an export PATH=... statement to each generated shell script for you. If not, you can try calling setenv('PATH', '...') from within Matlab to add the folder containing run_lfads.py to the path before generating the shell scripts.

Alternatively, you can hard-code the location of run_lfads.py by passing the fully specified path to each of the writeShellScript... methods as the parameter-value pair 'path_run_lfads_py', '/path/to/run_lfads.py'.

Virtualenv support

Each of the methods below supports a 'virtualenv', 'environmentName' parameter-value argument. If specified, a source activate environmentName line will be prepended to each script that calls Python. This is needed when Tensorflow is installed inside a virtual environment (or a conda environment).

LFADS Queue: Automatically queueing many runs¶

If you wish to run each LFADS model manually at the command line, skip ahead. However, manually running each of these shell scripts in sequence can be tedious, especially if you don’t have enough GPUs or CPUs to run them all in parallel and individual runs take hours or days to complete. To automate this part of the process, you can instead use LFADS Queue, a model queueing system that takes care of training all the LFADS models for you.

Only supported on Linux

Unfortunately, this task queueing system is not supported on Mac OS at the moment, primarily because it depends on nvidia-smi, though it’s theoretically possible with cuda-smi with light code changes. However, Tensorflow has discontinued explicit GPU support on Mac OS anyway. This has also never been tested on Windows, as you’d need to get tmux working.

First, we’ll generate the Python script from Matlab that enumerates all of the runs:

rc.writeShellScriptRunQueue('display', 0, 'virtualenv', 'tensorflow');

Optional parameters include:

display: The numeric value of the X display to target. 0 means target display DISPLAY=:0. A display is needed to draw plots using matplotlib. If you’re running on a VM, you may want to launch a VNC Server and point at that display. By default the display will be set according to the DISPLAY environment variable as it is seen inside tmux.
gpuList: List of GPU indices to include in the queue. By default, this will include all GPUs detected by nvidia-smi.
runIdx: Scalar indices of all runs to include in the queue. By default this will include all runs in .runs.
virtualenv: String indicating the virtual environment to source before launching the Python LFADS task, where TensorFlow must be installed.
rerun: By default, any run which already has an lfads.done file in the directory will be skipped, allowing you to regenerate and rerun the queue script whenever new runs are added. If rerun is true, all runs will be executed, although the old LFADS output checkpoint will be used during training. If you want to re-train from scratch, you’ll need to delete the lfadsOutput directories, or call rc.deleteLFADSOutput().
oneTaskPerGPU: By default, only one LFADS model will be trained per GPU, as empirically we’ve found that the switching costs outweigh any benefit from running multiple models simultaneously on each GPU. If you set this to false, ensure that you’ve set c_allow_gpu_growth to true in the RunParams.
gpuMemoryRequired: Estimated maximum MB of GPU RAM needed per model, used to schedule models onto GPUs when oneTaskPerGPU is false.
maxTasksSimultaneously: A manual cap on the number of models to train simultaneously. This is only relevant when oneTaskPerGPU is false, and will default to the number of CPUs minus one.
prependPathToLFADSQueue: If true, automatically appends the path to lfadsqueue.py to the PYTHONPATH inside the generated script. Defaults to false to avoid confusion.

This will generate a Python script run_lfadsqueue.py, which for our example can be launched via:

python ~/lorenz_example/runs/exampleSingleSession/run_lfadsqueue.py

Run Manager src folder should be added to your PYTHONPATH

The run_lfadsqueue.py script depends on lfadsqueue.py, which lives in lfads-run-manager/src. Either add this folder to your PYTHONPATH yourself, or set prependPathToLFADSQueue to true so that the generated run_lfadsqueue.py script adds it to PYTHONPATH for you.
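If you choose to set PYTHONPATH yourself, a minimal sketch looks like the following. RUN_MANAGER_SRC is an assumed checkout location, not a path the run manager defines; point it at wherever you cloned lfads-run-manager.

```shell
# Assumed checkout location of lfads-run-manager; adjust to your system.
RUN_MANAGER_SRC="$HOME/lfads-run-manager/src"

# Put the src folder at the front of PYTHONPATH, preserving any existing entries.
export PYTHONPATH="$RUN_MANAGER_SRC${PYTHONPATH:+:$PYTHONPATH}"

# Confirm that lfadsqueue.py is where we expect it to be.
if [ -f "$RUN_MANAGER_SRC/lfadsqueue.py" ]; then
    echo "lfadsqueue.py found in $RUN_MANAGER_SRC"
else
    echo "lfadsqueue.py not found in $RUN_MANAGER_SRC; check the path"
fi
```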

Install and configure tmux

The LFADS queue launches each LFADS run inside its own tmux session to make it easy to monitor the runs as they are running. You’ll need to install tmux.

Also, tmux is finicky about environment variables, which are only loaded when the tmux server first launches, not when a new session is started. The main requirement is that run_lfads.py must be on your PATH somewhere. If Matlab is able to determine this location (meaning that its own inherited PATH was set correctly), it will prepend an export PATH=... statement to each lfads_train.sh script for you. If not, you can try calling setenv('PATH', '...') from within Matlab to add the folder containing run_lfads.py to the path before generating the shell scripts.

If you’re having trouble, you might want to launch a new tmux session using:

tmux new-session

Then from inside tmux, test that which run_lfads.py prints a location and that you are able to launch python and run import tensorflow as tf without any issues.
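Both checks can be bundled into one small helper to run from inside the tmux session. This is a sketch rather than part of the run manager: the function name check_lfads_env is made up for illustration, and it assumes that the python on your PATH is the interpreter the training scripts will use.

```shell
# Run this from inside a tmux session to see what the LFADS scripts will see.
check_lfads_env() {
    result=0
    if command -v run_lfads.py >/dev/null 2>&1; then
        echo "OK: run_lfads.py at $(command -v run_lfads.py)"
    else
        echo "MISSING: run_lfads.py is not on PATH"
        result=1
    fi
    if python -c "import tensorflow as tf" >/dev/null 2>&1; then
        echo "OK: tensorflow imports"
    else
        echo "MISSING: python cannot import tensorflow"
        result=1
    fi
    return "$result"
}

check_lfads_env || echo "Fix the items marked MISSING before starting the queue."
```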

You can then kick everything off by running python run_lfadsqueue.py at the command line. It’s recommended to do this from inside your own tmux session if you’re running on a remote server, so you can monitor the task runner.

Python virtual environments

If tensorflow is installed in a Python virtual environment, you can have this environment be automatically activated via source activate within the training scripts using:

rc.writeShellScriptRunQueue('virtualenv', 'tensorflow');

A few notes on how the system works:

Output from Python will be tee‘d into lfads.out, so you can check the output during or afterwards either there or in the tmux session.
When a model finishes training and posterior mean sampling, a file called lfads.done will be created
If the task runner detects an lfads.done file, it will skip that run, unless you pass 'rerun', true to writeShellScriptRunQueue, in which case every run will be rerun. Skipping completed runs is convenient if you’ve added additional runs and just want the new ones to run.
If a run fails, the error will be printed by the task runner and lfads.done will not be created
A running tally of how many runs are currently running, have finished, or have failed will be printed
You can enter a run’s tmux session directly to monitor it. The list of sessions can be obtained using tmux list-sessions. You can also abort it using Ctrl-C and it will be marked as failed by the task runner.
If you Ctrl-C the run_lfadsqueue.py script itself, the already launched runs will continue running. If you want to abort them, you can run pkill python, although this will kill all Python processes you’ve created. In either case, you should be able to relaunch the run_lfadsqueue.py script and have it pick up where it left off.
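To see at a glance which runs the queue would skip, you can look for the lfads.done markers yourself. The sketch below rests on two assumptions that you should verify against your own run folders: that every run folder contains an lfads_train.sh, and that lfads.done is written into that same folder. The helper name report_runs is illustrative only.

```shell
# Print DONE or PENDING for every run folder under a root directory.
# Assumption: a run folder is any directory containing lfads_train.sh,
# and lfads.done is written into that same folder.
report_runs() {
    root="$1"
    find "$root" -name lfads_train.sh | sort | while IFS= read -r script; do
        run_dir=$(dirname "$script")
        if [ -f "$run_dir/lfads.done" ]; then
            echo "DONE    $run_dir"
        else
            echo "PENDING $run_dir"
        fi
    done
}

# Example invocation using the run collection path from this guide.
RUNS_ROOT="$HOME/lorenz_example/runs/exampleSingleSession"
if [ -d "$RUNS_ROOT" ]; then
    report_runs "$RUNS_ROOT"
fi
```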

The run_lfadsqueue.py script will periodically output updates about how the runs are proceeding:

(tensorflow) ➜  python run_lfadsqueue.py
Warning: tmux sessions will be nested inside the current session
Queue: Launching TensorBoard on port 42561 in tmux session exampleRun_tensorboard_port42561
bash /home/djoshea/lorenz_example/runs/exampleSingleSession/launch_tensorboard.sh --port=42561
Queue: Initializing with 2 GPUs and 12 CPUs, max 4 simultaneous tasks
Task lfads_param_Qr2PeG__single_dataset001: launching on gpu 0
Task lfads_param_Qr2PeG__single_dataset001: started in tmux session lfads_param_Qr2PeG__single_dataset001 on GPU 0 with PID 19498
Task lfads_param_Qr2PeG__single_dataset002: launching on gpu 1
Task lfads_param_Qr2PeG__single_dataset002: started in tmux session lfads_param_Qr2PeG__single_dataset002 on GPU 1 with PID 19527
Task lfads_param_Qr2PeG__single_dataset003: launching on gpu 0
Task lfads_param_Qr2PeG__single_dataset003: started in tmux session lfads_param_Qr2PeG__single_dataset003 on GPU 0 with PID 19551
Task lfads_param_Qr2PeG__all: launching on gpu 1
Task lfads_param_Qr2PeG__all: started in tmux session lfads_param_Qr2PeG__all on GPU 1 with PID 19585
Task lfads_param_Qr2PeG__single_dataset003:      Decreasing learning rate to 0.009800.
Task lfads_param_Qr2PeG__single_dataset001:      Decreasing learning rate to 0.009800.
Task lfads_param_Qr2PeG__single_dataset001:      Decreasing learning rate to 0.009604.
Task lfads_param_Qr2PeG__single_dataset003:      Decreasing learning rate to 0.009604.
Task lfads_param_Qr2PeG__single_dataset003:      Decreasing learning rate to 0.009412.
Task lfads_param_Qr2PeG__single_dataset001:      Decreasing learning rate to 0.009412.

As the tasks run, the task queue will print out messages related to decreasing the learning rate, which is one way to measure ongoing progress towards the termination criterion (training stops when the learning rate hits c_learning_rate_stop). When a task fails or completes, the queue will print out a running tally.
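Because Python’s output is also tee’d into lfads.out, the same progress signal can be pulled from that file. The following sketch assumes the log contains lines in the "Decreasing learning rate to ..." form shown above; the helper name and the example path to lfads.out are illustrative, so adjust the path to wherever the file is written for your run.

```shell
# Print the most recent learning rate reported in a run's lfads.out.
# Prints nothing if the log contains no learning rate messages yet.
latest_learning_rate() {
    grep -o 'Decreasing learning rate to [0-9]*\.[0-9]*' "$1" \
        | tail -n 1 \
        | awk '{ print $NF }'
}

# Example path (an assumption); point this at a real lfads.out.
LOG="$HOME/lorenz_example/runs/exampleSingleSession/param_Qr2PeG/single_dataset001/lfads.out"
if [ -f "$LOG" ]; then
    echo "Current learning rate: $(latest_learning_rate "$LOG")"
fi
```

Comparing the printed value against c_learning_rate_stop gives a rough sense of how far a run is from terminating.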

Note that TensorBoard has automatically been launched on an available port, here 42561. You can also attach directly to the tmux sessions, whose names appear after “Task” in the queue output and can be listed using tmux list-sessions.

(tensorflow) ➜ tmux list-sessions
matlab: 4 windows (created Tue Oct  3 21:51:49 2017) [201x114] (attached)
exampleRun_tensorboard_port42561: 1 windows (created Fri Oct  6 14:43:16 2017) [201x113]
lfads_param_Qr2PeG__all: 1 windows (created Fri Oct  6 14:43:17 2017) [201x113]
lfads_param_Qr2PeG__single_dataset001: 1 windows (created Fri Oct  6 14:43:16 2017) [201x114]
lfads_param_Qr2PeG__single_dataset002: 1 windows (created Fri Oct  6 14:43:16 2017) [201x113]
lfads_param_Qr2PeG__single_dataset003: 1 windows (created Fri Oct  6 14:43:17 2017) [201x113]

If you wish to abort ongoing runs, you can either attach to them directly and use Ctrl-C, or use tmux kill-session SESSIONNAME. When everything has completed, you’ll see something like this:

Task lfads_param_Qr2PeG__all: Stopping optimization based on learning rate criteria.
Task lfads_param_Qr2PeG__all: completed successfully
Queue: All tasks completed.
Queue: 0 skipped, 4 finished, 0 failed, 0 running
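If you need to abort many runs at once, killing sessions by name prefix is more targeted than pkill python. The sketch below is not part of the run manager: the helper names are made up, and it assumes the session names share a common prefix, as they do in the example output (lfads_param_Qr2PeG__). The actual kill call is left commented out so that nothing is terminated by accident.

```shell
# Keep only session names that begin with the given prefix (reads names on stdin).
filter_sessions() {
    prefix="$1"
    while IFS= read -r name; do
        case "$name" in
            "$prefix"*) echo "$name" ;;
        esac
    done
}

# Kill every tmux session whose name starts with the prefix.
kill_lfads_sessions() {
    tmux list-sessions -F '#{session_name}' 2>/dev/null \
        | filter_sessions "$1" \
        | while IFS= read -r name; do
            echo "Killing tmux session $name"
            tmux kill-session -t "$name"
        done
}

# Uncomment to abort all runs for the example parameter set:
# kill_lfads_sessions "lfads_param_Qr2PeG__"
```

Sessions such as the TensorBoard session or your own matlab session do not match the prefix and are left untouched.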

Launching each run individually from shell scripts¶

Follow these instructions to run each model individually, but you’ll probably prefer to queue everything at once.

Training the model¶

The alternative to the queue is to manually generate a shell script for each run and then launch it yourself. First, for each run i, call:

rc.runs(i).writeShellScriptLFADSTrain('cuda_visible_devices', 0, 'display', 0);

Here, you should specify options that will be written into the shell script, the key ones being:

cuda_visible_devices - which GPU index to run this model on, e.g. 0. Use nvidia-smi to enumerate the available GPUs on your system.
display - the X display to use, e.g. 0, which will set DISPLAY to :0. The python code generates plots during training that will appear in TensorBoard. Generating these plots requires a display. When running in a remote server, you’ll need to specify this, and possibly to launch an X server using something like tightvnc or vncserver.
appendPosteriorMeanSample - true or false specifying whether to chain the posterior mean sampling operation after the training is finished. The default is false, but if you set this to true, you won’t need to call writeShellScriptLFADSPosteriorMeanSample below.
appendWriteModelParams - true or false specifying whether to chain the write model parameters operation after the training is finished. The default is false, but if you set this to true, you won’t need to call writeShellScriptWriteModelParams below.

This will generate an lfads_train.sh in the corresponding run’s folder. For the first run in our example, this is at

~/lorenz_example/runs/exampleSingleSession/param_Qr2PeG/single_dataset001/lfads_train.sh

The script essentially launches Python to run run_lfads.py with the specific parameters you’ve indicated in RunParams and pointing at the corresponding datasets, which were saved earlier when we called rc.prepareForLFADS.

#!/bin/bash

path_to_run_lfads=$(which run_lfads.py)
if [ ! -n "$path_to_run_lfads" ]; then
    echo "Error: run_lfads.py not found on PATH. Ensure you add LFADS to your system PATH."
    exit 1
fi

DISPLAY=:0 CUDA_VISIBLE_DEVICES=0 python $(which run_lfads.py) --data_dir=/home/djoshea/lorenz_example/runs/exampleSingleSession/param_YOs74u/single_dataset001/lfadsInput --data_filename_stem=lfads --lfads_save_dir=/home/djoshea/lorenz_example/runs/exampleSingleSession/param_YOs74u/single_dataset001/lfadsOutput --cell_clip_value=5.000000 --factors_dim=8 --ic_enc_dim=64 --ci_enc_dim=128 --gen_dim=64 --keep_prob=0.950000 --learning_rate_decay_factor=0.980000 --device=/gpu:0 --co_dim=0 --do_causal_controller=false --do_feed_factors_to_controller=true --feedback_factors_or_rates=factors --controller_input_lag=1 --do_train_readin=true --l2_gen_scale=500.000000 --l2_con_scale=500.000000 --batch_size=150 --kl_increase_steps=900 --l2_increase_steps=900 --ic_dim=64 --con_dim=128 --learning_rate_stop=0.001000 --temporal_spike_jitter_width=0 --allow_gpu_growth=true --kl_ic_weight=1.000000 --kl_co_weight=1.000000 --inject_ext_input_to_gen=false

Running the lfads_train.sh script will launch the Tensorflow training which will take some time. You likely want to launch this in a tmux session if running remotely.
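If you have generated lfads_train.sh for several runs, a small loop can launch them one after another. This is a sketch under stated assumptions: run_all_training is an illustrative helper, not part of the run manager, and each script runs on whichever GPU was written into it via cuda_visible_devices.

```shell
# Run every lfads_train.sh under a root directory, one after another.
# Stops at the first script that exits with a nonzero status.
run_all_training() {
    root="$1"
    list=$(mktemp)
    find "$root" -name lfads_train.sh | sort > "$list"
    while IFS= read -r script; do
        echo "Starting $script"
        # Redirect stdin so the script cannot consume the list being read.
        if ! bash "$script" < /dev/null; then
            echo "Failed: $script"
            rm -f "$list"
            return 1
        fi
    done < "$list"
    rm -f "$list"
    echo "All training scripts finished"
}

# Uncomment to train every run in the example run collection in sequence:
# run_all_training "$HOME/lorenz_example/runs/exampleSingleSession"
```

Since training can take hours or days, start the loop from inside a tmux session when working on a remote machine.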

Sampling the posterior means¶

Next, generate the lfads_posterior_mean_sample.sh script to sample the posterior means, which can be launched after training has completed. If you set appendPosteriorMeanSample to true in writeShellScriptLFADSTrain, you can skip this step.

rc.runs(i).writeShellScriptLFADSPosteriorMeanSample('cuda_visible_devices', 0);

Writing the model parameters¶

Lastly, we want to export the trained model parameters to disk as an HDF5 file. We do this by generating the shell script using

rc.runs(i).writeShellScriptWriteModelParams('cuda_visible_devices', 0);

If you set appendWriteModelParams to true in writeShellScriptLFADSTrain, you can skip this step. The results will be written to a file called lfadsOutput/model_params and can be loaded into Matlab using run.loadModelTrainedParams().

Launching Tensorboard¶

You can monitor the progress of each run by generating a script that launches TensorBoard.

rc.writeTensorboardShellScript();

This will create launch_tensorboard.sh. Running it launches TensorBoard, which you can then visit at http://localhost:PORT.
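When TensorBoard runs on a remote server, localhost on your own machine will not reach it unless the port is forwarded. One common approach, which this guide does not prescribe, is an SSH tunnel. The helper below only prints the command to run on your local machine; user@gpu-server is a placeholder for your own login, and 42561 is the port from the example output earlier.

```shell
# Print the ssh command that forwards a remote TensorBoard port to this machine.
# Arguments: the port TensorBoard listens on, and a user@host login (placeholder).
tensorboard_tunnel_cmd() {
    port="$1"
    remote="$2"
    echo "ssh -N -L ${port}:localhost:${port} ${remote}"
}

tensorboard_tunnel_cmd 42561 user@gpu-server
```

With the tunnel open, http://localhost:42561 on your local machine shows the remote TensorBoard.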