What is Lambda Labs?
Lambda Labs is a cloud computing company specialising in machine learning development. It is particularly useful for accessing on-demand GPU instances, although it can also be used to host more complex clusters and private clouds. With a few clicks we can get set up and launch a high-performance GPU instance.
There are a number of alternatives to Lambda Labs, including AWS SageMaker and Modal.
What will you need?
To follow along with this, you will need to:
- Sign up for a free Lambda Labs account: https://lambdalabs.com/
- Set up your billing information in Lambda Labs
- Have a GitHub account to pull your codebase from
Getting Started
Set Up an SSH Key
If you have an existing SSH key, this process is as simple as copying and pasting your local public key into the Add SSH Key section and giving your key a name.
If not, you can hit ‘Generate a new SSH key’, enter a name for your key, then click Create. The key should be automatically downloaded to your Downloads folder, or wherever your default download location is. You will need to move this key to your ~/.ssh directory. Once this is set up, you should be ready to launch an instance.
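For example, assuming the key was saved as my-lambda-key.pem (your file name will differ), you can move it and restrict its permissions so SSH will accept it:
# move the downloaded private key into ~/.ssh (file name is a placeholder)
mv ~/Downloads/my-lambda-key.pem ~/.ssh/
# SSH refuses keys that other users can read, so lock down permissions
chmod 600 ~/.ssh/my-lambda-key.pem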
Launch an Instance
Launching an instance is quite simple: just click the Launch instance button.
Choose your instance (I would recommend one of the cheaper ones to get started), then on the next page select your region.
Optionally, we can attach a file system here. This is particularly useful for persisting system packages or Python environments so you do not need to install them fresh every time. I would not recommend storing large datasets in your file system, as this feature is charged per GB used, billed hourly.
Finally, we need to attach an SSH key to the instance and hit Launch. The instance should take between 30 and 60 seconds to be ready for use.
Once the instance is ready, make a note of the instance IP address, which you can then use to SSH into the machine:
ssh -i '<SSH-KEY-FILE-PATH>' ubuntu@<INSTANCE-IP>
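For example, assuming the key from earlier is at ~/.ssh/my-lambda-key.pem (a placeholder name) and the dashboard shows the same example IP used later in this post:
ssh -i ~/.ssh/my-lambda-key.pem ubuntu@129.213.24.67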
Clone a GitHub repo
If you would like to retain the repo in the file system so you do not need to clone it each time you connect to an instance, you should first change directory into the file system you created:
cd filesystem
then we will clone the repo:
git clone https://github.com/<username>/<repo-name>.git
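If the repo is private, you will also need credentials on the instance. One option (a sketch, not something specific to Lambda Labs) is SSH agent forwarding, so the GitHub key on your local machine is used for the clone:
# -A forwards your local SSH agent to the instance (key file name is a placeholder)
ssh -A -i ~/.ssh/my-lambda-key.pem ubuntu@<INSTANCE-IP>
# then clone over SSH instead of HTTPS
git clone git@github.com:<username>/<repo-name>.git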
Optional: Python Install and Venv
If we want to install a newer version of Python and isolate our Python environment:
sudo apt update
sudo apt install python3.11 python3.11-venv
# make the venv
python3.11 -m venv modeltrain
# activate
source modeltrain/bin/activate
# check
which python
/home/ubuntu/modeltrain/bin/python
Then we can install our Python dependencies in this environment:
pip install -r requirements.txt
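At this point it can be worth checking that the CUDA-enabled build of PyTorch can actually see the GPU. A quick sanity check, assuming torch is listed in your requirements.txt and that the NVIDIA drivers preinstalled on Lambda instances provide nvidia-smi:
# should print True and the number of visible GPUs
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
# driver-level view of the GPU
nvidia-smi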
Optional: Copying files between local and remote
Once we have an instance running and a file system attached, we can use rsync to copy files between our local machine and the remote instance. For example, to copy a Dockerfile from my local machine to the remote instance, I simply need to run the below command in my local terminal:
rsync -av <local_file> ubuntu@<remote_ip_address>:<remote_path>/<remote_file_name>
For example, copying my Dockerfile to my remote instance with IP address 129.213.24.67 and a file system called model-training:
rsync -av Dockerfile ubuntu@129.213.24.67:model-training/Dockerfile
Copying from the remote to local is a similar process, running the below command in your local terminal:
rsync -avz -e ssh ubuntu@<remote_ip_address>:<remote_path>/<remote_file_name> <local_file>
Using the same example as above:
rsync -avz -e ssh ubuntu@129.213.24.67:model-training/Dockerfile Dockerfile
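If the Lambda key is not your default SSH identity, rsync may fail to authenticate; in that case you can point it at the key explicitly. A sketch reusing the placeholder key file from earlier, which also pulls back a whole directory (the checkpoints path is hypothetical):
# tell rsync which key to use, and copy a directory recursively
rsync -av -e "ssh -i ~/.ssh/my-lambda-key.pem" ubuntu@129.213.24.67:model-training/checkpoints/ ./checkpoints/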
Optional: Using Docker for reproducible dev environments
We can use Docker containers to create reproducible dev environments with the required setup and dependencies preinstalled and ready to go. We can do this using the Dev Containers extension for VS Code.
For example, if we have a repo with our code base, we can define our Dockerfile so that it installs our system dependencies (in this case Poetry, for managing our Python environment) and installs our Python dependencies (mostly Torch and CUDA-related libraries):
FROM python:3.11-buster as builder

RUN pip install poetry==1.8.2

ENV POETRY_NO_INTERACTION=1 \
    POETRY_CACHE_DIR=/tmp/poetry_cache

WORKDIR /app

COPY pyproject.toml poetry.lock ./

RUN --mount=type=cache,target=$POETRY_CACHE_DIR poetry install --no-root
We can then define our .devcontainer.json file:
{ "name": "Python Dev Container", "dockerFile": "Dockerfile", "extensions": [ "ms-python.python", "ms-toolsai.jupyter", "ms-python.vscode-pylance", ], "settings": { "python.defaultInterpreterPath": "/usr/local/bin/python", "python.linting.enabled": true, "python.linting.pylintEnabled": true, "python.formatting.autopep8Path": "/usr/local/py-utils/bin/autopep8", "python.formatting.blackPath": "/usr/local/py-utils/bin/black", } }
Then in VS Code we can SSH into our machine using the Remote - SSH extension. We can then use the Dev Containers: Reopen in Container command from the Command Palette to open our container on the remote machine.