Using GPUs on LKE
Akamai's GPU Linodes are available for deployment on standard LKE clusters, enabling you to run your GPU-accelerated workloads on Akamai's managed Kubernetes service. These Linodes utilize NVIDIA RTX PRO 6000™ Blackwell Server Edition and NVIDIA RTX 4000™ Ada GPUs.
This document outlines several options for installing the NVIDIA software components needed to configure GPU-enabled workloads.
NVIDIA Quadro RTX 6000™ GPUs cannot currently be deployed within LKE or LKE Enterprise clusters due to limited availability.
Install NVIDIA software
There are two primary ways to install the software components needed to use NVIDIA GPUs within Kubernetes:
- NVIDIA Kubernetes device plugin: A DaemonSet that manages GPUs as consumable resources and enables you to schedule GPU workloads.
- NVIDIA GPU operator: A Kubernetes operator that automates the configuration and management of NVIDIA GPUs on Kubernetes clusters.
NVIDIA Kubernetes device plugin
This DaemonSet is NVIDIA's implementation of the Kubernetes device plugin framework and advertises GPUs as consumable resources. The following example command installs v0.17.3 of the plugin. For the latest installation instructions and versions, see the NVIDIA/k8s-device-plugin GitHub repository. You must have kubectl installed and configured to use an LKE cluster with GPU worker nodes.
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.3/deployments/static/nvidia-device-plugin.yml
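After the DaemonSet is deployed, you can confirm the plugin is running and that your nodes advertise GPUs. A quick sanity check, assuming your kubectl context points at the cluster (the `name=nvidia-device-plugin-ds` label is set by the static manifest above):

```shell
# Confirm the device plugin pods are running on each GPU node
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds

# Each GPU node should now list nvidia.com/gpu under Capacity and Allocatable
kubectl describe nodes | grep -i "nvidia.com/gpu"
```

If no `nvidia.com/gpu` resource appears, check the device plugin pod logs before scheduling workloads.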
NVIDIA GPU operator
The NVIDIA GPU operator automatically configures all of the software required to use GPUs on your cluster and worker nodes. To learn more about this operator and for the most recent instructions, see NVIDIA's Installing the NVIDIA GPU Operator guide.
Before continuing, both kubectl and Helm should be installed on your local machine. The kubectl context should be set to an LKE cluster using GPU worker nodes.
1. Add the NVIDIA Helm repository to your local machine and update your chart listings:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update

2. Install the GPU operator on your cluster:

- LKE clusters: NVIDIA drivers are installed automatically on GPU worker nodes, so the operator's driver and container toolkit components can be disabled:

helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false \
    --set toolkit.enabled=false

- LKE Enterprise clusters: Drivers are not installed automatically on LKE Enterprise clusters, so leave the operator's driver component enabled (the default):

helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator
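Once the Helm release is installed, you can verify that the operator and its operand pods came up cleanly. A short check, assuming the `gpu-operator` namespace used above (the `app=nvidia-operator-validator` label is the one the operator applies to its validation pods):

```shell
# All operator pods should reach Running or Completed status
kubectl get pods -n gpu-operator

# The validator pods report whether each GPU stack component passed its checks
kubectl logs -n gpu-operator -l app=nvidia-operator-validator
</imports>
```

A log line indicating all validations succeeded means the cluster is ready to schedule GPU workloads.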
NVIDIA driver installation
When deploying GPU Linodes on LKE, the latest available NVIDIA drivers are automatically installed on standard (non-enterprise) clusters. For LKE Enterprise clusters, you can install the drivers through the GPU operator, or manually by running the following commands as root on each GPU worker node.
wget -q https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update
apt install -y nvidia-driver-cuda linux-headers-cloud-amd64 nvidia-container-toolkit nvidia-kernel-open-dkms
nvidia-ctk runtime configure --runtime=containerd --set-as-default
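The nvidia-ctk command above only writes the containerd configuration; the change takes effect once containerd restarts. A short follow-up on the node (assuming systemd manages containerd, as it does on Debian 12):

```shell
# Restart containerd so it picks up the NVIDIA runtime configuration
systemctl restart containerd

# Verify the driver is loaded and the GPU is visible
nvidia-smi
```

If nvidia-smi fails to find a device, reboot the node so the newly built kernel modules load.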
Configure workloads to use GPUs
Once NVIDIA's software has been installed, you can configure pods to consume GPU resources. Do this by adding the nvidia.com/gpu: n key-value pair to the resource limits in your workload's manifest file, where n is the number of GPUs the container should consume. Here's an example:
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
    - name: app
      image: example-image
      resources:
        limits:
          memory: 24Gi
          cpu: 6
          nvidia.com/gpu: 1
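To try the manifest above, save it and apply it to the cluster. In this sketch, gpu-workload.yaml is an assumed filename, and example-image is a placeholder that must be replaced with a real container image before the pod can run:

```shell
# Create the pod from the example manifest (assumed filename)
kubectl apply -f gpu-workload.yaml

# Check scheduling: the pod stays Pending if no node can satisfy the nvidia.com/gpu limit
kubectl get pod gpu-workload -o wide
```

Because nvidia.com/gpu is an extended resource, the scheduler only places the pod on a node with at least one unallocated GPU; GPUs are not shared between containers by default.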
For a more in-depth example of running GPU-accelerated workloads on LKE, see the Deploy a Chatbot and RAG Pipeline for AI Inferencing on LKE guide.