GPU Linodes

Akamai offers Graphics Processing Unit (GPU)-optimized virtual machines designed to accelerate compute-intensive workloads such as AI inference, big data processing, and video encoding. These instances leverage specialized hardware originally engineered for high-fidelity graphics that has evolved into powerful parallel processing accelerators. With far more cores than a standard CPU, a GPU manages thousands of simultaneous threads to execute the massive numbers of calculations required by modern machine learning models.

The fleet includes the latest NVIDIA RTX PRO 6000™ Blackwell Server Edition GPU, alongside the NVIDIA RTX 4000™ Ada GPU and the Quadro RTX 6000™ GPU. These cards harness CUDA, Tensor, and RT cores to deliver high-performance processing, transcoding, and ray tracing. While the Quadro RTX 6000 excels in visualization and graphics processing, the RTX PRO 6000 Blackwell Server Edition GPU is engineered for the scale and complexity of data center workloads. Its architecture is ideally suited for AI inferencing, providing the necessary throughput for large-scale model deployment.

For detailed specifications on the available hardware, refer to the documentation for NVIDIA RTX PRO 6000 Blackwell Server Edition, NVIDIA RTX 4000 Ada, and NVIDIA Quadro RTX 6000 GPU.

📘

NVIDIA RTX PRO 6000 Blackwell Server Edition and the NVIDIA Quadro RTX 6000 GPU plans have limited deployment availability.

See the Compute Service Level Agreement for legal terms that apply to Akamai features that are in limited availability or otherwise not yet released into general availability.

GPU plans are ideal for highly specialized workloads that would benefit from dedicated NVIDIA GPUs, including machine learning, AI inferencing, graphics processing, and big data analysis.

On-demand

When the costs associated with purchasing, installing, and maintaining GPUs are taken into account, the overall cost of ownership is often high. GPU Linodes allow you to leverage the power of GPUs while benefiting from the main value proposition of cloud: turning a CapEx into an OpEx.

Market-leading hardware

The GPU plans use industry-leading NVIDIA GPUs with CUDA, Tensor, and RT cores in each unit. These GPUs support use cases associated with AI inferencing, parallel processing, transcoding, and ray tracing. See GPU specifications for more details.

If one GPU card isn't enough for your projected workloads, Akamai Cloud Computing offers GPU plans with up to eight cards per Linode.

Dedicated competition-free resources

A GPU Linode's vCPU cores are dedicated, not shared between customers—they are accessible only to you. Your Linode never has to wait for another process, which enables your software to run at peak speed and efficiency. This lets you run workloads that require full-duty work (100% CPU all day, every day) at peak performance.

Recommended workloads

GPU Linodes are suitable for specialized workloads that are optimized for GPUs, such as those described in the GPU use cases section below.

Availability

| GPU Plan | Regions |
| --- | --- |
| NVIDIA RTX PRO 6000 Blackwell Server Edition (limited deployment availability) | Amsterdam, NL; Chennai, IN; Chicago, IL, US; Frankfurt, DE; Jakarta, ID; London, UK; Los Angeles, CA, US; Madrid, ES; Miami, FL, US; Mumbai, IN; Milan, IT; Newark, NJ, US; Osaka, JP; Paris, FR; Seattle, WA, US; Singapore, SG; Stockholm, SE; Tokyo, JP; Toronto, CA |
| NVIDIA RTX 4000 Ada | Chicago, IL, US; Frankfurt 2, DE; Osaka, JP; Paris, FR; Seattle, WA, US; Singapore, SG |
| NVIDIA Quadro RTX 6000 (limited deployment availability) | Atlanta, GA, US; Newark, NJ, US; Frankfurt, DE; Mumbai, IN; Singapore, SG |

Plans and pricing

| Resource | NVIDIA RTX PRO 6000 Blackwell Server Edition | NVIDIA RTX 4000 Ada | NVIDIA Quadro RTX 6000 |
| --- | --- | --- | --- |
| GPU cards | 1 - 4 | 1 - 4 | 1 - 4 |
| GPU memory (VRAM) | 96 GB - 384 GB | 20 GB - 80 GB | 24 GB - 96 GB |
| vCPU cores (dedicated) | 16 - 64 cores | 4 - 48 cores | 8 - 24 cores |
| Memory (RAM) | 176 GB - 736 GB | 16 GB - 196 GB | 32 GB - 128 GB |
| Storage | 1024 GB - 4096 GB | 0.5 TB - 2 TB | 640 GB - 2560 GB |
| Outbound network transfer | 0 TB | 0 TB | 16 TB - 20 TB |
| Outbound network bandwidth | 16 Gbps | 10 Gbps | 10 Gbps |

Pricing starts at $1,665/mo ($2.50/hr) for an NVIDIA RTX PRO 6000 Blackwell Server Edition GPU Linode with 1 GPU card, 16 vCPU cores, 176 GB of memory, 96 GB of video memory, and 1024 GB of SSD storage.

Pricing starts at $350/mo ($0.52/hr) for an NVIDIA RTX 4000 Ada GPU x1 Small Linode with 1 GPU card, 4 vCPU cores, 16 GB of memory, and 0.5 TB of SSD storage.

Pricing starts at $1,000/mo ($1.50/hr) for an NVIDIA Quadro RTX 6000 GPU Linode with 1 GPU card, 8 vCPU cores, 32 GB of memory, and 640 GB of SSD storage.

Review the pricing page for additional plans and their associated costs. Review the Plans page to learn more about other Linode types.

📘

In some cases, a $100 deposit may be required to deploy GPU Linodes. This may include new accounts that have been active for less than 90 days and accounts that have spent less than $100 on services. If you are unable to deploy GPU Linodes, contact Support for assistance.

GPU specifications

Each NVIDIA RTX PRO 6000 Blackwell Server Edition GPU is equipped with:

| Specification | Value |
| --- | --- |
| GPU Memory (VRAM) | 96 GB GDDR7 with ECC |
| CUDA Cores (Parallel Processing) | 24,064 |
| Tensor Cores (Transcoding) | 752 (5th Gen) |
| RT Cores (Ray Tracing) | 188 (4th Gen) |
| FP32 Performance | 120 TFLOPS |

Each NVIDIA RTX 4000 Ada GPU is equipped with:

| Specification | Value |
| --- | --- |
| GPU Memory (VRAM) | 20 GB GDDR6 |
| CUDA Cores (Parallel Processing) | 6144 |
| Tensor Cores (Transcoding) | 192 |
| RT Cores (Ray Tracing) | 48 |
| FP32 Performance | 26.7 TFLOPS |

Each NVIDIA Quadro RTX 6000 GPU is equipped with:

| Specification | Value |
| --- | --- |
| GPU Memory (VRAM) | 24 GB GDDR6 |
| CUDA Cores (Parallel Processing) | 4608 |
| Tensor Cores (Transcoding) | 576 |
| RT Cores (Ray Tracing) | 72 |
| FP32 Performance | 16.3 TFLOPS |

📘

Not sure which GPU is right for your workload? See the GPU selection guide.

GPU use cases

AI inference, machine learning, and artificial intelligence

AI inference is the production phase where a trained model processes live data to deliver actionable results. This stage requires high-performance GPU acceleration to maintain the low latency and high throughput necessary for real-time applications. By utilizing specialized hardware architectures, such as NVIDIA RTX and Blackwell, infrastructure can provide the massive parallel processing power required to transform complex models into responsive, scalable services.

Machine learning applies statistical and computational techniques to large datasets in order to train models that generate predictions, classifications, and decision outputs. These models underpin recommendation systems, search relevance, fraud detection, autonomous systems, and large-scale automation. Once trained, models are deployed into production environments where they must execute rapidly, consistently, and at scale. This execution phase is referred to as AI inference.

Artificial intelligence encompasses the broader class of systems designed to exhibit intelligent behavior such as reasoning, perception, language understanding, and adaptive decision making. Modern AI systems rely heavily on accelerated compute to meet latency, throughput, and concurrency requirements, particularly for real-time and multimodal workloads. GPUs provide the parallel processing density and memory bandwidth required to efficiently serve inference workloads, batch pipelines, and streaming workloads without sacrificing predictability or performance.

GPU Linodes are optimized for high-throughput, low-latency inference at production scale. Their large GPU memory footprint, next-generation Tensor Cores, and architectural efficiency enable sustained token throughput, fast first-response latency, and high concurrency across multiple models and workloads. This makes them well suited for real-time model serving, agentic workflows, retrieval-augmented generation pipelines, and GPU-accelerated analytics, where inference economics and operational predictability directly impact system viability.

Below is a representative set of common frameworks and runtimes used for machine learning development and AI inference that can be deployed on a GPU Linode:

  • TensorFlow - a free, open-source machine learning framework and deep learning library. TensorFlow was originally developed by Google for internal use and later fully released to the public under the Apache License.

  • PyTorch - a machine learning library for Python that uses the popular GPU optimized Torch framework.
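As a quick sanity check after deploying one of these frameworks, the sketch below (assuming PyTorch is installed) selects the GPU when CUDA is available, falls back to the CPU otherwise, and runs a small placeholder model in inference mode. The model itself is a stand-in, not a real trained network:

```python
import torch
import torch.nn as nn

# Prefer the GPU when a CUDA device is present; fall back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A stand-in model; replace with your own trained network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).to(device)
model.eval()  # disable training-only behavior such as dropout

batch = torch.randn(8, 16, device=device)  # a batch of 8 sample inputs
with torch.no_grad():  # skip gradient tracking for faster inference
    output = model(batch)

print(output.shape)  # torch.Size([8, 4])
```

The same device-selection pattern lets identical code run on a GPU Linode and on a CPU-only development machine.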

Big data

Big data is a discipline that analyzes and extracts meaningful insights from large and complex data sets. These sets are so large and complex that they require specialized software and hardware to appropriately capture, manage, and process the data. When thinking of big data and whether or not the term applies to you, it often helps to visualize the “three Vs”:

  • Volume: Generally, if you are working with terabytes, petabytes, exabytes, or more of information, you are in the realm of big data.

  • Velocity: With big data, you’re using data that is being created, moved, and interacted with at a high velocity. One example is the real-time data generated on social media platforms by their users.

  • Variety: Variety refers to the many different types of data formats with which you may need to interact. Photos, video, audio, and documents can all be written and saved in a number of different formats. It is important to consider the variety of data that you will collect in order to appropriately categorize it.

GPUs can help give Big Data systems the additional computational capabilities they need for ideal performance. Below are a few examples of tools which you can use for your own big data solutions:

  • Hadoop - an Apache project that allows the creation of parallel processing applications on large data sets, distributed across networked nodes.

  • Apache Spark - a unified analytics engine for large-scale data processing designed with speed and ease of use in mind.

  • Apache Storm - a distributed computation system that processes streaming data in real time.
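To make the programming model concrete, here is a toy map-reduce word count in plain Python (standard library only). Hadoop and Spark apply the same map/shuffle/reduce pattern, but distribute it across many nodes and cores:

```python
from collections import Counter
from functools import reduce

# Each "partition" stands in for a chunk of a much larger dataset
# that a framework like Hadoop or Spark would spread across nodes.
partitions = [
    "real time data from social media",
    "data in many formats photos video audio",
    "large data sets need parallel processing",
]

# Map: count words independently within each partition.
mapped = [Counter(p.split()) for p in partitions]

# Reduce: merge the per-partition counts into a global result.
totals = reduce(lambda a, b: a + b, mapped)

print(totals["data"])  # 3
```

Because the map step has no dependencies between partitions, it parallelizes naturally, which is exactly the property GPUs and distributed engines exploit.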

Video encoding

Video Encoding is the process of taking a video file's original source format and converting it to another format that is viewable on a different device or using a different tool. This resource intensive task can be greatly accelerated using the power of GPUs.

  • FFmpeg - a popular open-source multimedia manipulation framework that supports a large number of video formats.
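As an illustration, the sketch below assembles an FFmpeg command that offloads H.264 encoding to the GPU via NVIDIA's NVENC hardware encoder. The filenames are placeholders, and actually running the command requires an FFmpeg build with NVENC support:

```python
import shlex

# Placeholder filenames; substitute your own source and destination.
src, dst = "input.mp4", "output.mp4"

cmd = [
    "ffmpeg",
    "-hwaccel", "cuda",    # decode on the GPU where possible
    "-i", src,
    "-c:v", "h264_nvenc",  # encode H.264 with NVIDIA's NVENC hardware encoder
    "-preset", "p5",       # NVENC quality/speed preset (p1 fastest .. p7 best)
    dst,
]

print(shlex.join(cmd))
# On a GPU Linode with an NVENC-capable FFmpeg build, run it with:
# subprocess.run(cmd, check=True)
```

Offloading the encode this way frees the vCPUs for other work while the GPU handles the compression.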

General purpose computing using CUDA

CUDA (Compute Unified Device Architecture) is a parallel computing platform and API that lets you interact more directly with the GPU for general-purpose computing. In practice, this means a developer can write code in C, C++, or many other supported languages that runs on the GPU, creating their own tools and programs.
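The core idea can be sketched in plain Python: a CUDA kernel is a function executed by many threads at once, each computing a global index from its block and thread coordinates to pick the element it works on. The nested loops below simulate, sequentially, what the GPU hardware does in parallel:

```python
# CPU simulation of CUDA's thread-indexing model for a vector add.
# On a real GPU, every (block, thread) pair below runs concurrently.
n = 10
a = list(range(n))
b = [x * 10 for x in range(n)]
out = [0] * n

block_dim = 4                                # threads per block
grid_dim = (n + block_dim - 1) // block_dim  # blocks needed to cover n elements

for block_idx in range(grid_dim):
    for thread_idx in range(block_dim):
        i = block_idx * block_dim + thread_idx  # global index, as in a kernel
        if i < n:                               # guard against running past n
            out[i] = a[i] + b[i]

print(out)  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```

In a real CUDA kernel the loop bodies disappear: each thread computes its own `i` from `blockIdx`, `blockDim`, and `threadIdx`, and the hardware launches the whole grid at once.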

If you're interested in using CUDA on your GPU Linode, see NVIDIA's CUDA Toolkit documentation and programming guide.

Graphics processing

One of the most traditional use cases for a GPU is graphics processing. Transforming a large set of pixels or vertices with a shader, or simulating realistic lighting via ray tracing, are massively parallel tasks. Ray tracing is a computationally intensive process that simulates light in a scene and renders the reflections, refractions, shadows, and indirect lighting it produces. Real-time ray tracing is impractical without hardware-based acceleration. GPU Linodes offer real-time ray tracing capabilities using a single GPU.

The GPU plans support advanced shading capabilities such as:

  • Mesh shading models for vertex, tessellation, and geometry stages in the graphics pipeline
  • Variable Rate Shading to dynamically control shading rate
  • Texture-Space Shading which utilizes a private memory held texture space
  • Multi-View Rendering which allows for rendering multiple views in a single pass.