Ada Lovelace

NVIDIA L40S

High-throughput inference and rendering with 48 GB GDDR6

Technical Specifications

VRAM 48 GB GDDR6 (ECC)
Memory Bandwidth 864 GB/s
FP16 Performance 733 TFLOPS (with sparsity)
BF16 Performance 733 TFLOPS (with sparsity)
FP32 Performance 91.6 TFLOPS
INT8 Performance 1,466 TOPS (with sparsity)
TDP 350W
Form Factor PCIe Gen4 Dual-Slot
PCIe Interface PCIe Gen4 x16
Max GPUs per Server Up to 8

Prices vary with supply and import costs. Contact for current India pricing.

Best For

  • LLM inference for 70B-parameter models (quantized)
  • Generative AI serving (Stable Diffusion, image generation)
  • 3D rendering and real-time ray tracing with RT cores
  • Video transcoding and AI-powered media pipelines

Not Ideal For

  • Large-scale training (GDDR6 bandwidth is well below HBM-class GPUs)
  • Multi-node training clusters requiring NVLink

Overview

The NVIDIA L40S is a versatile Ada Lovelace GPU that bridges inference, rendering, and generative AI workloads. With 48 GB of GDDR6 memory, it can handle large model inference (including LLaMA 70B with 4-bit quantization) while also providing hardware ray tracing via RT cores.
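The 70B-with-4-bit-quantization claim can be sanity-checked with back-of-envelope arithmetic. The sketch below counts weight memory only (activations, KV cache, and quantization scale/zero-point overhead are ignored), and the 70e9 parameter count is illustrative:

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-GPU weight footprint in GB.

    Weights only: ignores activations, KV cache, and any
    quantization metadata (scales, zero points).
    """
    return n_params * bits_per_weight / 8 / 1e9

VRAM_GB = 48  # L40S

for bits, label in [(16, "FP16"), (8, "INT8"), (4, "INT4")]:
    gb = quantized_weight_gb(70e9, bits)
    verdict = "fits" if gb < VRAM_GB else "needs more than 48 GB"
    print(f"70B @ {label}: ~{gb:.0f} GB of weights -> {verdict}")
```

Only the 4-bit variant (~35 GB) leaves headroom inside 48 GB; FP16 (~140 GB) and INT8 (~70 GB) do not, which is why the page pairs 70B inference with quantization.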

For inference, the L40S delivers strong price-to-performance, particularly for models in the 7B-70B parameter range. Its 48 GB VRAM exceeds the L4 (24 GB) and costs significantly less than an H100. For organizations deploying generative AI applications, the L40S is often the sweet spot.
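To see why 48 GB is a serving sweet spot, here is a rough concurrency budget. It assumes hypothetical Llama-2-70B-style shapes (80 layers, 8 grouped-query KV heads of dimension 128, FP16 KV cache), ~35 GB of 4-bit weights, and ~3 GB of runtime overhead; every figure is an illustrative assumption, not a measured number:

```python
# Hypothetical 70B-class model shapes (assumed, not vendor-published)
N_LAYERS = 80
N_KV_HEADS = 8      # grouped-query attention
HEAD_DIM = 128
KV_BYTES_PER_ELEM = 2  # FP16 cache

def kv_bytes_per_token() -> int:
    """KV-cache bytes per token: K and V tensors across all layers."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES_PER_ELEM

VRAM_GB = 48        # L40S
WEIGHTS_GB = 35     # ~70B params at 4 bits
RUNTIME_GB = 3      # assumed CUDA context / activation overhead

free_bytes = (VRAM_GB - WEIGHTS_GB - RUNTIME_GB) * 1e9
cacheable_tokens = free_bytes / kv_bytes_per_token()
ctx_len = 4096
print(f"~{cacheable_tokens:.0f} cacheable tokens "
      f"-> roughly {cacheable_tokens / ctx_len:.0f} concurrent "
      f"{ctx_len}-token sequences")
```

Under these assumptions the remaining ~10 GB supports only a handful of full-length concurrent sequences, which is why a single L40S suits moderate-concurrency serving while high-QPS deployments scale out across multiple cards.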

The L40S also excels in professional visualization. VFX studios, architectural visualization firms, and animation pipelines benefit from its third-generation RT cores and support for NVIDIA Omniverse. If your workload mixes inference with rendering, the L40S eliminates the need for separate GPU pools.

Get NVIDIA L40S pricing for your setup

Tell us your workload and cluster size. We'll quote the complete solution including servers, networking, and colocation.