
NVIDIA DGX GH200: Decoding the Language of Massive Memory

Feb 06, 2024


During COMPUTEX 2023, NVIDIA unveiled the NVIDIA DGX GH200, a groundbreaking advancement in GPU-accelerated computing designed to handle the most demanding AI workloads. This article delves into crucial aspects of the NVIDIA DGX GH200 architecture and explores the capabilities of NVIDIA Base Command, which streamlines rapid deployment, expedites user onboarding, and simplifies system management.

To empower scientists tackling these demanding workloads, NVIDIA introduced the NVIDIA Grace Hopper Superchip, coupled with the NVLink Switch System, uniting up to 256 GPUs within a single NVIDIA DGX GH200 system. This configuration gives the DGX GH200 access to 144 terabytes of memory through the GPU-shared-memory programming model at high speed over NVLink.

Compared with a single NVIDIA DGX A100 320 GB system, the NVIDIA DGX GH200 offers nearly 500x more memory to the GPU-shared-memory programming model over NVLink, effectively functioning as one giant, data-center-sized GPU. Notably, the NVIDIA DGX GH200 is the first supercomputer to break the 100-terabyte barrier for memory accessible to GPUs over NVLink.
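The "nearly 500x" figure follows directly from the two memory capacities quoted above; a quick back-of-the-envelope check (figures taken from this article, using 1 TB = 1024 GB):

```python
# Sanity-check the memory comparison cited above.
DGX_GH200_NVLINK_MEMORY_TB = 144   # memory accessible to GPUs over NVLink
DGX_A100_MEMORY_GB = 320           # single DGX A100 320 GB system

ratio = (DGX_GH200_NVLINK_MEMORY_TB * 1024) / DGX_A100_MEMORY_GB
print(f"Memory ratio vs DGX A100 320 GB: {ratio:.1f}x")  # ~460x, i.e. "nearly 500x"
print(f"Crosses the 100 TB barrier: {DGX_GH200_NVLINK_MEMORY_TB > 100}")
```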

Advancements in NVLink technology lead to increased GPU memory capacity

The Architectural Framework of the NVIDIA DGX GH200 System

The foundational components of the NVIDIA DGX GH200 architecture consist of the NVIDIA Grace Hopper Superchip and the NVLink Switch System. The NVIDIA Grace Hopper Superchip integrates the Grace and Hopper architectures through NVIDIA NVLink-C2C, establishing a coherent memory model for both CPU and GPU. This innovative approach enhances connectivity and efficiency. The NVLink Switch System, leveraging the fourth generation of NVLink technology, extends NVLink connections across superchips, creating a seamless, high-bandwidth, multi-GPU system.

Within the NVIDIA DGX GH200, each NVIDIA Grace Hopper Superchip carries 480 GB of LPDDR5 CPU memory, at roughly an eighth of the power per gigabyte of conventional DDR5, along with 96 GB of high-speed HBM3. The NVIDIA Grace CPU and Hopper GPU are interconnected via NVLink-C2C, delivering 7x more bandwidth than PCIe Gen5 while consuming only one-fifth of the power.
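The per-superchip figures above compose directly into the system-wide 144 TB headline, and the NVLink-C2C claim can be cross-checked against a rough PCIe Gen5 x16 figure (the PCIe bandwidth constant below is an approximation, not from this article):

```python
# How the headline memory and bandwidth numbers compose.
LPDDR5_GB_PER_CHIP = 480
HBM3_GB_PER_CHIP = 96
NUM_SUPERCHIPS = 256

per_chip_gb = LPDDR5_GB_PER_CHIP + HBM3_GB_PER_CHIP   # 576 GB per superchip
system_tb = per_chip_gb * NUM_SUPERCHIPS / 1024       # 144 TB system-wide
print(f"Per superchip: {per_chip_gb} GB, system: {system_tb:.0f} TB")

NVLINK_C2C_GBPS = 900
PCIE_GEN5_X16_GBPS = 128   # approximate bidirectional figure; an assumption
print(f"NVLink-C2C advantage: ~{NVLINK_C2C_GBPS / PCIE_GEN5_X16_GBPS:.0f}x")
```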

The NVLink Switch System forms a two-level, non-blocking, fat-tree NVLink fabric to fully connect the 256 Grace Hopper Superchips in a DGX GH200 system. This interconnectivity ensures that every GPU in the DGX GH200 can access the memory of every other GPU, including the extended GPU memory of all NVIDIA Grace CPUs, at 900 GB/s.

The compute baseboards hosting the Grace Hopper Superchips are linked to the NVLink Switch System through a custom cable harness, establishing the first layer of the NVLink fabric. LinkX cables then extend this connectivity in the second layer of the NVLink fabric, completing the intricate architecture of the NVIDIA DGX GH200. 
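To make the "two-level, non-blocking" property concrete, here is a conceptual sizing sketch for such a fat tree: each level-1 switch splits its ports evenly between endpoints and level-2 uplinks, so no link is oversubscribed. The switch radix used here is a hypothetical illustration, not NVIDIA's actual part count:

```python
# Conceptual sizing of a two-level, non-blocking fat tree (illustrative only).
def two_level_fat_tree(endpoints: int, radix: int) -> tuple[int, int]:
    """Return (level-1 switches, level-2 switches) for a non-blocking fabric."""
    down_ports = radix // 2            # half the ports face endpoints
    l1 = -(-endpoints // down_ports)   # ceil division: level-1 switch count
    l2 = -(-l1 * down_ports // radix)  # level-2 switches absorb all uplinks
    return l1, l2

l1, l2 = two_level_fat_tree(endpoints=256, radix=64)
print(f"Level-1 switches: {l1}, level-2 switches: {l2}")
```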

Topology of a fully connected NVIDIA NVLink Switch System in NVIDIA DGX GH200 with 256 GPUs

In the DGX GH200 system, GPU threads can access peer HBM3 and LPDDR5X memory from other Grace Hopper Superchips in the NVLink network using an NVLink page table. NVIDIA Magnum IO acceleration libraries optimize GPU communications for efficiency, enhancing application scaling across all 256 GPUs.
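The page-table idea above can be pictured as a translation layer from a global page number to a (superchip, local address) pair, which is what lets a GPU thread address peer memory transparently. This toy model is purely illustrative; the real mechanism lives in hardware and driver software, and the page size and capacities below are made-up values:

```python
# Toy model of the idea behind an NVLink page table (illustrative only).
PAGE_SIZE = 2 * 1024 * 1024   # assume 2 MiB pages for illustration
PAGES_PER_CHIP = 4            # tiny toy capacity per superchip

def translate(global_addr: int) -> tuple[int, int]:
    """Map a global address to (superchip id, local offset)."""
    page = global_addr // PAGE_SIZE
    chip = page // PAGES_PER_CHIP
    local = (page % PAGES_PER_CHIP) * PAGE_SIZE + global_addr % PAGE_SIZE
    return chip, local

# Page 9 lands on superchip 2 (pages 8-11), local page 1, offset 100.
print(translate(9 * PAGE_SIZE + 100))
```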

Each Grace Hopper Superchip in the DGX GH200 is paired with one NVIDIA ConnectX-7 network adapter and one NVIDIA BlueField-3 NIC. The DGX GH200 offers 128 TB/s of bisection bandwidth and 230.4 TFLOPS of NVIDIA SHARP in-network computing, accelerating the collective operations common in AI; by reducing communication overhead in collectives, SHARP effectively doubles the usable bandwidth of the NVLink Network System.

For scalability beyond 256 GPUs, ConnectX-7 adapters can interconnect multiple DGX GH200 systems, creating an even larger solution. The power of BlueField-3 DPUs transforms any enterprise computing environment into a secure and accelerated virtual private cloud, enabling organizations to run application workloads securely in multi-tenant environments.

What Makes DGX GH200 Different

Target Applications and Performance Benefits

The significant increase in GPU memory enhances the performance of AI and HPC applications previously constrained by GPU memory capacity. Many mainstream AI and HPC workloads already fit entirely in the aggregate GPU memory of a single NVIDIA DGX H100, for which the DGX H100 remains the most performance-efficient training solution.

However, for more demanding workloads, such as a deep learning recommendation model with terabytes of embedding tables, a terabyte-scale graph neural network training model, or large data analytics tasks, the DGX GH200 delivers speedups of 4x to 7x. This makes the DGX GH200 the preferred solution for advanced AI and HPC models that require extensive GPU-shared memory.

Benchmarking Performance in Giant Memory AI Workloads

Tailoring for the Most Challenging Workloads

Each component in the DGX GH200 is chosen to minimize bottlenecks, optimizing network performance for crucial workloads and fully exploiting the scale-up hardware capabilities. The result is near-linear scalability and efficient utilization of the extensive shared memory space.

To maximize the potential of this advanced system, NVIDIA has also engineered an exceptionally high-speed storage fabric that operates at peak capacity. This fabric handles diverse data types, such as text, tabular data, audio, and video, simultaneously and consistently delivers high performance.

Comprehensive NVIDIA Solution

DGX GH200 is equipped with NVIDIA Base Command, encompassing an AI workload-optimized operating system, a cluster manager, and libraries that enhance compute, storage, and network infrastructure, all tailored for the DGX GH200 system architecture.

Additionally, DGX GH200 incorporates NVIDIA AI Enterprise, offering a comprehensive set of software and frameworks meticulously optimized to simplify AI development and deployment. This end-to-end solution empowers customers to concentrate on innovation, alleviating concerns about the intricacies of managing their IT infrastructure.

The NVIDIA DGX GH200 AI supercomputer includes NVIDIA Base Command and NVIDIA AI Enterprise

Use Case: JUPITER Supercomputer – The Power of NVIDIA DGX GH200 at Scale

Europe’s first exascale AI supercomputer, JUPITER, marks a turning point in high-performance computing. At the heart of this achievement is the NVIDIA DGX GH200, designed to meet the massive memory and compute demands of next-generation AI and scientific workloads.

Built for Exascale AI and Scientific Research

Developed by Forschungszentrum Jülich in collaboration with NVIDIA and its partners, JUPITER integrates roughly 24,000 GH200 Grace Hopper Superchips connected through NVIDIA Quantum-2 InfiniBand networking. This architecture delivers over 1 exaFLOP of FP8 AI performance and provides close to one petabyte of unified GPU memory, forming one of the world's largest shared-memory compute systems.

With this capacity, JUPITER can handle workloads that were previously limited by fragmented memory, such as training trillion-parameter foundation models, processing climate simulations, or running large-scale molecular research.

Why DGX GH200 Is Central to JUPITER’s Design

Traditional GPU clusters rely on PCIe-based communication, which often becomes a bottleneck as model size or data complexity scales. The DGX GH200 addresses this by combining the Grace CPU and Hopper GPU in a single coherent architecture: NVLink-C2C interconnects and up to 144 terabytes of shared memory per system let data move freely between processors without staging copies through the CPU.

This unified memory design simplifies distributed training, reduces latency, and improves energy efficiency, making it possible to train extremely large models at scale. It also allows research teams to focus on model innovation rather than managing complex data movement between nodes.
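A rough arithmetic sketch shows why avoiding staged copies matters: consider moving a 1 TB embedding table to the GPU over PCIe Gen5 versus reaching it in place over NVLink-C2C. The NVLink figure is the one cited earlier in this article; the PCIe figure is an approximation introduced here:

```python
# Rough data-movement comparison for a 1 TB table (illustrative model).
TABLE_TB = 1.0
PCIE_GEN5_GBPS = 128      # approx. x16 bidirectional; an assumption
NVLINK_C2C_GBPS = 900     # figure cited in this article

pcie_seconds = TABLE_TB * 1024 / PCIE_GEN5_GBPS
nvlink_seconds = TABLE_TB * 1024 / NVLINK_C2C_GBPS
print(f"PCIe staging: {pcie_seconds:.1f} s, NVLink-C2C: {nvlink_seconds:.1f} s")
```

The model ignores latency and overlap, but the bandwidth gap alone explains most of the benefit for memory-bound training steps.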

Real-World Impact

By deploying the NVIDIA DGX GH200 platform at this scale, JUPITER gives Europe sovereign computing capacity for climate modeling, molecular dynamics, generative AI, and other data-intensive scientific applications. The project demonstrates a new direction for high-performance computing where memory architecture becomes as important as raw GPU power.

Key takeaway: JUPITER proves that the NVIDIA DGX GH200 Supercomputer is more than a hardware upgrade. It represents a shift toward memory-centric design, where unified access to massive data enables breakthroughs in AI and scientific research.

Also Read: NVIDIA H100 vs H200: Key Differences in Performance, Specs, and AI Workloads

Final Thoughts

At the forefront of delivering this exceptional supercomputer is GreenNode. Committed to making the DGX GH200 accessible, GreenNode is your gateway to harnessing the power of this first-of-its-kind system. With a dedication to advancing technology and overcoming the most complex challenges, GreenNode paves the way for a future where groundbreaking achievements in AI and HPC become more achievable than ever. Explore the possibilities with GreenNode and embark on a journey of unprecedented computational prowess. Learn more here.

Frequently Asked Questions about NVIDIA DGX GH200

1. What makes the NVIDIA DGX GH200 unique?

The NVIDIA DGX GH200 combines Grace Hopper Superchips with a shared memory architecture of up to 144 terabytes. This creates a unified memory space between CPU and GPU, reducing data transfer overhead and allowing developers to train trillion-parameter AI models and run exascale simulations efficiently.

2. How is DGX GH200 different from DGX H100?

While the DGX H100 is a GPU-only system, the DGX GH200 integrates both Grace CPUs and Hopper GPUs using NVLink-C2C interconnects. This makes CPU and GPU memory fully coherent and accessible as one pool.

  • DGX H100: Up to 640 GB of GPU memory (8x H100 80 GB).
  • DGX GH200: Up to 144 TB of unified CPU-GPU memory.

This difference enables GH200 systems to handle larger models with lower latency and higher throughput.

3. What workloads benefit most from NVIDIA DGX GH200?

The DGX GH200 excels at workloads that depend on large memory and high bandwidth, including:

  • Large language models (LLMs) and generative AI
  • Graph neural networks (GNNs)
  • Scientific and climate simulations
  • Large-scale recommendation systems and analytics

4. Is NVIDIA DGX GH200 available in the cloud?

Yes. The DGX GH200 is available through NVIDIA DGX Cloud, giving enterprises and research institutions access to exascale GPU compute power without building physical infrastructure. This makes it easier to train, fine-tune, and deploy large AI models in cloud environments.

5. Can existing CUDA and AI frameworks run on DGX GH200?

Yes. The DGX GH200 maintains full compatibility with the CUDA programming model and NVIDIA AI Enterprise ecosystem. Developers can use existing frameworks such as PyTorch, TensorFlow, and JAX without modification. The unified memory and NVLink fabric deliver performance improvements automatically, without rewriting code.

6. How power-efficient is the DGX GH200?

Thanks to the Grace Hopper architecture, the DGX GH200 delivers excellent performance per watt. Each node uses liquid cooling and direct NVLink communication to reduce power loss, providing sustainable efficiency even under heavy workloads. This makes it well-suited for data centers focused on green computing.

7. How does DGX GH200 shape the future of AI compute?

The NVIDIA DGX GH200 Supercomputer signals a move toward memory-driven computing. Instead of relying solely on GPU speed, future systems will focus on unified, scalable memory access that supports increasingly large and complex models. This approach enables faster research cycles, more accurate simulations, and greater efficiency across enterprise AI and HPC workloads.
