Speaking at a virtual Hot Chips event, an annual gathering of processor and system architects, senior NVIDIA engineers will disclose performance numbers and other technical details for NVIDIA's first server CPU, the Hopper GPU, the newest version of the NVSwitch interconnect chip and the NVIDIA Jetson Orin system on module (SoM).
The presentations offer fresh insights into how the NVIDIA platform will hit new levels of performance, efficiency, scale and security.
Specifically, the talks demonstrate a design philosophy of innovating across the full stack of chips, systems and software, in which GPUs, CPUs and DPUs act as peer processors. Together they create a platform that's already running AI, data analytics and high-performance computing jobs inside cloud service providers, supercomputing centers, corporate data centers and autonomous systems.
Inside NVIDIA's First Server CPU
Data centers require flexible clusters of CPUs, GPUs and other accelerators sharing massive pools of memory to deliver the energy-efficient performance today's workloads demand.
To meet that need, Jonathon Evans, a distinguished engineer and 15-year veteran at NVIDIA, will describe the NVIDIA NVLink-C2C. It connects CPUs and GPUs at 900 gigabytes per second with 5x the power efficiency of the existing PCIe Gen 5 standard, thanks to data transfers that consume just 1.3 picojoules per bit.
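Those two figures are self-consistent: at 1.3 picojoules per bit, driving the full 900 GB/s costs only about nine watts. A quick check (the numbers come from the text; only the arithmetic is ours):

```python
# Sanity check of the stated NVLink-C2C figures: 900 GB/s at 1.3 pJ/bit.

bandwidth_bytes_per_s = 900e9       # 900 GB/s
energy_per_bit_joules = 1.3e-12     # 1.3 picojoules per bit

link_power_watts = bandwidth_bytes_per_s * 8 * energy_per_bit_joules
print(f"{link_power_watts:.2f} W")  # -> 9.36 W to drive the link at full rate
```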
NVLink-C2C links two CPU chips to create the NVIDIA Grace CPU with 144 Arm Neoverse cores. It's a processor designed to solve the world's largest computing problems.
For maximum efficiency, the Grace CPU uses LPDDR5X memory. It enables a terabyte per second of memory bandwidth while keeping power consumption for the entire complex to 500 watts.
One Link, Many Uses
NVLink-C2C also links Grace CPU and Hopper GPU chips as memory-sharing peers in the NVIDIA Grace Hopper Superchip, delivering maximum acceleration for performance-hungry jobs such as AI training.
Anyone can build custom chiplets that use NVLink-C2C to coherently connect to NVIDIA GPUs, CPUs, DPUs and SoCs, expanding this new class of integrated products. The interconnect will support the AMBA CHI and CXL protocols used by Arm and x86 processors, respectively.
To scale at the system level, the new NVIDIA NVSwitch connects multiple servers into one AI supercomputer. It uses NVLink, interconnects running at 900 gigabytes per second, more than 7x the bandwidth of PCIe Gen 5.
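The 7x comparison checks out if the baseline is a 16-lane PCIe Gen 5 link counted in both directions; that lane count is our assumption, not stated in the text:

```python
# Sanity check of the "more than 7x PCIe Gen 5" claim. Assumption (ours):
# the baseline is a 16-lane PCIe Gen 5 link at 32 GT/s per lane with
# 128b/130b encoding, measured bidirectionally.

per_lane_GBps = 32 * (128 / 130) / 8        # ~3.94 GB/s per lane, per direction
pcie_gen5_x16 = per_lane_GBps * 16 * 2      # ~126 GB/s bidirectional
nvlink_GBps = 900.0

print(round(nvlink_GBps / pcie_gen5_x16, 1))  # -> 7.1
```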
Alexander Ishii and Ryan Wells, both veteran NVIDIA engineers, will describe how the switch lets users build systems with up to 256 GPUs to tackle demanding workloads like training AI models that have more than 1 trillion parameters.
The switch includes engines that speed data transfers using the NVIDIA Scalable Hierarchical Aggregation Reduction Protocol. SHARP is an in-network computing capability that debuted on NVIDIA Quantum InfiniBand networks. It can double the data throughput of communications-intensive AI applications.
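A toy model conveys the idea behind in-network reduction (this is an illustration of the concept, not the real protocol): the switch performs the element-wise aggregation itself, so each endpoint sends its data once and receives the combined result once, instead of trading partial sums with peers across multiple stepped rounds:

```python
# Conceptual sketch of SHARP-style in-network reduction.

def in_network_reduce(vectors):
    """Element-wise sum across all endpoints, as a reduction-capable
    switch would compute it before broadcasting one result."""
    return [sum(column) for column in zip(*vectors)]

# Three "GPUs" each contribute a gradient shard; the switch returns one sum.
gpu_shards = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(in_network_reduce(gpu_shards))  # -> [9.0, 12.0]
```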
Jack Choquette, a senior distinguished engineer with 14 years at the company, will give a detailed tour of the NVIDIA H100 Tensor Core GPU, aka Hopper.
In addition to using the new interconnects to scale to unprecedented heights, it packs many advanced features that boost the accelerator's performance, efficiency and security.
Hopper's new Transformer Engine and upgraded Tensor Cores deliver a 30x speedup compared to the prior generation on AI inference with the world's largest neural network models. And it employs the world's first HBM3 memory system to deliver a whopping 3 terabytes of memory bandwidth, NVIDIA's biggest generational increase ever.
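Part of what the Transformer Engine does is pick numeric precision dynamically per layer. As a conceptual sketch only (the real selection logic is NVIDIA's and far more sophisticated), a range-based chooser over FP8's e4m3 format might look like:

```python
# Conceptual sketch of dynamic precision selection: a tensor whose dynamic
# range fits a narrower format can use faster, lower-precision math.

FP8_E4M3_MAX = 448.0     # largest finite value of the FP8 e4m3 format
FP16_MAX = 65504.0       # largest finite value of FP16

def choose_precision(tensor, margin=2.0):
    """Pick the narrowest format whose range covers the tensor, with headroom."""
    peak = max(abs(x) for x in tensor)
    if peak * margin <= FP8_E4M3_MAX:
        return "fp8_e4m3"   # narrowest format: highest math throughput
    if peak * margin <= FP16_MAX:
        return "fp16"
    return "fp32"

print(choose_precision([0.5, -3.2, 12.0]))   # -> fp8_e4m3
print(choose_precision([1000.0, -250.0]))    # -> fp16
```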
Among other new features:
- Hopper adds virtualization support for multi-tenant, multi-user configurations.
- New DPX instructions speed the recurring loops of dynamic-programming algorithms used in route mapping, DNA and protein-analysis applications.
- Hopper adds support for confidential computing to enhance security.
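The kind of recurring loop those DPX instructions target can be illustrated with Smith-Waterman local sequence alignment, a DNA and protein-analysis staple whose recurrence is built from fused add-then-max operations. A minimal pure-Python sketch of the recurrence, not tuned code:

```python
# Smith-Waterman local alignment: the inner loop is the add-then-max
# pattern that dynamic-programming hardware instructions accelerate.

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b."""
    rows = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The DP recurrence: each candidate is an add followed by a max
            rows[i][j] = max(0,
                             rows[i - 1][j - 1] + s,   # match / mismatch
                             rows[i - 1][j] + gap,     # gap in b
                             rows[i][j - 1] + gap)     # gap in a
            best = max(best, rows[i][j])
    return best

print(smith_waterman_score("ACGT", "ACGT"))  # -> 8 (four matches x 2)
```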
Choquette, one of the lead chip designers on the Nintendo 64 console early in his career, will also describe parallel computing techniques underlying some of Hopper's advances.
Michael Ditty, Orin's chief architect and a 17-year veteran at the company, will provide new performance specs for NVIDIA Jetson AGX Orin, an engine for edge AI, robotics and advanced autonomous machines.
It integrates 12 Arm Cortex-A78 cores and an NVIDIA Ampere architecture GPU to deliver up to 275 trillion operations per second on AI inference jobs. That's up to 8x greater performance at 2.3x better energy efficiency than the prior generation.
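Working backward from those figures gives a sense of scale; the ratios come from the text, and the arithmetic below is ours:

```python
# Implied prior-generation throughput and power from the stated Orin ratios.

orin_tops = 275.0   # stated peak AI inference throughput
perf_gain = 8.0     # stated speedup over the prior generation
eff_gain = 2.3      # stated energy-efficiency improvement

prior_tops = orin_tops / perf_gain   # implied prior-generation throughput
power_ratio = perf_gain / eff_gain   # implied power increase for the 8x gain

print(f"prior generation: ~{prior_tops:.0f} TOPS")  # -> ~34 TOPS
print(f"power increase: ~{power_ratio:.1f}x")       # -> ~3.5x
```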
The latest production module packs up to 32 gigabytes of memory and is part of a compatible family that scales down to pocket-sized 5W Jetson Nano developer kits.
All the new chips support the NVIDIA software stack, which accelerates more than 700 applications and is used by 2.5 million developers.
Based on the CUDA programming model, it includes dozens of NVIDIA SDKs for vertical markets such as automotive (DRIVE) and healthcare (Clara), as well as technologies such as recommendation systems (Merlin) and conversational AI (Riva).
The NVIDIA AI platform is available from every major cloud service and system maker.