Battle for the next generation of supercomputer

2024-07-29

Fujitsu in Japan is designing a 2nm processor called Monaka for the next generation of supercomputer at the RIKEN Centre that will replace the Fugaku system, which was the world’s most powerful supercomputer in 2020 with 7.6m cores.

And in the US, the Oak Ridge National Laboratory (ORNL) has just issued its call for bids for OLCF-6 as the successor to the AMD-based Frontier supercomputer, which currently holds the crown.

As recently as January 2024, ORNL was looking at options to upgrade the existing Frontier supercomputer system. However, it has since decided to go with an entirely new system, opening up a battle royal for suppliers.

“Over the next few years, data generation rates at experimental and observational facilities will increase by orders of magnitude due to advances in detector technology, deployment of edge sensors, and other factors,” says ORNL.

“At the same time, higher resolution simulation science will be continuing to generate data sets growing at similar rates. The post-exascale generation of [supercomputers] must be interoperable with an Integrated Research Infrastructure (IRI) to provide researchers with the ability to meld experimental control and analysis of large-scale experimental/observational datasets with high-resolution simulations and/or AI technologies.”

“With world demand for AI, data analytics and computing at all scales growing exponentially, energy utilization is both a constraint and mission driver. This has led to new technological developments in energy efficient HPC architectures, including AI accelerators, memory technologies, high-speed interconnects, systems software, and other innovations. There is a mission need to continue to make progress and lead in dramatically improving energy efficiency across the ecosystem.”

OLCF-6 will see a ‘significant increase’ in performance, particularly relative to power consumption, and use novel architectures, but still be ready before Frontier reaches the end of its life.

Intel’s CPU roadmap sees the first 1nm devices on the Intel 10A process by the end of 2027, although the 14A-E low-power process technology may be more suitable at that stage with RibbonFET or even stacked transistors as well as backside power for higher-density devices. However, these devices will almost certainly be a complex combination of chiplets.

Similarly, the Monaka chip is intended to provide twice the performance of the current A64FX within the same power budget. How this doubling at the chip level is converted into system performance is the issue for supercomputer makers such as Hewlett Packard Enterprise (HPE) and its Cray division, which constructed the previous versions.

This would put the performance at 2.1 exaflops at 32-bit precision and 2.2 GHz with 4.85 petabytes of memory. But a doubling of performance in eight years is not that impressive.
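To put that doubling in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustrative calculation only: it uses the 2.1 exaflops, twofold speedup and eight-year figures quoted above, and the function names are hypothetical rather than taken from any vendor tool.

```python
def implied_baseline(target_exaflops: float, speedup: float) -> float:
    """Performance today implied by a target figure and a speedup factor."""
    return target_exaflops / speedup


def annual_growth_rate(speedup: float, years: float) -> float:
    """Compound annual growth rate that yields `speedup` over `years`."""
    return speedup ** (1.0 / years) - 1.0


# Figures quoted in the article: 2.1 exaflops after a 2x gain over eight years.
baseline = implied_baseline(2.1, 2.0)
rate = annual_growth_rate(2.0, 8.0)

print(f"Implied current performance: {baseline:.2f} exaflops")
print(f"Annualised growth needed for 2x in 8 years: {rate:.1%}")
```

On these assumptions the doubling works out to roughly 9 per cent compound growth a year, which is why it looks modest next to the gains being claimed for AI workloads.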

AMD is also likely to be on a 2nm process at TSMC with its Zen 6 cores and 3D packaging.

The new element is AI, and that changes the numbers dramatically, particularly as Nvidia dominates the supply of GPUs used for AI in supercomputers.

AI is a growing workload on the Oak Ridge systems. OLCF-6 is expected to be at the forefront in supporting domain scientists and application developers as they explore and integrate transformational AI technologies to accelerate discoveries in science, energy, and security problems of national importance.

“We envision a wide spectrum of use cases ranging from inverse design and control of complex systems such as power grids and nuclear reactors, to generative AI and foundational models that integrate text and images that are often unstructured, high-resolution, and from multi-modal data sources,” it said.

However, executing AI frameworks and workflows will place new demands on the system architecture, possibly requiring more interconnect bandwidth and an optimized storage layer that can handle very high rates of I/O operations per second (IOPS) focused on random reads.

This is going to drive the addition of memory and higher speed interconnect to the system designs.

But this also opens up new architectures for other developers.

European chip designer Tachyum has developed a universal processor that can handle HPC workloads and AI, and it has been commissioned to build supercomputers in the US. The 5nm Prodigy 1 chip has been delayed and Prodigy 2 is expected to be built on a 3nm process technology. For European sovereignty, SiPearl has been developing a multicore ARM chip now called Rhea1 for supercomputer designs in 2025. This will be used for the Jupiter supercomputer in Germany.

SiPearl is now looking at chiplet architectures to add more performance, particularly for AI, alongside the Rhea1 silicon.

Other data centre CPUs are also set to scale up to supercomputers, such as Ampere’s processors and Nvidia’s Grace, both based on ARM, which have been tested at ORNL’s Wombat lab.

The next generation of Nvidia’s ARM-based Grace CPU is going to be a significant challenger. This, or possibly its successor, will be combined with the Blackwell GPU in a superchip that will compete for the next generation of supercomputers.