The AI burden will be met by datacentres. Can they cope?

By Nicholas Cole, Data Centre Solution Manager, EXFO.

For all the attention being given to AI making things easier and automating tasks, a great deal of work goes into creating AI tools. It's estimated that 80 percent of that work relates to collecting and preparing data. According to IBM, some businesses have embarked on AI projects only to give up after a year spent gathering and cleaning data with nothing to show for it.

AI is not only data-hungry, it's also energy-hungry and component-hungry. According to a study from the VU Amsterdam School of Business and Economics, if the current rate of growth continues, the AI industry will use as much energy as the whole of the Netherlands, or around half a per cent of total global electricity consumption. The burden of maintaining the infrastructure that supports the AI revolution will fall to datacentres, an integral part of the data economy in terms of storing, managing and processing data.

Datacentres will be the unsung heroes of AI if they can handle the increasing volumes of data that generative AI creates, and that depends on datacentre operators managing demand effectively. If datacentres do not evolve to meet growing demands, they could become a bottleneck and slow progress.

The new data surge

AI is just one of the changes driving the increase in data flows (hybrid working, more connected devices and video streaming are others), but if some predictions hold true, AI will be the change with the biggest effect. We've already seen a big increase in the amount of data used every day. According to McKinsey, in 2025 the world will produce around 463 exabytes of data every day, or 463,000,000 terabytes, more than 150 times as much as we collectively produced a decade ago.
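As a quick sanity check on the units, here is a minimal Python sketch. It assumes decimal (SI) prefixes, so 1 EB = 1,000,000 TB, and simply inverts the "over 150 times" comparison to bound what the daily figure from a decade earlier would have been. The function and variable names are illustrative only.

```python
# Decimal (SI) storage prefixes are assumed: 1 EB = 10**6 TB.
TB_PER_EB = 10**6


def exabytes_to_terabytes(eb: float) -> float:
    """Convert exabytes to terabytes using SI prefixes."""
    return eb * TB_PER_EB


daily_2025_eb = 463  # daily figure quoted in the article
daily_2025_tb = exabytes_to_terabytes(daily_2025_eb)
print(f"{daily_2025_tb:,.0f} TB per day")  # 463,000,000 TB per day

# If 2025 output is more than 150x the output of a decade earlier,
# that earlier daily output must have been below this bound.
implied_decade_ago_eb = daily_2025_eb / 150
print(f"below about {implied_decade_ago_eb:.1f} EB per day")  # below about 3.1 EB per day
```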

When people think about the technology required to support increasing amounts of data, it's usually in terms of the capacity of transport networks, or wireless technologies like 5G. Datacentres are often forgotten, despite the key role they will play in the ongoing evolution of AI. Yet datacentre growth is evident in their power consumption, which is increasing by about 10% each year and is projected to reach 35 GW by 2030, up from 17 GW in 2022, a figure that does not account for the power efficiencies achieved by the IT industry in recent years.
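The 10% annual figure and the two endpoints can be cross-checked with compound growth. The sketch below (plain Python, hypothetical function names) computes the annual rate implied by going from 17 GW in 2022 to 35 GW in 2030, and also projects 17 GW forward at 10% a year. The two results land close together, which suggests the quoted figures are consistent with each other.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


years = 2030 - 2022
cagr = implied_annual_growth(17, 35, years)
print(f"Implied growth: {cagr:.1%} per year")  # Implied growth: 9.4% per year

projected = project(17, 0.10, years)
print(f"17 GW at 10%/yr for {years} years: {projected:.1f} GW")  # 36.4 GW
```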

The accelerated growth of AI tools, and their reliance on huge data sets, will put a great deal of pressure on datacentre operators to rethink their infrastructure and the services customers will expect in the future.

Building, upgrading and testing network infrastructure

With major cloud and internet service firms leading the way, Synergy Research Group forecasts that hyperscale datacentre capacity will almost triple in the next six years. This will be achieved by expanding existing sites and building new ones, with Synergy citing 427 projects in the pipeline.

These new datacentres will set the blueprint for the AI era as they become supercharged computing platforms for the next generation of app designers. New servers, switches, racks, and cabling will be needed at huge scale to create ultra-high speed, low latency networking fabrics capable of handling massive data sets.

These new servers, optimized for AI, will house multiple graphics processing units (GPUs), each with high-speed optical transceivers. We could see data rates of up to 1.6 Tb/s, a massive increase over the 25/50/100G connections found in datacentres optimized for cloud services. This need for speed stems from the collaborative nature of GPUs working on complex AI workloads such as deep learning and neural networks.

GPUs will be connected via transceivers to leaf and spine fabrics contained within pods, with each transceiver transmitting 4 or 8 optical signals and packaged in new Quad Small Form Factor Pluggable Double Density (QSFP-DD) and Octal Small Form Factor Pluggable (OSFP) modules. These transceivers may use VCSEL-based lasers, which run over multimode fibre at short distances of up to 100 m.

Short-range multimode fibre at speeds above 100G creates the need for parallel optical fibre cabling, referred to as base-8. This uses higher-fibre-count cables and multi-fibre connectivity that has often been reserved for connections at higher switching tiers. With massive volumes of networking hardware present within AI pods, operators will face the task of weeding out poorly performing components such as faulty transceivers and connectors.

New switching capacity will also be needed to interconnect GPUs within pods, and pods into clusters containing tens of thousands of GPUs. The industry is meeting this need at a rapid rate: vendors have released switches with chipsets capable of moving 51.2 Tb/s of data across 512 lanes of traffic. This is a key enabler for AI systems, as operators will be able to build with different configurations such as 32 x 1.6T, 64 x 800G, 128 x 400G or 256 x 200G, through physical ports or virtual breakouts.
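These port configurations follow from simple lane arithmetic: 51.2 Tb/s spread over 512 lanes is 100 Gb/s per lane, so each port speed consumes a fixed number of lanes. The Python sketch below assumes every lane is allocated to front-panel ports of a single speed; real switches may reserve lanes or mix speeds, and the names here are illustrative.

```python
SWITCH_CAPACITY_GBPS = 51_200  # 51.2 Tb/s switch chipset (from the text)
LANES = 512                    # lanes of traffic (from the text)
LANE_RATE_GBPS = SWITCH_CAPACITY_GBPS // LANES  # 100 Gb/s per lane


def breakout(port_speed_gbps: int) -> tuple[int, int]:
    """Return (number of ports, lanes per port) when all lanes run at one port speed."""
    if port_speed_gbps % LANE_RATE_GBPS:
        raise ValueError("port speed must be a multiple of the lane rate")
    lanes_per_port = port_speed_gbps // LANE_RATE_GBPS
    return LANES // lanes_per_port, lanes_per_port


for speed in (1600, 800, 400, 200):
    ports, lanes = breakout(speed)
    total_tbps = ports * speed / 1000
    print(f"{ports:>3} x {speed}G ({lanes} lanes each) = {total_tbps:.1f} Tb/s")
```

Every configuration adds back up to the same 51.2 Tb/s; only the number of lanes bundled into each port changes.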

Rack design is also getting a makeover, and a pretty major one at that! AI needs power, lots of power, and power creates heat, lots of heat; therefore we will see liquid flooding into data halls. Don't worry though: this is not a catastrophic accident, it's by design. Air-cooled systems may not be ideal in all situations, so liquid fed through specialist pipes directly into the hardware will help fight the battle against heat within racks.
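To give a feel for why liquid is needed, the heat a coolant loop can carry away follows from a basic energy balance, P = ρ · Q · c_p · ΔT, which can be solved for the flow rate Q. In the sketch below, the 100 kW rack and the 10 K temperature rise are hypothetical, the water properties are approximate, and it assumes all of the rack's electrical power ends up as heat in the liquid loop (in practice some is still removed by air).

```python
WATER_DENSITY_KG_M3 = 997.0         # approximate, water at about 25 °C (assumption)
WATER_SPECIFIC_HEAT_J_KGK = 4180.0  # approximate (assumption)


def coolant_flow_l_per_min(heat_w: float, delta_t_k: float,
                           density: float = WATER_DENSITY_KG_M3,
                           specific_heat: float = WATER_SPECIFIC_HEAT_J_KGK) -> float:
    """Volumetric flow needed to carry away `heat_w` watts with a `delta_t_k` rise.

    Energy balance: P = rho * Q * c_p * dT, so Q = P / (rho * c_p * dT).
    """
    flow_m3_per_s = heat_w / (density * specific_heat * delta_t_k)
    return flow_m3_per_s * 1000 * 60  # m^3/s -> L/min


# Hypothetical 100 kW rack, coolant warming by 10 K between inlet and outlet.
print(f"{coolant_flow_l_per_min(100_000, 10):.0f} L/min")  # 144 L/min
```

Halving the allowed temperature rise doubles the required flow, which is one reason flow rates and pipework become a rack design question once power density climbs.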

Fibre connectivity and cabling will play a critical part in connecting pods to clusters and clusters to the wider datacentre fabric. Singlemode fibre is expected to remain the medium of choice here, as it can be used over longer distances, typically up to 500 m. Due to space constraints, we may see the emergence of new very small form factor (VSFF) fibre connectors and new cassette and panel designs. These connectors may terminate anywhere between 2 and 24 fibres, depending on optical transceiver selection and system architecture.

With the growing importance of networking within datacentres, the pressure on component and system performance will intensify. Faulty optical transceivers, fibre connectors and cabling will need to be identified during installation or located quickly, as downtime is public enemy number one in datacentres! Multi-fibre certification will ensure loss limits are respected and TX/RX polarity issues are eliminated. Fibre connector inspection should combat contamination problems, and transceiver and link testing will identify errors in bit streams and issues related to our new foe: latency.
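The certification and testing steps above come down to a few simple checks. The following Python sketch is illustrative only: it computes a loss limit from per-component allowances (the 0.75 dB per mated pair, 0.3 dB per splice and 3.0 dB/km values are assumptions, not figures from this article), flags individual fibres in a multi-fibre link whose measured loss exceeds that limit, and computes a bit error ratio from a test run. All names and measurements are hypothetical.

```python
from dataclasses import dataclass

# Illustrative per-component allowances (assumption); replace with the values
# from the applicable cabling standard and transceiver specification.
CONNECTOR_LOSS_DB = 0.75  # per mated connector pair
SPLICE_LOSS_DB = 0.3      # per splice


@dataclass
class Link:
    length_m: float
    attenuation_db_per_km: float  # fibre attenuation at the test wavelength
    mated_pairs: int              # connector pairs in the channel
    splices: int = 0


def loss_limit_db(link: Link) -> float:
    """Maximum allowed insertion loss: fibre + connectors + splices."""
    return (link.length_m / 1000 * link.attenuation_db_per_km
            + link.mated_pairs * CONNECTOR_LOSS_DB
            + link.splices * SPLICE_LOSS_DB)


def failing_fibres(measured_db: dict[int, float], limit_db: float) -> list[int]:
    """Fibre positions in a multi-fibre link whose measured loss exceeds the limit."""
    return sorted(pos for pos, loss in measured_db.items() if loss > limit_db)


def bit_error_ratio(errors: int, line_rate_bps: float, seconds: float) -> float:
    """Errored bits divided by total bits sent during the test."""
    return errors / (line_rate_bps * seconds)


# Hypothetical 100 m multimode link with two mated pairs, assuming 3.0 dB/km.
link = Link(length_m=100, attenuation_db_per_km=3.0, mated_pairs=2)
limit = loss_limit_db(link)
print(f"limit {limit:.2f} dB")  # limit 1.80 dB

# Hypothetical per-fibre results from a base-8 trunk (positions 1-8).
measured = {1: 0.9, 2: 1.1, 3: 2.3, 4: 1.0, 5: 0.8, 6: 1.2, 7: 1.0, 8: 1.9}
print(failing_fibres(measured, limit))  # [3, 8]

# Hypothetical transceiver test: 3 bit errors in 60 s at 400 Gb/s.
print(f"{bit_error_ratio(3, 400e9, 60):.2e}")  # 1.25e-13
```

Checking each fibre individually, rather than averaging across the trunk, is what lets a single dirty or damaged fibre in a multi-fibre connector be caught before it causes downtime.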

Bring on 2024 and AI at scale!

Datacentres, already a vital part of the communications ecosystem, take on even greater importance with the advent of AI. AI can only thrive with massive data sets to draw on, and datacentres will ultimately determine its success.

As companies jockey for position, adopting solid test practices will make the difference between the average and the elite. So, let’s buckle up for the excitement to come as AI makes its way to centre stage in 2024.
