What’s next for Azure infrastructure

Thursday, 18 December 2025, 10:00, by InfoWorld
As 2025 comes to an end, it seems fitting to look at how Microsoft’s Azure hyperscale cloud plans to address the second half of the decade. As has become traditional, Azure CTO Mark Russinovich gave his look at that future at Ignite, this time split into two talks, one on infrastructure and one on software.

The first presentation looked at how Azure’s underlying infrastructure is developing and how the software you use is adapting to take advantage of the new hardware. Understanding what lies beneath the virtual infrastructure we use every day is fascinating, because it’s always changing where we can’t see it. We don’t think about the hardware under our software, as all we have access to are APIs and virtual machines.

That abstraction is both a strength and a weakness of the hyperscale cloud. Microsoft continually upgrades all aspects of its hardware without affecting our code, but we’re forced either to wait for the cloud platform to make those innovations visible to everyone or to move code to one of the handful of regions that get new hardware first, increasing the risks that come with reduced redundancy options.

Still, it’s worth understanding what Microsoft is doing, as the technologies it’s implementing will affect you and your virtual infrastructure.

Cooling CPUs with microfluidics

Russinovich’s first presentation took a layered approach to Azure, starting with how its data centers are evolving. Certainly, the scale of the platform is impressive: It now has more than 70 regions and over 400 data centers. They’re linked by more than 600,000 kilometers of fiber, including links across the oceans and around the continents, with major population centers all part of the same network.

As workloads evolve, so do data centers, requiring a rethink of how Azure cools its hardware. Power and cooling demands, especially with AI workloads, are forcing redesigns of servers, bringing cooling right onto the chip using microfluidics. This is the next step in liquid cooling, where current designs put cold plates on top of a chip. Microfluidics goes several steps further, requiring a redesign of the chip packaging to bring cooling directly to the silicon die. By putting cooling right where the processing happens, it’s possible to increase the density of the hardware, stacking cooling layers between memory, processing, and accelerators, all in the same packaging.

The channels are designed using machine learning and are optimized for the hotspots generated by common workloads. Microsoft is doing the first generation of microfluidics etchings itself but plans to work with silicon vendors like Intel and AMD to pre-etch chips before they’re delivered. Microfluidic-based cooling isn’t only for CPUs; it can even be used on GPUs.

Boosting Azure Boost

Beyond silicon, Microsoft is enhancing Azure’s Open Hardware-based servers with a new iteration of its Azure Boost accelerators. Now fitted to more than 25% of Microsoft’s server estate and standard with all new hardware, Azure Boost is designed to offload Azure’s own workloads onto dedicated hardware so that user tenants and platform applications get access to as much server performance as possible. Code-named Overlake, the latest batch of Azure Boost accelerators adds 400Gbps of networking, along with 20Gbps of remote storage and 36Gbps of direct-attached NVMe storage at 6.6 million IOPS.
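To put those figures in perspective, here is a rough back-of-envelope sketch in Python. The link speeds are the ones quoted above; the 1TB model checkpoint is a hypothetical workload, and the arithmetic ignores protocol overhead and contention.

# Back-of-envelope only: how long a hypothetical 1TB model checkpoint takes
# to move over the Azure Boost links quoted above, ignoring protocol
# overhead and contention.
NETWORK_GBPS = 400          # quoted accelerated networking, gigabits per second
REMOTE_STORAGE_GBPS = 20    # quoted remote storage throughput
LOCAL_NVME_GBPS = 36        # quoted direct-attached NVMe throughput

def seconds_to_move(size_gb: float, link_gbps: float) -> float:
    """Transfer time for size_gb gigabytes over a link of link_gbps gigabits per second."""
    return size_gb * 8 / link_gbps

CHECKPOINT_GB = 1_000       # hypothetical 1TB checkpoint

for label, gbps in [("network", NETWORK_GBPS),
                    ("remote storage", REMOTE_STORAGE_GBPS),
                    ("local NVMe", LOCAL_NVME_GBPS)]:
    # Roughly 20 s over the NIC, 400 s from remote storage, 222 s from local NVMe.
    print(f"{label:>14}: {seconds_to_move(CHECKPOINT_GB, gbps):6.0f} s")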

Under the hood is a custom system on a chip (SoC) that mixes Arm cores and a field-programmable gate array (FPGA), running the same Azure Linux as your Kubernetes containers. There’s added hardware encryption in Azure Boost to ensure compatibility with Azure’s confidential computing capabilities, keeping data encrypted across the boundary between servers and the Azure Boost boards.

Azure goes bare metal

One advantage of moving much of the server management to physical hardware is that Microsoft can now offer bare-metal hosts to its customers. This approach was originally used for OpenAI’s training servers, giving virtual machines direct access to networking hardware and remote direct memory access (RDMA). The latter not only speeds up inter-VM communications, it also improves access to GPUs, allowing large amounts of data to move more efficiently. Azure’s RDMA service doesn’t just support in-cabinet or even in-data-center operations; it now offers low-latency connectivity within Azure regions.

Bare-metal servers give applications a significant performance boost, but they really only matter for big customers using them with regional RDMA to build their own supercomputers. Even so, the rest of us stand to get better performance for our virtual infrastructures, which requires removing the overhead associated with both virtual machines and containers. As Russinovich has noted in earlier sessions, the future of Azure is serverless: hosting and running containers in platform-as-a-service environments.

That serverless future needs a new form of virtualization, one that goes beyond Azure’s secure container model of nested virtual machines, giving access to hardware while keeping the same level of security and isolation. Until now that’s been impossible, as nested virtualization required running hypervisors inside hypervisors to enforce the necessary security boundaries and prevent malicious code from attacking other containers on the same hardware.

A new direct virtualization technique removes that extra layer, running user and container VMs on the server hypervisor, still managed by the same Azure Host OS. This approach gets rid of the performance overheads that come from nested hypervisors and gives the virtualized clients access to server hardware like GPUs and AI inference accelerators. This update gives you the added benefit of faster migration between servers in case of hardware issues.

This approach is key to many of Microsoft’s serverless initiatives, like Azure Container Instances (ACI), letting managed containers access faster networking, GPUs, and the like. This should improve performance: Russinovich demonstrated a 50% improvement for PostgreSQL along with a significant reduction in latency. By giving containers access to GPUs, ACI gains the ability to host AI inferencing workloads so you can bring your open source models to containers. This should also allow you to target ACI containers from AI Foundry more effectively.
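To show what GPU-backed ACI looks like from the developer side, here is a hedged sketch using the azure-mgmt-containerinstance Python SDK. The subscription, resource group, container image, region, and GPU SKU are all placeholders, and GPU availability in ACI varies by region and subscription, so treat this as the shape of the call rather than a recipe.

# Hypothetical sketch: a GPU-backed inference container on ACI, using the
# track-2 azure-mgmt-containerinstance SDK. Names, image, region, and GPU
# SKU are placeholders; check current ACI documentation for what your
# subscription can actually provision.
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerinstance import ContainerInstanceManagementClient
from azure.mgmt.containerinstance.models import (
    Container, ContainerGroup, GpuResource,
    ResourceRequests, ResourceRequirements,
)

SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
RESOURCE_GROUP = "rg-inference-demo"    # placeholder
GROUP_NAME = "aci-gpu-inference"        # placeholder

client = ContainerInstanceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

container = Container(
    name="model-server",
    image="myregistry.azurecr.io/open-model-server:latest",  # placeholder image
    resources=ResourceRequirements(
        requests=ResourceRequests(
            cpu=4.0,
            memory_in_gb=16.0,
            gpu=GpuResource(count=1, sku="V100"),  # SKU availability varies by region
        )
    ),
)

group = ContainerGroup(
    location="eastus",       # placeholder region
    os_type="Linux",
    containers=[container],
)

# Long-running operation; returns once the container group is provisioned.
poller = client.container_groups.begin_create_or_update(RESOURCE_GROUP, GROUP_NAME, group)
print(poller.result().provisioning_state)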

Custom hardware for virtual networks

AI has had a considerable influence on the design of Azure data centers, especially with big customers needing access to key infrastructure features and, where possible, the best performance available. This extends to networking, which until now has been handled by specialized virtual machines running services like routing, security, and load balancing.

Microsoft is now rolling out new offload hardware to host those virtual network appliances, in conjunction with top-of-the-rack smart switches. This new hardware runs your software-defined network policies, managing your virtual networks for both standard Azure workloads and for your own specific connectivity, linking cloud to on-premises networks. The same hardware can transparently mirror traffic to security hardware without affecting operations, allowing you to watch traffic between specific VMs and look for network intrusions and other possible security breaches without adding latency that might warn attackers.
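Conceptually, what the offload does can be sketched in a few lines of Python. This is not an Azure API; the data model and rules below are invented purely to illustrate how mirroring can be a side effect of policy evaluation, leaving the forwarding path, and therefore latency, unchanged.

# Conceptual sketch only (not an Azure API): an SDN offload that applies
# forwarding policy and transparently mirrors selected flows to a security
# appliance. Everything here is invented for illustration.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Flow:
    src_vm: str
    dst_vm: str
    dst_port: int

@dataclass
class VirtualNetworkPolicy:
    routes: dict                      # dst_vm -> next hop (rack, gateway, ...)
    mirrored_vms: set                 # VMs whose traffic is copied to the appliance
    mirror_target: str = "security-appliance"
    mirror_log: list = field(default_factory=list)

    def forward(self, flow: Flow) -> str:
        # Mirroring is a side effect: it never changes the forwarding
        # decision, so the workload sees no altered path or added latency.
        if flow.src_vm in self.mirrored_vms or flow.dst_vm in self.mirrored_vms:
            self.mirror_log.append((self.mirror_target, flow))
        return self.routes.get(flow.dst_vm, "on-prem-gateway")

policy = VirtualNetworkPolicy(
    routes={"web-01": "rack-3", "db-01": "rack-7"},
    mirrored_vms={"db-01"},
)
print(policy.forward(Flow("web-01", "db-01", 5432)))  # "rack-7", and the flow is mirrored
print(len(policy.mirror_log))                         # 1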

Speeding and scaling its storage

The enormous volume of training data used by AI workloads has made Microsoft rethink how it provisions storage for Azure. Video models require hundreds of petabytes of image data, delivered at terabytes per second of bandwidth and many thousands of IOPS. That’s a significant demand on already busy storage hardware. It has led Microsoft to develop a new scaled storage account, best thought of as a virtual account layered over however many standard storage accounts are needed to deliver the required capacity.

There’s no need to change the hardware, and the new virtual storage can encompass as many storage accounts as you need, scaling as large as necessary. Because data is sharded across the underlying accounts and retrieved from each of them in parallel, performance is very good. Russinovich’s Ignite demo showed it working with 1.5 petabytes of data across 480 nodes, with writes running at 22 terabits per second and reads from 695 nodes at 50 terabits per second.
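The underlying idea can be illustrated with today’s SDK: shard a dataset across several standard storage accounts and read the shards in parallel, so each request draws on a separate account’s bandwidth and IOPS quota. This is not the new scaled-storage-account API, which the presentation didn’t detail; it uses the existing azure-storage-blob package, and the account, container, and blob names are placeholders.

# Illustrative only: parallel reads across several standard storage accounts,
# approximating the effect of a single scaled account. Account URLs,
# container, and blob names are placeholders.
from concurrent.futures import ThreadPoolExecutor
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

CREDENTIAL = DefaultAzureCredential()

# One shard of the dataset per underlying storage account (placeholders).
SHARDS = [
    ("https://trainingdata00.blob.core.windows.net", "images", "shard-00.tar"),
    ("https://trainingdata01.blob.core.windows.net", "images", "shard-01.tar"),
    ("https://trainingdata02.blob.core.windows.net", "images", "shard-02.tar"),
]

def read_shard(account_url: str, container: str, blob: str) -> bytes:
    # Each download runs against its own account, so it uses that account's
    # bandwidth and IOPS limits rather than sharing a single account's quota.
    service = BlobServiceClient(account_url=account_url, credential=CREDENTIAL)
    return service.get_blob_client(container, blob).download_blob().readall()

# Aggregate throughput grows with the number of accounts because the
# downloads run in parallel against independent accounts.
with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
    shards = list(pool.map(lambda s: read_shard(*s), SHARDS))

print(sum(len(s) for s in shards), "bytes read across", len(SHARDS), "accounts")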

While many of these advances are specialized and focused on the needs of AI training, it’s perhaps best to think of those huge projects as the F1 teams of the IT world, driving innovations that will reach the rest of us, maybe not tomorrow, but certainly in the next five years. Microsoft’s big bet on a serverless Azure depends on many of these technologies to give its managed containers the performance they need, refactoring the way we deliver virtual infrastructures and build the next generation of data centers. Those big AI-forward investments need to support all kinds of applications as well, from event-driven Internet of Things to distributed, scalable Kubernetes, as well as being ready for platforms and services we haven’t yet begun to design.

Features like direct virtualization and networking offload look like they’re going to be the quickest wins for the widest pool of Azure customers. Faster, more portable VMs and containers will help make applications more scalable and more resilient. Offloading software-defined networking to dedicated servers can offer new ways to secure our virtual infrastructures and protect our valuable data.

What’s perhaps most interesting about Russinovich’s infrastructure presentation is that these aren’t technologies that are still in research labs. They’re being installed in new data centers today and are part of planned upgrades to the existing Azure platform. With that in mind, it’ll be interesting to see what new developments Microsoft will unveil next year.
https://www.infoworld.com/article/4108044/whats-next-for-azure-infrastructure.html
