Microsoft’s action-focused small language model Mu

Thursday, July 24, 2025, 11:00, by InfoWorld
If you’re running on the bleeding edge of Windows, using the Windows Insider program to install developer builds, you may have spotted a change in how Windows Settings works. Its search function is now an AI application: a new, highly focused small language model called Mu.

Initially discussed at the launch of the latest Surface devices, Mu is a hardware-optimized small language model trained on the contents of the Settings app, providing deep links to specific functions in response to user questions. This is the first use for Mu, which fills an interesting gap between single-task machine learning models and the more open approach taken by large models. Microsoft has been experimenting with small language models for some time, working on its Phi family of SLMs.

One of the key features of Microsoft’s Copilot+ PC specification is support for a neural processing unit (NPU) for local inferencing. Most Copilot+ PCs use the Qualcomm Hexagon NPU, though the specification also covers the accelerators in Intel’s and AMD’s latest chipsets. Mu is designed to run on those NPUs, building on the lessons learned with Microsoft’s first NPU-optimized model, Phi Silica.

Like Phi Silica, Mu is bundled in Windows 11 on Copilot+ devices enrolled in the Windows Insider Dev channel. However, unlike Phi Silica, Mu does not (yet) expose any public APIs. You can see if it’s installed on your device by checking Settings/System/AI Components. If it is available, you will see it as AI Settings.

What is Mu?

Microsoft describes Mu as a “micro-sized, task-specific language model.” It builds on the encoder-decoder architecture, where the input is encoded in a single pass into a fixed-length representation and handed to a decoder, rather than the per-token processing approach used by most LLMs. This is more efficient, delivering good performance on limited hardware with lower latency and higher throughput.
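To make that concrete, here’s a minimal sketch of encoder-decoder inference in PyTorch. This is an illustration, not Microsoft’s code, and the dimensions are toy values rather than Mu’s: the point is that the prompt is encoded exactly once, and each generated token only runs the decoder against that fixed representation.

```python
# A toy encoder-decoder loop (no positional encodings, random weights):
# the input is encoded in one pass, then per-token work is decoder-only.
import torch
import torch.nn as nn

VOCAB, D_MODEL, N_HEAD = 1000, 64, 4  # toy sizes, not Mu's real dimensions

embed = nn.Embedding(VOCAB, D_MODEL)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True), num_layers=2)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(D_MODEL, N_HEAD, batch_first=True), num_layers=2)
lm_head = nn.Linear(D_MODEL, VOCAB)

prompt = torch.randint(0, VOCAB, (1, 16))        # a 16-token user query
memory = encoder(embed(prompt))                  # one pass over the input, then fixed

generated = torch.zeros(1, 1, dtype=torch.long)  # start token
for _ in range(8):                               # each new token touches only the decoder
    out = decoder(embed(generated), memory)
    next_tok = lm_head(out[:, -1]).argmax(-1, keepdim=True)
    generated = torch.cat([generated, next_tok], dim=1)
```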

This approach is necessary if you want to deliver answers quickly and if you want to build something that can respond in real time. Although Microsoft doesn’t explicitly describe Mu as a tool for real-time AI, it’s clear that this is a possible direction. It’s not hard to see a use for Mu beyond Settings in an AI-powered Task Manager that is able to respond to issues with applications and services, sending notifications and suggested actions to users.

This is the performance you want on edge hardware, like a PC where you can’t throw GPU resources at a problem and you’re working in an environment of constrained power and bandwidth. 32GB of RAM and a 40 TOPS (trillion operations per second) NPU don’t compare to the inferencing capabilities of Azure. The resulting model is about a tenth the size of Phi with similar performance characteristics.

Designing a model ready for NPUs

Key to Mu’s success as an edge inferencing engine is how its design has been tied to the physical architecture of the supported NPUs. That requires tuning it for the available resources, aligning the matrix computations that implement the model’s neural net with the hardware, and ensuring that the instructions used are supported by both the runtime and the NPU.
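As a rough illustration of what that alignment means in practice: NPU matrix engines operate on fixed-size tiles, so layer dimensions are typically padded up to a tile multiple. The tile width below is a made-up example, not a documented value for the Hexagon, Intel, or AMD NPUs.

```python
# A hedged sketch of hardware-aware shape alignment: round a layer
# dimension up so every matrix multiply maps cleanly onto the
# accelerator's tiles. The tile size of 64 is illustrative only.
def align(dim: int, tile: int = 64) -> int:
    """Round a layer dimension up to the nearest hardware tile multiple."""
    return ((dim + tile - 1) // tile) * tile

# e.g. a "natural" hidden size of 1000 would be padded to 1024
for proposed in (1000, 1536, 3070):
    print(proposed, "->", align(proposed))
```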

Using 32 layers to implement the encoder and 12 for the decoder maximizes performance and efficiency. Other techniques save memory; for example, Mu uses the same set of weights for input and output tokens, an approach that, interestingly, also improves consistency between the two stages of the model.
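Sharing weights between the input embedding and the output projection is the standard tied-embeddings technique; a minimal PyTorch sketch (Mu’s actual implementation isn’t public) looks like this:

```python
# Tied embeddings: the decoder's output projection reuses the input
# embedding matrix, so the vocabulary weights are stored once, not twice.
import torch.nn as nn

VOCAB, D_MODEL = 1000, 64  # toy sizes

embed = nn.Embedding(VOCAB, D_MODEL)             # input tokens -> vectors
lm_head = nn.Linear(D_MODEL, VOCAB, bias=False)  # vectors -> output logits
lm_head.weight = embed.weight                    # one matrix serves both ends
```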

Microsoft also used new approaches to building and training transformer networks and tools like the Muon optimizer to get as much performance out of Mu as possible. We often think there hasn’t been much progress in the underlying neural network technologies used in transformer-based language models, but there’s been a lot of development to make them both smaller and more efficient, developments that have informed much of the work of building Mu.

Training an edge model

There’s some detail on how Mu was trained, with a multistage approach that starts with the same “textbooks are all you need” method as Phi before switching to distillation from Phi models. By building on the work done to build and train Phi, Mu is again more efficient. Further quantization, using post-training techniques to shift the model from floating-point to integer operations, makes it more suitable for NPU-hosted inference.
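As a simplified sketch of what post-training quantization does, here is symmetric per-tensor int8 quantization in NumPy. Real NPU toolchains use more sophisticated schemes (per-channel scales, calibration data), and Microsoft hasn’t published Mu’s exact recipe.

```python
# Map float32 weights to int8 with a single per-tensor scale, then
# dequantize to see how much precision the round trip costs.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = np.abs(w).max() / 127.0          # map the largest weight to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale      # recover approximate floats

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max error:", np.abs(w - dequantize(q, scale)).max())
```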

Getting a model built is only part of the story; it now needs to be embedded in the Settings agent. This requires additional fine-tuning, using data from real-world searches in Settings and the expected outputs across hundreds of different paths within Settings. Microsoft used automated labeling techniques as well as prompt tuning to surface actions and links. Microsoft is clearly still tuning the model, as users on the Dev channel are asked to rate results.
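To illustrate, fine-tuning pairs for a Settings agent might look something like the hypothetical samples below. The schema is a guess, not Microsoft’s published format, though the ms-settings: URIs shown are real Windows deep links.

```python
# Hypothetical fine-tuning samples: natural-language queries labeled with
# the Settings deep link and action they should resolve to.
training_samples = [
    {"query": "make my screen brighter",
     "action": {"uri": "ms-settings:display", "setting": "brightness"}},
    {"query": "turn on bluetooth",
     "action": {"uri": "ms-settings:bluetooth", "setting": "bluetooth_toggle"}},
    {"query": "my wifi keeps dropping",
     "action": {"uri": "ms-settings:network-wifi", "setting": "wifi"}},
]
```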

Most of the time the results are accurate, but in some cases they don’t quite work as expected. For example, asking to change the mouse pointer color to a specific color always results in an offer to change it to black, and changing the pointer size will reset any other customizations. Still, it works, it’s fast, and in most cases, it’s more helpful than the search tool on a PC that doesn’t have an NPU.

Mu beyond Windows?

At this point, there’s no developer access to Mu in Windows, nor is there a public release of the model on GitHub or Hugging Face, unlike most versions of Phi. That’s likely because, like Phi Silica, the model has been designed to work with specific hardware accelerators. While a general release of Mu would be nice, it’s currently tuned for specific NPUs, with its architecture tied to Intel, AMD, and Qualcomm designs.

The very focused training in this first release also makes even access via the Windows App SDK unlikely. However, that doesn’t mean it won’t get a public release in the future, especially as having a natural language guide to applications that offer a lot of customization turns out to be really rather useful.

It’s easy to imagine a tool like this being added to complex, powerful applications such as Adobe’s Photoshop or Lightroom where there are a lot of different settings scattered throughout the application and where many of them are context-dependent. By making Mu available to other organizations with a comprehensive guide to fine-tuning, Microsoft would be making it another platform feature where different use cases could give interesting results.

Delivering a new build of Mu with each new update to an application might seem like overkill, but you don’t have to completely retrain the model, as most of its functionality is added during fine-tuning. That’s not to say tuning a model like this is simple; Microsoft trained it on more than 3.5 million samples to provide responses for hundreds of settings options. The fine-tuning data also included a mix of real user searches and synthetic queries to give it as wide a mapping as possible between what users type and what they want.
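As a toy sketch of how synthetic queries can widen that mapping, template-based paraphrases of a canonical intent are one obvious approach; Microsoft hasn’t described its actual generation pipeline.

```python
# Expand one labeled intent into several phrasing variants, so the model
# sees many ways a user might ask for the same setting.
TEMPLATES = [
    "how do I {verb} {thing}",
    "{verb} {thing}",
    "can't figure out how to {verb} {thing}",
    "where do I {verb} {thing}",
]

def synthesize(verb: str, thing: str, intent: str):
    """Turn one canonical intent into a list of synthetic training queries."""
    return [{"query": t.format(verb=verb, thing=thing), "intent": intent}
            for t in TEMPLATES]

for sample in synthesize("change", "my mouse pointer size", "mouse_pointer_size"):
    print(sample)
```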

Perhaps the most interesting aspect of the training process was that the team working on it had to be aware of the nuance between closely related terms, as well as where Settings functionality is duplicated or, in many cases, spread across multiple screens. Getting the tuning right required taking the time to choose just which services are available in the initial release of the Mu-powered Settings agent.

Is Mu the new Clippy? No, that’s not the role of tools like this. It’s a smart search engine that can handle natural language queries and deliver a list of deep links to functionality. It’s not watching what you do, ready to step in to help.

Nothing prevents someone from using the Windows AI platform’s screen capture and OCR combination to get that context in, say, Office. If the resulting data were used to generate queries and intent, with an appropriately trained and tuned version of Mu constructing the output, we might have the scaffolding to build a Clippy 2.0 that’s useful and actually helpful.
https://www.infoworld.com/article/4026680/mu-microsofts-action-focused-small-language-model.html
