How to write nonfunctional requirements for AI agents
Tuesday, October 7, 2025, 11:00, by InfoWorld
The classic approach to writing agile user stories starts with documenting who the end user is, the user’s objective, and why it’s important to them. You’ll often see user stories follow the format, “As a user type, I want to be able to complete a task so that I can achieve a specific outcome.” The product owner helps the development team understand when the user story is done by providing a list of pass-fail acceptance criteria.
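To ground the pass-fail idea before turning to AI agents, here is a minimal sketch of one acceptance criterion expressed as an automated check, assuming a hypothetical story, “As a support manager, I want the agent to summarize open tickets so that I can prioritize escalations.” The summarize_tickets function and the limits are illustrative assumptions, not examples from the article.

```python
# Hypothetical story: "As a support manager, I want the agent to summarize
# open tickets so that I can prioritize escalations."
# One pass/fail acceptance criterion: every summary is 200 words or fewer
# and references each ticket ID it was given.

def check_summary_acceptance(tickets, summarize_tickets, max_words=200):
    summary = summarize_tickets(tickets)  # hypothetical agent function
    assert len(summary.split()) <= max_words, "summary exceeds word budget"
    for ticket in tickets:
        assert ticket["id"] in summary, f"ticket {ticket['id']} not referenced"
```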
Product owners typically focus on functional acceptance criteria, which help define the user experience, business rules, and automation behaviors. Technical leads, architects, security specialists, and devops engineers should add nonfunctional requirements (NFRs) that focus on the system’s performance, operations, and compliance requirements.

Agile user stories for AI agents

Should agile user stories for AI agents be any different from those written for applications and APIs? AI agents are multifaceted, including application, automation, data, API, and AI components, so there’s undoubtedly a need to express their nonfunctional requirements. User stories for AI agents should have the following types of acceptance criteria:

- Functional requirements that focus on what the agent will do and where humans-in-the-middle will provide oversight.
- A set of nonfunctional requirements focusing on performance, compliance, security, observability, and other operational requirements, just as they would for APIs and automations.
- Another set of nonfunctional requirements focusing on data, including data quality, governance, bias, and AI model maintenance.

Nonfunctional requirements for AI agents can be like those for applications, where user stories are granular and target delivering small, atomic functions. These NFRs can guide developers in deciding how to build the functionality described in user stories and help quantify what should pass a code review. However, you may need another set of NFRs expressed at a feature or release level. These NFRs help qualify an AI agent’s release readiness, specify data and AI governance requirements, and define other devops non-negotiables.

“For teams working with agentic AI, it’s essential to differentiate which nonfunctional requirements are best enforced by machines, like security, compliance, and scalability, and which still demand human judgment, such as UX, aesthetics, and performance that feels fast,” says Jonathan Zaleski, director of technical architecture at HappyFunCorp. “The future of AI product development lies in hybrid workflows, where AI handles objective, measurable criteria at scale, and humans focus on the emergent, intuitive aspects that shape truly meaningful experiences.”

What should nonfunctional requirements for AI agents look like in practice? We’ll consider several key areas to focus on, along with examples of NFRs used in each area.

Ethics and fairness

At the front end of an AI agent is a large language model (LLM) that will interpret requests in natural language, perform actions, and respond with recommendations. Development teams should consider nonfunctional acceptance criteria to validate whether the agent behaves in a responsible and unbiased manner. But this isn’t trivial: writing these NFRs requires pass/fail criteria in areas that are challenging to quantify.

“Agile teams often struggle with how to evaluate NFRs like latency, fairness, or explainability, which may seem nonfunctional, but with a little specification work, they can often be made concrete and part of a user story with clear pass/fail tests,” says Grant Passmore, co-founder of Imandra. “We use formal verification to turn NFRs into mathematical functional requirements we can prove or disprove.”

Example ethics and fairness NFRs often require creating test scenarios, scaling with synthetic test data sets, and evaluating the AI agent’s responses. NFRs can then define:

- Explainability, which involves polling subject matter experts. A nonfunctional requirement could be, “The explanation behind responses and recommended actions should meet the explainability expectations of 80% of the subject matter expert group.”
- Data bias, which requires educating the development team about the different types of data bias and using bias detection tools with an acceptance metric.
- Harmful responses, which turn abusive or deceptive output into a functional metric by applying tools that analyze the AI agent’s responses and recommended actions; a test sketch follows this list.
Accuracy and quality

Passing ethics and fairness tests is just the beginning. The next set of NFRs should target the agent’s usefulness, the accuracy of its actions, and the quality of its responses. NFRs should relate to the type of work the AI agent performs and may include:

- F1 score to measure the model’s accuracy, combining precision and recall. An NFR might require a minimum F1 score of 0.85.
- Hallucination rate, for capturing when an AI agent responds with factual errors or other accuracy issues.
- User satisfaction scores, where the agent’s user interface for a human-in-the-middle captures positive and negative feedback.
- Adversarial testing, which usually involves setting up data sets and automating tests that try to break an AI agent.

“Every AI feature must specify what acceptable performance looks like, whether it’s 90% precision for classification or relevant output from an LLM,” says Josh Mason, CTO of RecordPoint. “ML models might be evaluated with accuracy or F1 score, while LLMs may need to show that 85% of responses include reliable citations or pass grounding validation.”

Security, privacy, compliance, and legal

NFRs around compliance and security often include a mix of technology capabilities and requirements at the user story, feature, and release levels. Because AI is not fully deterministic, technology solutions embedded with the AI agent and its runtime environment offer ongoing protection to meet compliance requirements.

Josh Mason of RecordPoint says, “AI systems must prevent abuse and protect sensitive data,” and shares the following tips for developing data security NFRs:

- Prompt injection is the new SQL injection and requires runtime technology to prevent intrusions.
- Machine learning models require anonymized, encrypted data before new data sets are used, which can be feature-level NFRs.
- LLMs need input sanitization, PII redaction, and other guardrails to prevent manipulation through adversarial prompts.

Performance and scalability

Many nonfunctional requirements use measurements to ensure AI agent performance and scalability, similar to NFRs for applications. Some examples:

- Response time: The AI agent must respond to a user’s or another AI agent’s input within 1 second in 98% of cases; a measurement sketch appears after the quote below.
- Throughput: The system should support 100 concurrent agent instances.
- Scalability: The system should scale horizontally to handle 10x spikes in utilization with under 1% performance degradation.

“Teams building AI experiences need to evaluate both what the model does and how it performs,” says Andrew Filev, CEO and founder of Zencoder. “Functional benchmarks check for usefulness and accuracy, but they don’t measure the speed or smoothness of the experience. For that, you need classic nonfunctional latency metrics like time to first token, time to last token, and overall end-to-end agent execution latency when using agentic AI.”
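To show how the response-time NFR and Filev’s latency metrics can be checked, here is a minimal sketch that measures time to first token and asserts a 98th-percentile end-to-end latency under 1 second. It assumes a hypothetical stream_agent_response generator that yields tokens; the function name and the injected prompts are illustrative, not from the article.

```python
import math
import time

def measure_latencies(prompt, stream_agent_response):
    """Return (time_to_first_token, end_to_end_latency) in seconds."""
    start = time.perf_counter()
    ttft = None
    for _token in stream_agent_response(prompt):  # hypothetical token stream
        if ttft is None:
            ttft = time.perf_counter() - start    # first token arrived
    return ttft, time.perf_counter() - start

def percentile(samples, pct):
    """Nearest-rank percentile, sufficient for a pass/fail gate."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

def test_response_time_nfr(prompts, stream_agent_response):
    latencies = [measure_latencies(p, stream_agent_response)[1] for p in prompts]
    # NFR: respond within 1 second in 98% of cases.
    assert percentile(latencies, 98) <= 1.0
```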
Maintainability and observability

AI agent NFRs that connect dev with ops have all the complexities of applications, infrastructure, automations, and AI models bundled together. Deploying the AI agent is just the beginning of its lifecycle, and NFRs for maintainability and observability help create the feedback loops required to diagnose issues and make operational improvements. As many organizations aim toward autonomous agentic AI and agent-to-agent workflows, standardizing a list of NFRs that are applied across all AI agents becomes important. Some standards may include:

- Observability standards, so that all AI agents log consistent information in a centralized location.
- Canary releases, so that new AI model versions can be tested with a segmented user base and benchmarked against the last stable release.
- Modelops, so that model drift is automatically detected and development teams are alerted when retraining may be necessary; a drift-check sketch follows this list.
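For the modelops item, one common way to make drift detection a pass/fail gate is the population stability index (PSI) over model score distributions. The sketch below is an assumption-laden illustration: the 0.2 alert threshold is a widely used rule of thumb, not an article recommendation.

```python
import numpy as np

def psi(baseline, current, bins=10):
    """Population stability index between two score distributions."""
    # Equal-width bins from the baseline; production scores outside the
    # baseline range are ignored in this sketch.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    base_pct = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor the percentages to avoid division by zero and log(0).
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

def drift_alert(baseline_scores, production_scores, threshold=0.2):
    """True when drift exceeds the assumed threshold and a retraining review is due."""
    return psi(baseline_scores, production_scores) > threshold
```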
There’s a lot of excitement in building and deploying AI agents. Organizations are developing AI agents for productivity improvements, mobile capabilities, and customer experiences. As more businesses deploy AI agents, organizations will need to develop agentic AI architecture rules and consider how to govern agentic ecosystems. Getting an agent to deliver business value requires defining operational and other nonfunctional requirements. Organizations committed to developing AI agents should create standards and learn from the past, when apps were built without consistent operational considerations.

https://www.infoworld.com/article/4061123/how-to-write-nonfunctional-requirements-for-ai-agents.html