How to ensure your enterprise data is ‘AI ready’

Tuesday, December 2, 2025, 10:00, by InfoWorld
Many organizations are experimenting with AI agents to determine which job roles to focus on, when to automate actions, and what steps require a human in the middle. AI agents connect the power of large language models with APIs, enabling them to take action and integrate seamlessly into employee workflows and customer experiences in a variety of domains:

Field operations AI agents can help outline the steps to address a service call.

HR agents partner with job recruiters to schedule interviews for top applicants.

Finance AI agents help respond to daily challenges in managing supply chain, procurement, and accounts receivable.

Coding agents are integrated into AI-assisted development platforms that facilitate vibe coding and accelerate application development.

AI agents are integrating into the workplace, where they participate in meetings, summarize discussions, create follow-up tasks, and schedule the next meetings.

World-class IT organizations are adapting their strategies and practices to develop AI agents while mitigating the risks associated with rapid deployments. “Building a world-class IT team means leading the conversation on risk,” says Rani Johnson, CIO of Workday. “We work closely with our legal, privacy, and security teams to set a clear adoption risk tolerance that aligns with our overall strategy.”

A key question for every technology, data, and business leader is whether the underlying data that AI agents tap into is “AI-ready.” According to Ocient’s Beyond Big Data report, 97% of leaders report notable increases in data processing due to AI, but only 33% have fully prepared for the escalating scale and complexity of the AI-driven workplace. Establishing data’s AI readiness is critical, as most AI agents leverage enterprise data to provide business, industry, and role-specific responses and recommendations.

I asked business and technology leaders how they were evaluating AI agents for data readiness in domains such as sales, HR, finance, and IT operations. Seven critical practices emerged.

Centralize data and intelligence

IT departments have invested significantly in centralizing data into data warehouses and data lakes, and in connecting resources with data fabrics. However, data is not equivalent to intelligence, as much of the data science and computational work occurs downstream in a sprawl of SaaS tools, data analytics platforms, and other citizen data science tools. Worse, numerous spreadsheets, presentations, and other unstructured documents are often poorly categorized and lack unified search capabilities.

“Instead of endlessly moving and transforming data, we need to bring intelligence directly to where the data lives, creating a journey to enterprise-ready data with context, trust, and quality built in at the source,” says Sushant Tripathi, VP and North America transformation lead at TCS. “This connected organizational intelligence weaves into the fabric of an enterprise, transforming fragmented information into trusted and unified assets so that AI agents can act with the speed and context of your best people, at enterprise scale.”

Even as IT looks to centralize data and intelligence, a backlog of data debt creates risks when using it in AI agents.

“AI-ready data must go beyond volume and accuracy and be unified, trusted, and governed to foster reliable AI,” says Dan Yu, CMO of SAP data and analytics. “With the right business data fabric architecture, organizations can preserve context, mitigate bias, and embed accountability into every layer of AI. This foundation ensures accurate, auditable decisions and enables AI to scale and adapt on semantically rich, governed data products, delivering durable business value.”

Recommendation: Most organizations will have a continuous backlog of dataops and data debt to address. Product-based IT organizations should manage data resources as products and develop roadmaps aligned with their AI priorities.

Ensure compliance with regulations and security standards

When it comes to data security, Jack Berkowitz, chief data officer at Securiti, advises starting by answering who should have access to any given piece of information flowing in or out of the genAI application, whether sensitive information is included in the content, and how this data and information are being processed or queried. He says, “As we move to agentic AI, which is actively able to do processing and take decisions, putting static or flat guardrails in place will fail.”

Guardrails are needed both to help prevent rogue AI agents and to block the use of data in areas where the risks outweigh the benefits.

“Most enterprises have a respectable security base with a secure SDLC, encryption at rest and in transit, role-based access control, data loss prevention, and adherence to regulations such as GDPR, HIPAA, and CCPA,” says Joanne Friedman, CEO of ReilAI. “That’s sufficient for traditional IT, but insufficient for AI, where data mutates quickly, usage patterns are emergent, and model behavior must be governed—not guessed.”

Recommendation: Friedman recommends establishing the following four pillars of AI risk-ready data:

Define an AI bill of materials.

Use a risk management framework such as NIST AI RMF or ISO 42001.

Treat genAI prompts as data and protect against prompt injection, data leakage, and related abuses.

Document AI with model cards and datasheets for datasets, including intended use, limitations, and other qualifications.
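The fourth pillar above, documenting AI with model cards, can be sketched as a lightweight structured record. This is a minimal illustration, not a standard schema; the field names and example values are assumptions.

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelCard:
    """Minimal model card capturing intended use and limitations."""
    model_name: str
    intended_use: str
    limitations: list = field(default_factory=list)
    training_datasets: list = field(default_factory=list)  # links to datasheets

# Hypothetical agent documented before deployment
card = ModelCard(
    model_name="support-triage-agent",
    intended_use="Routing inbound service-desk tickets to the right queue",
    limitations=["Not validated for non-English tickets"],
    training_datasets=["tickets-2024-q1"],
)
print(json.dumps(asdict(card), indent=2))
```

In practice, a richer schema would also record evaluation results and dataset datasheets, but even this skeleton makes intended use and limitations auditable alongside the model.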

Define contextual metadata and annotations

AI language models can be fed multiple documents and data sources with conflicting information. When an employee’s prompt results in an erroneous response or hallucination, the employee can follow up with clarifying prompts to close the gap.

However, with AI agents integrated into employee workflows and customer journeys, the stakes of poor recommendations and incorrect actions are significantly higher. An AI agent’s accuracy improves when documents and data sources include rich metadata and annotations, signaling how to use the underlying information responsibly.

“The AI needs to be able to understand the meaning behind the data by adding a semantic layer, which is like a universal dictionary for your data,” says Andreas Blumauer, SVP growth and marketing at Graphwise. “This layer uses consistent labels, metadata, and annotations to tell the AI what each piece of data represents, linking it directly to your business concepts and questions. This is also where you include specific industry knowledge, or domain knowledge models, so the AI understands the context of your business.”

Recommendation: Leverage industry-specific taxonomies and categorization standards, then apply a metadata standard such as Dublin Core, Schema.org, PROV-O, or XMP.
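As a sketch of this recommendation, a subset of Dublin Core elements can be attached to a document as key-value metadata, with a simple check for required elements. The document, field choices, and taxonomy terms here are hypothetical examples.

```python
# Annotating a document with a subset of Dublin Core elements.
doc_metadata = {
    "dc:title": "Q3 Procurement Policy",
    "dc:creator": "Finance Operations",
    "dc:date": "2025-09-30",
    "dc:type": "Policy",
    "dc:subject": ["procurement", "supply-chain"],  # terms from an internal taxonomy
}

def missing_elements(metadata, required=("dc:title", "dc:creator", "dc:date")):
    """Return the required Dublin Core elements absent from a record."""
    return [k for k in required if not metadata.get(k)]

print(missing_elements(doc_metadata))  # -> []
```

A data catalog would normally enforce these required elements at ingestion time, so AI agents never retrieve documents with unknown origin or subject.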

Review the statistical significance of unbiased data

Surveys are a primary tool of market research. Researchers design survey questions and answer options according to best practices that minimize introducing bias to respondents. For example, asking employees who use the service desk, “How satisfied are you with our excellent help desk team’s quick response times?” is biased because the words excellent and quick in the question imply a subjective standard.

Another challenge for researchers is ensuring a significant sample size for all respondent segments. For example, it would be misleading to report on executive response to the service desk survey if only a handful of people in this segment responded to it.

When reviewing data for use in AI, it is even more important to consider statistical significance and data biases, especially when the data in question underpins an AI agent’s decision-making.

“AI-ready data requires more than conventional quality frameworks, demanding statistical rigor that encompasses comprehensive bias audits with equalized odds, distributional stability testing, and causal identifiability frameworks that enable counterfactual reasoning,” says Shanti Greene, head of data science at AnswerRocket and adjunct professor at Washington University.

“Organizations pursuing transformational outcomes through sophisticated generative models paradoxically remain constrained by data infrastructures exhibiting insufficient volume for edge-case coverage. AI systems remain bounded by statistical foundations, proving that models trained on deficient data can generate confident hallucinations that masquerade as authoritative intelligence.”

Recommendation: Understanding and documenting data biases should be a data governance non-negotiable. Applicable common fairness metrics include demographic parity and equalized odds, while p-value testing is used for statistical significance testing.
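The two fairness metrics named above can be computed directly from group-level outcomes. This is a minimal sketch with hypothetical decision data; production audits would use a fairness library and confidence intervals rather than point estimates.

```python
def selection_rate(outcomes):
    """Fraction of positive outcomes (1s) in a group."""
    return sum(outcomes) / len(outcomes)

def demographic_parity_diff(group_a, group_b):
    """Absolute difference in positive-outcome rates between two groups."""
    return abs(selection_rate(group_a) - selection_rate(group_b))

def equalized_odds_diff(y_true_a, y_pred_a, y_true_b, y_pred_b):
    """Max gap in true-positive and false-positive rates between two groups."""
    def rates(y_true, y_pred):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        pos = sum(y_true)
        neg = len(y_true) - pos
        return tp / pos, fp / neg
    tpr_a, fpr_a = rates(y_true_a, y_pred_a)
    tpr_b, fpr_b = rates(y_true_b, y_pred_b)
    return max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))

# Hypothetical agent decisions for two respondent segments
print(demographic_parity_diff([1, 1, 0, 1], [1, 0, 0, 0]))  # -> 0.5
```

A demographic parity difference of 0.5 means one segment receives positive decisions at a rate 50 percentage points higher than the other, which would warrant a bias review before the dataset backs an AI agent.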

Benchmark and review data quality metrics

Data quality metrics focus on a dataset’s accuracy, completeness, consistency, timeliness, uniqueness, and validity. JG Chirapurath, president of DataPelago, recommends tracking the following:

Data completeness: A dataset is considered complete when fewer than 5% of entries in any critical field are blank or missing.

Statistical drift: If any key statistic changes by more than 2% compared to expected values, the data is flagged for human review.

Bias ratios: If a group or segment experiences outcomes that are more than 20% different from those of another group or segment, the data is flagged for human review.

Golden data sets: AI outputs must achieve greater than 90% agreement with human-verified ground truth on sample subsets.
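The four thresholds above translate into simple automated checks. This is a sketch under stated assumptions: whether each percentage is a relative or absolute difference is my interpretation, and real pipelines would run these per field and per segment.

```python
def completeness_ok(values, max_missing=0.05):
    """Completeness: fewer than 5% of critical-field entries blank or missing."""
    missing = sum(1 for v in values if v in (None, ""))
    return missing / len(values) < max_missing

def drift_flagged(current, expected, tolerance=0.02):
    """Statistical drift: flag if a key statistic moves more than 2% from expected."""
    return abs(current - expected) / abs(expected) > tolerance

def bias_flagged(rate_a, rate_b, tolerance=0.20):
    """Bias ratio: flag if one segment's outcome rate differs by more than 20%."""
    return abs(rate_a - rate_b) / max(rate_a, rate_b) > tolerance

def golden_set_ok(ai_outputs, ground_truth, min_agreement=0.90):
    """Golden data set: require >90% agreement with human-verified labels."""
    matches = sum(1 for a, g in zip(ai_outputs, ground_truth) if a == g)
    return matches / len(ground_truth) > min_agreement

print(completeness_ok(["a"] * 96 + [None] * 4))  # -> True (4% missing)
print(drift_flagged(1.03, 1.0))                  # -> True (3% drift)
```

Checks that return a flag would route the dataset to human review rather than block it outright, matching the review process described above.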

Rajeev Butani, chairman and CEO of MediaMint, adds, “Organizations can measure readiness with metrics like null and duplicate rates, schema and taxonomy consistency, freshness against SLAs, and reconciliation variance between booked, delivered, and invoiced records. Bias and risk can be tested through consent coverage, PII exposure scores, and retention or deletion checks.”

Recommendation: Selecting data quality metrics and calculating a composite data health score is a common feature of data catalogs that helps build trust in using datasets for AI and decision-making. Data governance leaders should communicate target benchmarks and establish a review process for datasets that fall below data quality standards.

Establish data classification, lineage, and provenance

Looking beyond data quality, key data governance practices include classifying data for IP and privacy, and establishing data’s lineage and provenance.

“The future is about governing AI agents as non-human identities that are registered, accountable, and subject to the same discipline as people in an identity system,” says Matt Carroll, founder and CEO of Immuta. “This requires classifying information into risk tiers, building in checkpoints for when human oversight is essential, and allowing low-risk interactions to flow freely.”

Geoff Webb, VP of product and portfolio marketing at Conga, shares two key metrics that must be carefully evaluated before trusting the results of any agentic workflows.

Data provenance refers to the origin of the data. Can the source be trusted, and how did that data become part of the dataset you are using?

The chronology of the data refers to how old it is. Avoid training models on data that is no longer relevant to the objectives, or that may reflect outdated working practices, non-compliant processes, or simply poor business practices from the past.
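These two checks, provenance and chronology, can be enforced as record-level gates before data enters an agentic workflow. The source names and the 180-day freshness cutoff below are illustrative assumptions, not recommended values.

```python
from datetime import date, timedelta

TRUSTED_SOURCES = {"crm-prod", "erp-prod"}  # hypothetical approved origins
MAX_AGE = timedelta(days=180)               # illustrative freshness cutoff

def record_usable(source, created_on, today=None):
    """Accept a record only if its provenance is trusted and it is fresh."""
    today = today or date.today()
    return source in TRUSTED_SOURCES and (today - created_on) <= MAX_AGE

print(record_usable("crm-prod", date(2025, 10, 1), today=date(2025, 12, 2)))  # -> True
print(record_usable("spreadsheet", date(2025, 10, 1), today=date(2025, 12, 2)))  # -> False
```

Real lineage systems track the full transformation chain, not just the original source, but even a coarse gate like this keeps untraceable or stale records out of agent workflows.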

Recommendation: Regulated industries have a long history of maturing data governance practices. For companies lagging in these disciplines, data classification is an essential starting point.

Create human-in-the-middle feedback loops

As organizations use more datasets in AI, it is essential to have ongoing validation of the AI language model and agent’s accuracy by subject matter experts and other end-users. Dataops should extend feedback on AI to the underlying data sources to help prioritize improvements and identify areas to be enriched with new datasets.

“In our call centers, we’re not just listening to customer interactions, we’re also feeding that qualitative data back into engineering teams to reshape how experiences are designed,” says Ryan Downing, VP and CIO of enterprise business solutions at Principal Financial Group. “We measure how people interact with AI-infused solutions and how those interactions correlate with downstream behaviors, for example, whether someone still needed to call us after using the mobile app.”

Recommendation: Unstructured datasets and those capturing people’s opinions and sentiments are most prone to variance that statistical methods may not easily validate. When people report odd responses from AI models built on this data, it’s essential to trace back to the root causes in the data, especially since many AI models are not fully explainable.

Automate a data readiness checklist

Guy Adams, CTO of DataOps.live, says, “AI-ready data isn’t just good data; it’s good data that’s been productized, governed, and delivered with the correct context so it can be trusted by AI systems today—and reused for the AI use cases we haven’t even imagined yet.”

Organizations that heavily invest in AI agents and other AI capabilities will first ensure their data is ready and then automate a checklist for ongoing validation. The bar should be raised for any dataset’s AI readiness when that data is used for more mission-critical workflows and revenue-impacting customer experiences at greater scales.
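An automated readiness checklist can be as simple as a named set of predicate checks run against each dataset on a schedule. The checks below are hypothetical placeholders; in practice they would wrap whatever quality, governance, and classification metrics the organization tracks.

```python
def run_readiness_checklist(dataset, checks):
    """Run each named check against a dataset and report pass/fail results."""
    results = {name: check(dataset) for name, check in checks.items()}
    results["ai_ready"] = all(results.values())
    return results

# Hypothetical checks over a toy dataset record
checks = {
    "completeness": lambda d: all(v is not None for v in d["critical_field"]),
    "has_owner": lambda d: bool(d.get("owner")),
    "classified": lambda d: d.get("classification") in {"public", "internal", "restricted"},
}

dataset = {"critical_field": [1, 2, 3], "owner": "data-team", "classification": "internal"}
print(run_readiness_checklist(dataset, checks))
```

Because each check is just a function, raising the bar for mission-critical workflows means swapping in stricter checks for those datasets without changing the runner.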
https://www.infoworld.com/article/4091422/how-to-ensure-your-enterprise-data-is-ai-ready.html
