
AWS brings RAG evaluation and LLM-as-a-judge feature to Amazon Bedrock

Monday, December 2, 2024, 13:57, by InfoWorld
Amazon Web Services (AWS) has updated Amazon Bedrock with features designed to help enterprises streamline the testing of applications before deployment.

Announced during the ongoing annual re:Invent conference, the new features include a retrieval-augmented generation (RAG) evaluation tool within Bedrock Knowledge Bases.

[ Related: AWS re:Invent 2024 news and insights ]

Enterprises typically use Bedrock Knowledge Bases to ground large language models (LLMs) in their own data, giving the models more context for better responses. They can also use them to implement an application's entire RAG workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows.

“You can now run an automatic knowledge base evaluation to assess and optimize RAG applications using Amazon Bedrock Knowledge Bases,” the company wrote in a blog post, adding that the evaluation process uses an LLM to compute the metrics for the evaluation.

AWS said enterprises will be able to use RAG evaluations to compare different configurations and tune their settings to get the results they need for their use case.

To run these evaluations, users can navigate to the Inference and Assessment section of the Amazon Bedrock console and choose Evaluations, the company said.

This capability is currently in preview.
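For teams that prefer scripting over the console, an evaluation job can also be started through the AWS SDK. The sketch below is a minimal, illustrative example using boto3; the create_evaluation_job operation exists for Bedrock model evaluation, but the nested RAG/knowledge-base configuration shown here is an assumption based on the announcement, and every ARN, ID, and bucket name is a placeholder.

```python
# Hedged sketch: starting a RAG evaluation job against a Bedrock Knowledge Base.
# create_evaluation_job is the existing Bedrock model-evaluation API; the
# "ragConfigs" shape below is an assumption about how the new knowledge-base
# evaluation is expressed, and all ARNs/IDs are hypothetical placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_evaluation_job(
    jobName="kb-rag-eval-demo",                                  # placeholder name
    roleArn="arn:aws:iam::123456789012:role/BedrockEvalRole",    # placeholder role
    evaluationConfig={
        # Assumed shape: an automated evaluation whose metrics are computed
        # by a judge LLM, per the announcement.
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Custom",
                    "dataset": {
                        "name": "rag-eval-prompts",
                        "datasetLocation": {
                            "s3Uri": "s3://my-eval-bucket/prompts.jsonl"  # placeholder
                        },
                    },
                    "metricNames": ["Correctness", "Completeness"],  # illustrative
                }
            ]
        }
    },
    inferenceConfig={
        # Assumption: the RAG source is identified by the knowledge base ID and
        # the model used to generate answers from the retrieved passages.
        "ragConfigs": [
            {
                "knowledgeBaseConfig": {
                    "retrieveAndGenerateConfig": {
                        "type": "KNOWLEDGE_BASE",
                        "knowledgeBaseConfiguration": {
                            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
                            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
                        },
                    }
                }
            }
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-eval-bucket/results/"},  # placeholder
)

print(response["jobArn"])
```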

Bedrock Knowledge Base updates

AWS also announced support for custom connectors and reranking models within Bedrock Knowledge Bases.

The custom connectors allow the ingestion of data from a variety of sources, including the ingestion of streaming data in Amazon Bedrock Knowledge Bases.

“Developers can now efficiently and cost-effectively ingest, update, or delete data directly using a single API call, without the need to perform a full sync with the data source periodically or after every change,” the company said.

Without the custom connectors support, enterprises would have to move their data to an AWS-supported source, such as Amazon S3.

The custom connectors, which have been made generally available, can be accessed via the Bedrock console and the AWS software development kit (SDK).
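As an illustration of the single-API-call ingestion described above, the following sketch uses boto3's bedrock-agent client. The operation name (ingest_knowledge_base_documents) and the document payload shape are assumptions based on the feature description, and the knowledge base and data source IDs are placeholders.

```python
# Hedged sketch: pushing a document straight into a Bedrock Knowledge Base
# through a custom data source, without a full periodic sync.
# The operation name (ingest_knowledge_base_documents) and payload shape are
# assumptions based on the announcement; all IDs are placeholders.
import boto3

agent = boto3.client("bedrock-agent", region_name="us-east-1")

agent.ingest_knowledge_base_documents(
    knowledgeBaseId="KB123EXAMPLE",      # placeholder knowledge base
    dataSourceId="DSCUSTOM01",           # placeholder custom data source
    documents=[
        {
            "content": {
                "dataSourceType": "CUSTOM",
                "custom": {
                    "customDocumentIdentifier": {"id": "ticket-4711"},
                    "sourceType": "IN_LINE",
                    "inlineContent": {
                        "type": "TEXT",
                        "textContent": {"data": "Streaming event payload to index."},
                    },
                },
            }
        }
    ],
)
```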

AWS has also introduced a new Rerank API inside Bedrock Knowledge Bases, designed to give developers a way to use reranking models to improve the relevance and accuracy of responses in their RAG-based applications while reducing costs.

The reranking models accessed via the new API could help developers overcome limitations of semantic search, which is often used in RAG applications, AWS said.

These limitations include the inability to prioritize the most suitable documents based on user preferences or query context, especially when the user query is complex, ambiguous, or involves nuanced context.

“This can lead to retrieving documents that are only partially relevant to the user’s question,” the company explained, adding that partially relevant document retrieval could lead to another challenge around proper attribution of sources.

Currently, the API supports Amazon Rerank 1.0 and Cohere Rerank 3.5 models.
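A hedged sketch of how the Rerank API might be called from boto3's bedrock-agent-runtime client follows. The rerank operation and its parameter shapes are assumptions based on AWS's description of the feature, and the model ARN and documents are placeholders.

```python
# Hedged sketch: reranking candidate passages retrieved by semantic search.
# The rerank operation on the bedrock-agent-runtime client and the request
# shape below are assumptions based on the announcement; the ARN is a placeholder.
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-west-2")

candidates = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refund requests require the original receipt.",
]

response = runtime.rerank(
    queries=[{"type": "TEXT", "textQuery": {"text": "How long do refunds take?"}}],
    sources=[
        {
            "type": "INLINE",
            "inlineDocumentSource": {
                "type": "TEXT",
                "textDocument": {"text": doc},
            },
        }
        for doc in candidates
    ],
    rerankingConfiguration={
        "type": "BEDROCK_RERANKING_MODEL",
        "bedrockRerankingConfiguration": {
            "modelConfiguration": {
                # Placeholder ARN for one of the supported reranking models.
                "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/amazon.rerank-v1:0"
            },
            "numberOfResults": 2,
        },
    },
)

# Results come back ordered by relevance, highest score first.
for result in response["results"]:
    print(result["index"], result["relevanceScore"])
```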

LLM-as-a-judge inside Bedrock Model Evaluation

AWS has also added a new LLM-as-a-judge feature inside Bedrock Model Evaluation — a tool inside Bedrock that can help enterprises choose an LLM that fits their use case.

The new feature, currently in preview, will allow developers to test and evaluate other models with human-like quality at a lower cost than having humans run these evaluations, according to the company.

LLM-as-a-judge makes it easier for enterprises to go into production by providing fast, automated evaluation of AI-powered applications, shortening feedback loops, and speeding up improvements, AWS said. The evaluations assess multiple quality dimensions including correctness, helpfulness, and responsible AI criteria such as answer refusal and harmfulness.
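For illustration, an LLM-as-a-judge evaluation job might be launched roughly as in the sketch below, again via boto3's create_evaluation_job. The evaluator-model configuration and metric names are assumptions based on the announcement, and all identifiers, ARNs, and bucket names are placeholders.

```python
# Hedged sketch: a model-evaluation job where a judge LLM scores another
# model's responses. create_evaluation_job is the existing Bedrock model
# evaluation API; the evaluatorModelConfig shape is an assumption based on
# the LLM-as-a-judge announcement, and all identifiers are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.create_evaluation_job(
    jobName="llm-judge-demo",
    roleArn="arn:aws:iam::123456789012:role/BedrockEvalRole",       # placeholder
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "General",
                    "dataset": {
                        "name": "judge-prompts",
                        "datasetLocation": {"s3Uri": "s3://my-eval-bucket/judge.jsonl"},
                    },
                    # Illustrative quality dimensions named in the announcement.
                    "metricNames": ["Correctness", "Helpfulness", "Harmfulness"],
                }
            ],
            # Assumed field: the model acting as the judge.
            "evaluatorModelConfig": {
                "bedrockEvaluatorModels": [
                    {"modelIdentifier": "anthropic.claude-3-5-sonnet-20240620-v1:0"}
                ]
            },
        }
    },
    inferenceConfig={
        "models": [
            {
                # Placeholder: the model under evaluation.
                "bedrockModel": {"modelIdentifier": "amazon.titan-text-premier-v1:0"}
            }
        ]
    },
    outputDataConfig={"s3Uri": "s3://my-eval-bucket/judge-results/"},
)
```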
https://www.infoworld.com/article/3615426/aws-brings-rag-evaluation-and-llm-as-a-judge-feature-to-am...
