What if generative AI can’t get it right?
Monday, February 17, 2025, 10:00, by InfoWorld
Large language models (LLMs) keep getting faster and more capable. That doesn’t mean they’re correct. This is arguably the biggest shortcoming of generative AI: It can be incredibly fast while simultaneously being incredibly wrong. This may not be an issue in areas like marketing or software development, where tests and reviews can find and fix errors. However, as analyst Benedict Evans points out, “There is also a broad class of task that we would like to be able to automate, that’s boring and time-consuming and can’t be done by traditional software, where the quality of the result is not a percentage, but a binary.” In other words, he says, “For some tasks, the answer is not better or worse: It’s right or not right.”
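To make that right-or-wrong distinction concrete, here is a toy sketch in Python. Nothing in it is a real model or API, and every number is invented; it only contrasts a system that samples an answer from a probability distribution with one that returns a recorded fact or fails outright.

import random

# Toy sketch: a generative model answers by sampling from a learned
# probability distribution, so its output is a *likely* answer, not a
# verified one. The candidate answers and weights below are invented.
ANSWER_DISTRIBUTION = {"97,000": 0.5, "64,000": 0.3, "112,000": 0.2}

def generative_answer() -> str:
    candidates, weights = zip(*ANSWER_DISTRIBUTION.items())
    return random.choices(candidates, weights=weights)[0]

# Traditional software is binary in exactly Evans' sense: it returns the
# recorded fact or it fails loudly. It never offers a plausible guess.
FACT_TABLE = {"question-we-recorded": "the recorded answer"}  # hypothetical

def deterministic_answer(question: str) -> str:
    return FACT_TABLE[question]  # raises KeyError instead of guessing

Call generative_answer() repeatedly and you get different answers to the same question, each delivered with equal fluency; deterministic_answer() either answers or refuses. Evans' binary tasks live on the second side of that line.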
Until generative AI can give us facts and not probabilities, it's simply not going to be good enough for a wide swath of use cases, no matter how much the next DeepSeek speeds up its calculations.

Fact-checking AI

In January, DeepSeek seemingly changed everything in AI: mind-blowing speed at dramatically lower costs. As Lucas Mearian writes, DeepSeek sent "shock waves" through the AI community, but its impact likely won't last. Soon there will be something faster and cheaper. But will there be something that provides what we most need, namely more accuracy and truth? We can't solve that problem by making AI more open. It's deeper than that. "Every week there's a better AI model that gives better answers," Evans notes. "But a lot of questions don't have better answers, only right answers, and these models can't do that."

This isn't to say performance and cost improvements aren't needed. DeepSeek, for example, makes genAI models more affordable for enterprises that want to build them into applications. And, as investor Martin Casado and former Microsoft executive Steven Sinofsky suggest, the application layer, not infrastructure, is the most interesting and important area for genAI development. The problem, however, is that many applications depend on right-or-wrong answers, not "probabilistic … outputs based on patterns they have observed in the training data," as I've covered before. As Evans expresses it, "There are some tasks where a better model produces better, more accurate results, but other tasks where there's no such thing as a better result and no such thing as more accurate, only right or wrong." Without the ability to speak truth rather than probabilities, the models may be worse than useless for many tasks.

A deeper problem is that these models can be exceptionally confident and wrong at the same time. It's worth quoting an Evans example at length. In trying to find the number of elevator operators in the United States in 1980 (a number clearly identified in a U.S. Census report), he gets a range of answers:

First, I try [the question] cold, and I get an answer that's specific, unsourced, and wrong. Then I try helping it with the primary source, and I get a different wrong answer with a list of sources, that are indeed the U.S. Census, and the first link goes to the correct PDF… but the number is still wrong. Hmm. Let's try giving it the actual PDF? Nope. Explaining exactly where in the PDF to look? Nope. Asking it to browse the web? Nope, nope, nope…. I don't need an answer that's perhaps more likely to be right, especially if I can't tell. I need an answer that is right.

Just wrong enough

But what about questions that don't require a single right answer? For the particular task Evans was trying to automate, the system will always be just wrong enough to never give the right answer. Maybe, just maybe, better models will fix this over time and become consistently correct in their output. Maybe.

The more interesting question Evans poses is whether there are "places where [generative AI's] error rate is a feature, not a bug." It's hard to see how being wrong could be an asset, but as an industry (and as humans) we tend to be really bad at predicting the future. Today we're trying to retrofit genAI's non-deterministic approach onto deterministic systems, and we're getting hallucinating machines in response. Even so, this doesn't seem to be yet another case of Silicon Valley's overindulgence in wishful thinking about technology (blockchain, for example).
There’s something real in generative AI. But to get there, we may need to figure out new ways to program, accepting probability rather than certainty as a desirable outcome.
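What might programming that way look like? One plausible pattern, sketched below under the assumption of a hypothetical ask_model callable standing in for any LLM API, is to treat each response as a sample rather than a fact: query the model several times, return an answer only when the samples strongly agree, and abstain otherwise.

from collections import Counter
from typing import Callable, Optional

def ask_with_abstention(
    ask_model: Callable[[str], str],  # hypothetical stand-in for any LLM call
    question: str,
    samples: int = 5,
    min_agreement: float = 0.8,
) -> Optional[str]:
    """Sample the model several times; answer only on strong consensus."""
    tally = Counter(ask_model(question) for _ in range(samples))
    answer, count = tally.most_common(1)[0]
    if count / samples >= min_agreement:
        return answer
    return None  # abstaining beats a confident wrong answer

Self-consistency of this kind does not make the model right; five samples can still converge on the same wrong number, as Evans discovered. But it shifts the failure mode from silently wrong toward explicitly unsure, which is what accepting probability as an outcome would mean in practice.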
https://www.infoworld.com/article/3825495/what-if-genai-cant-get-it-right.html