When LLMs become influencers
Monday, February 10, 2025, 10:00, by InfoWorld
On Sunday people tuned in to watch the Super Bowl—and lots and lots of ads. Online, however, it’s increasingly the case that AI agents will see and interact with ads. Humans won’t. That’s the insight Grab’s regional marketing director Ken Mandel offers: “If an AI chooses your Coke order, not a human, then why bid for impressions on Instagram?” Summing up, he argues, “In a world where agents (not humans) interact with brands, the game shifts from advertising to AI influence and optimization.”
This promises to shake up the advertising industry, but something very similar is happening with developer tools. In developer relations, for example, it used to be critical for software companies to have developer advocates on Stack Overflow or Reddit, answering questions, writing tutorials, etc. But if developers increasingly turn to coding assistants like GitHub Copilot for answers and guidance, it becomes essential to influence how large language models (LLMs) "think" about one's product. Yet how to do that isn't clear at all. In short, we know we need to get smarter about how we influence the machines, but how? Who trains the trainers?

Our ability to influence LLMs is seriously circumscribed. Perhaps if you own the LLM and its associated tool, you can exert outsized influence on its output. AWS, for example, should be able to train Amazon Q to answer questions related to AWS services. There's an open question as to whether Q would be "biased" toward AWS services, but that's almost a secondary concern. Maybe it steers a developer toward Amazon ElastiCache and away from Redis simply by virtue of having more and better documentation to offer. The primary concern is ensuring these tools have enough good training data that they don't lead developers astray.

For example, in my role running developer relations for MongoDB, we've worked with AWS and others to train their LLMs with code samples, documentation, etc. What we haven't done (and can't do) is ensure that the LLMs generate correct responses. If a Stack Overflow Q&A thread has 10 bad examples and three good examples of how to shard in MongoDB, how can we be certain that a developer asking GitHub Copilot or another tool for guidance gets informed by the three good examples? The LLMs have trained on all sorts of good and bad data from the public internet, so it's a bit of a crapshoot whether a developer will get good advice from a given tool.

Microsoft's Victor Dibia delves into this, suggesting, "As developers rely more on codegen models, we need to also consider how well does a codegen model assist with a specific library/framework/tool." At MongoDB, we regularly evaluate how well the different LLMs handle a range of topics so that we can gauge their relative efficacy and work with the LLM vendors to improve performance. But it's still an opaque exercise, with no clear way to ensure that the different LLMs give developers correct guidance.

There's no shortage of advice on how to train LLMs, but it's all for LLMs that you own. If you're the development team behind Apache Iceberg, for example, how do you ensure that OpenAI is trained on the best possible data so that developers using Iceberg have a great experience? As of today, you can't, and that's a problem. There's no way to ensure that developers asking questions of (or expecting code completion from) third-party LLMs will get good answers.

Holding models accountable

One option is simply to publish benchmarks. The LLM vendors will ultimately have to improve their output or watch developers turn to tools that consistently yield better results. If you're an open source project, a commercial vendor, or anyone else who increasingly relies on LLMs as knowledge intermediaries, you should regularly publish results showing which LLMs do well and which don't. Benchmarking can help move the industry forward.
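To make that concrete, here is a minimal sketch of what such a published benchmark might look like: a small Python harness that asks each model the same developer questions and scores the answers against markers of verified guidance. The question set, the substring-based scoring rule, and the `fake_model` stub are all illustrative assumptions, not a description of how MongoDB or any vendor actually evaluates LLMs.

```python
"""Minimal sketch of a public LLM benchmark harness (illustrative only)."""

from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str              # question a developer might ask an assistant
    must_include: list[str]  # markers of verified-correct guidance


# Hypothetical cases a project like MongoDB might curate from its docs.
CASES = [
    Case(
        prompt="How do I shard a MongoDB collection on a hashed key?",
        must_include=["sh.shardCollection", "hashed"],
    ),
    Case(
        prompt="How do I enable sharding for a database in mongosh?",
        must_include=["sh.enableSharding"],
    ),
]


def score(ask: Callable[[str], str]) -> float:
    """Return the fraction of cases where the model's answer
    contains every marker of correct guidance."""
    passed = 0
    for case in CASES:
        answer = ask(case.prompt)
        if all(marker in answer for marker in case.must_include):
            passed += 1
    return passed / len(CASES)


if __name__ == "__main__":
    # Stub standing in for a real assistant API call; in practice this
    # would wrap each vendor's SDK, one callable per model under test.
    def fake_model(prompt: str) -> str:
        return "Use sh.shardCollection('db.coll', {shardKey: 'hashed'})"

    models = {"model-a": fake_model}
    for name, ask in models.items():
        print(f"{name}: {score(ask):.0%} of answers matched verified guidance")
```

Substring checks are crude, of course; a more serious harness would execute the generated code against a test cluster or test suite. But even scoring this simple, published regularly for each major assistant, gives vendors a concrete target to improve against.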
By extension, if you're a developer who increasingly relies on coding assistants like GitHub Copilot or Amazon Q, be vocal about your experiences, both positive and negative. There's a massive race among LLM providers to earn your trust. Name and shame the tools that hurt your ability to build great software.

Of course, it's also possible that we'll start to see projects and companies pay to influence code advice: SEO for coding assistants, as I've highlighted before. I've not seen that happening yet, but the trend Mandel sees in advertising could very easily find its way to software development. He says, "Imagine, Google Ads morphs into 'Google AI Preference Bidding'—[and] brands bid to be the preferred suggestion when AI agents shop for users." Could we see something similar happening with AI assistants? Sure. But I'm hopeful that benchmarking and vocal feedback will be a more effective way to ensure developers get the best possible guidance.
https://www.infoworld.com/article/3820227/when-llms-become-influencers.html