
How pairing SAST with AI dramatically reduces false positives in code security

Thursday, November 20, 2025, 19:15, by InfoWorld
The promise of static application security testing (SAST) has always been the “shift-left” dream: catching vulnerabilities before they ever hit production. But for too long, that promise has been undermined by a frustrating reality of overwhelming alert volume and high false-positive rates. This noise can lead to alert fatigue, wasted developer time, and a loss of trust in the very tools designed to protect our codebase.

Meanwhile, large language models (LLMs) have emerged as powerful code analysis tools, capable of pattern recognition and code generation. Yet they suffer from their own weaknesses: slow processing, inconsistency, and the potential for hallucination.

In our opinion, the path to next-generation code security is not choosing one over the other, but integrating their strengths. So, along with Kiarash Ahi, founder of Virelya Intelligence Research Labs and co-author of the framework, I decided to do exactly that. Our novel hybrid framework combines the deterministic rigor and speed of traditional SAST with the contextual reasoning of a fine-tuned LLM to deliver a system that doesn’t just find vulnerabilities, but also validates them. The results we achieved were stark: a 91% reduction in false positives compared to standalone SAST tools, transforming security from a reactive burden into an integrated and more efficient process.

The core problem: Context vs. rules

Traditional SAST tools, as we know, are rule-bound; they inspect code, bytecode, or binaries for patterns that match known security flaws. While effective, they often fail when it comes to contextual understanding, missing vulnerabilities in complex logical flaws, multi-file dependencies, or hard-to-track code paths. This gap is why their precision, the percentage of true vulnerabilities among all reported findings, remains low. In our empirical study, the widely used SAST tool Semgrep reported a precision of just 35.7%.

Our LLM-SAST mashup is designed to bridge this gap. LLMs, pre-trained on massive code datasets, possess pattern-recognition capabilities and a knowledge of dependencies that deterministic rules lack. This allows them to reason about code behavior in the context of the surrounding code, relevant files, and the entire codebase.

A two-stage pipeline for intelligent triage

Our framework operates as a two-stage pipeline, leveraging a SAST core (in our case, Semgrep) to identify potential risks and then feeding that information into an LLM-powered layer for intelligent analysis and validation.

Stage 1, initial SAST findings: The Semgrep SAST engine runs and identifies all potential security risks. For each flagged finding, it extracts intermediate representations, such as the data flow path from source to sink.
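To make this first stage concrete, here is a minimal sketch of collecting Semgrep findings, assuming Semgrep’s CLI with JSON output. The fields we keep (rule ID, file, line range, snippet, message) are just what the triage stage needs; the helper name run_semgrep is our own, and the real framework also extracts the full source-to-sink data flow path.

```python
import json
import subprocess

def run_semgrep(target_dir: str) -> list[dict]:
    """Run Semgrep over a repository and return a simplified list of findings."""
    proc = subprocess.run(
        ["semgrep", "--config", "auto", "--json", target_dir],
        capture_output=True, text=True, check=False,
    )
    report = json.loads(proc.stdout)

    findings = []
    for result in report.get("results", []):
        findings.append({
            "rule_id": result["check_id"],
            "file": result["path"],
            "start_line": result["start"]["line"],
            "end_line": result["end"]["line"],
            "snippet": result.get("extra", {}).get("lines", ""),
            "message": result.get("extra", {}).get("message", ""),
        })
    return findings
```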

Stage 2, LLM-powered intelligent triage: This is the critical step for filtering noise. The framework embeds the relevant code snippet, the data flow path, and surrounding contextual information into a structured JSON prompt for a fine-tuned LLM. To form the core of this intelligent triage layer, we fine-tuned Llama 3 8B on a high-quality dataset of vetted false positives and true vulnerabilities, specifically covering major flaw categories such as those in the OWASP Top 10. Based on the security issue flagged, the prompt then asks a clear, focused question, such as, “Does this user input lead to an exploitable SQL injection?”
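A minimal sketch of that structured prompt might look like the following. The JSON field names, the SQL-injection question, and the query_triage_model callable are illustrative placeholders rather than the framework’s actual interface; in practice the question is chosen per rule category, and the call goes to whichever stack serves the fine-tuned Llama 3 8B.

```python
import json

# Illustrative only: in practice the question is selected per rule category
# (SQL injection, XSS, path traversal, ...), not hardcoded.
TRIAGE_QUESTION = (
    "Does this user input lead to an exploitable SQL injection? "
    "Answer EXPLOITABLE or FALSE_POSITIVE, then explain briefly."
)

def build_triage_prompt(finding: dict, dataflow_path: list[str], context: str) -> str:
    """Embed the snippet, data flow path, and surrounding context into a
    structured JSON prompt for the fine-tuned triage model."""
    return json.dumps({
        "rule_id": finding["rule_id"],
        "file": finding["file"],
        "snippet": finding["snippet"],
        "dataflow_path": dataflow_path,   # source-to-sink steps from the SAST engine
        "surrounding_context": context,   # e.g. the enclosing function and its callers
        "question": TRIAGE_QUESTION,
    }, indent=2)

def is_exploitable(finding, dataflow_path, context, query_triage_model) -> bool:
    """query_triage_model is a placeholder for a call to the fine-tuned
    Llama 3 8B instance; it takes a prompt string and returns the reply."""
    reply = query_triage_model(build_triage_prompt(finding, dataflow_path, context))
    return reply.strip().upper().startswith("EXPLOITABLE")
```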

By analyzing the context that traditional SAST rules miss, the LLM can reliably determine if a finding is truly exploitable, acting as an intelligent triage layer. This is the key mechanism that allows the framework to convert a mountain of alerts into a handful of verified, actionable findings.

The metrics: From noise to actionable intelligence

The following empirical results validate our hybrid approach. Our test dataset comprised 25 diverse open-source projects, selected for their active development and language diversity (Python, Java, JavaScript), with 170 ground-truth vulnerabilities sourced from public exploit databases and manual expert verification.

Precision: In our implementation, we found the precision jumped to 89.5%. This is a massive leap not only over Semgrep’s baseline of 35.7%, but also over a purely LLM-based approach (GPT-4), which achieved 65.5%.

False positive reduction: Semgrep generated a total of 225 false positives. Our framework filtered this down to just 20, representing an approximately 11x improvement in the signal-to-noise ratio.
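A quick back-of-the-envelope check shows how these numbers hang together. Precision is true positives over all reported findings, so the true-positive counts below are implied by the reported precision and false-positive figures (and by assuming the framework retained all 170 ground-truth vulnerabilities), not quoted directly from the study.

```python
# Precision = TP / (TP + FP). Back out the implied true-positive counts.

semgrep_fp = 225                     # false positives from Semgrep alone
semgrep_precision = 0.357
semgrep_tp = semgrep_precision / (1 - semgrep_precision) * semgrep_fp
print(round(semgrep_tp))             # ~125 implied true positives

framework_fp = 20                    # false positives left after LLM triage
framework_tp = 170                   # assumes every ground-truth vulnerability was retained
print(round(framework_tp / (framework_tp + framework_fp), 3))  # ~0.895, the reported 89.5%
print(round(semgrep_fp / framework_fp, 1))                     # ~11.2x better signal-to-noise
```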

Time to triage: This reduction in noise translated directly to developer efficiency, reducing the average triage time for security analysts by a stunning 91%.

Furthermore, the contextual reasoning of the LLM layer enabled the discovery of complex vulnerability types that traditional scanners miss, such as multi-file dataflow bugs.

Beyond detection: Validation and remediation

The LLM’s role doesn’t stop at filtering. It transforms the final output into actionable intelligence.

Automated exploit generation: For vulnerabilities confirmed as exploitable, our framework automatically generates a proof-of-concept (PoC) exploit. This capability is crucial for verifying that a flaw is genuinely exploitable and for providing concrete evidence to developers. In our evaluation, our framework successfully generated valid PoCs for approximately 70% of exploitable findings, significantly reducing the manual verification burden on security analysts.
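The same structured-prompt pattern can be reused for this step. The sketch below is purely illustrative: the prompt wording and the build_poc_prompt helper are our own assumptions rather than the framework’s implementation, and any generated exploit should only ever be run in an isolated test environment.

```python
import json

def build_poc_prompt(finding: dict, dataflow_path: list[str]) -> str:
    """Ask the triage model for a minimal proof-of-concept for a confirmed finding."""
    return json.dumps({
        "task": "generate_poc",
        "rule_id": finding["rule_id"],
        "file": finding["file"],
        "snippet": finding["snippet"],
        "dataflow_path": dataflow_path,
        "instructions": (
            "Produce a minimal proof-of-concept input or request that demonstrates "
            "this vulnerability, with steps to reproduce it in an isolated test environment."
        ),
    }, indent=2)
```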

Dynamic remediation suggestion: LLMs, with their ability to understand code and generate text, produce comprehensive, human-readable bug descriptions and concrete repair suggestions. This turns raw security findings into guidance that flows directly into the developer workflow, accelerating the time to fix and minimizing the window of vulnerability.

A SAST and LLM synergy marks a necessary evolution in static code security. By integrating deterministic analysis with intelligent, context-aware reasoning, we can finally move past the false-positive crisis and equip developers with a tool that provides high-signal security feedback at the pace of modern development.

This article is published as part of the Foundry Expert Contributor Network.
https://www.infoworld.com/article/4093079/how-pairing-sast-with-ai-dramatically-reduces-false-positi...
