AI-generated code could spark a software supply-chain disaster—here’s why.
A recent study highlights a significant vulnerability in AI-generated software: code produced by large language models (LLMs) often references non-existent third-party libraries, creating fertile ground for supply-chain attacks. In exploring how AI-generated code could undermine software integrity, researchers tested a broad spectrum of models and languages to understand the scale and persistence of these “package hallucinations.” The results underscore why developers and security teams should treat AI-assisted code with heightened scrutiny and implement robust validation practices to protect downstream users.
Research scope and methodology
The study set out to assess comprehensively how AI-generated code handles dependencies. Researchers used 16 widely deployed large language models to run 30 tests, 16 targeting Python and 14 targeting JavaScript, which together generated 576,000 code samples.
Across these samples, the team cataloged approximately 2.23 million package references. A striking finding emerged: about 440,000 of these references pointed to packages that did not exist in any recognized registry. This figure translates to roughly 19.7 percent of all package references being hallucinations—non-existent dependencies introduced by the AI-generated code.
The breakdown showed that open-source models were responsible for a sizable share of hallucinations, with nearly 22 percent of their suggested dependencies pointing to non-existent libraries. Commercial models exhibited a markedly lower rate but still produced non-existent dependencies at meaningful levels. The study also tracked unique hallucinated package names, finding that tens of thousands of distinct non-existent names recurred across multiple samples.
Understanding package hallucination and dependency confusion
Hallucination, in the context of AI-generated programming, refers to outputs that are factually incorrect or irrelevant to the intended task. When applied to package references, it means the AI suggests dependencies that do not exist or cannot be resolved in standard package registries. This feeds a known class of threat called dependency confusion (also called package confusion): an attacker publishes a malicious package under the same name as a legitimate dependency, often an internal one, typically with a higher version number, so that build tools and package managers resolving that name pull in the malicious component instead.
The study’s results illuminate how AI hallucinations can fuel dependency-confusion-type attacks. If developers trust the AI’s suggested dependency names and install the corresponding packages without thorough verification, they risk introducing malicious payloads into their environments. Over time, once such a package is installed, its malicious behavior can propagate through downstream systems and affect a broad set of users.
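To make that verification step concrete, here is a minimal Python sketch, using only the standard library and PyPI’s public JSON API, of the kind of existence check a developer or tooling layer could run on an AI-suggested package name before installing it. The candidate names at the bottom are illustrative, not drawn from the study, and an existence check alone is not sufficient: a name that does resolve may still be an attacker-registered lookalike, so provenance and metadata review remain necessary.

```python
# A minimal sketch of a pre-install existence check, assuming the public PyPI
# JSON API (https://pypi.org/pypi/<name>/json), which returns HTTP 404 for
# unknown projects. The candidate names below are illustrative only.
import urllib.error
import urllib.request


def exists_on_pypi(name: str) -> bool:
    """Return True if `name` resolves to a real project on PyPI."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the package does not exist
            return False
        raise  # rate limits, outages, etc. should be surfaced, not ignored


if __name__ == "__main__":
    for candidate in ["requests", "totally-made-up-helper-lib"]:
        verdict = "exists" if exists_on_pypi(candidate) else "NOT FOUND - do not install blindly"
        print(f"{candidate}: {verdict}")
```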
Attack dynamics and historical context
A key aspect of the risk is not merely the presence of a single hallucinated dependency but the persistence of certain non-existent package names. The researchers found that a substantial share of hallucinated dependencies reappeared across multiple queries: about 43 percent of hallucinations showed up more than ten times across different queries, and around 58 percent recurred more than once within ten iterations. This repetition indicates that these are not random blips; they represent persistent patterns that attackers could observe and exploit.
The repeated hallucinations create a practical attack vector. Malicious actors could identify non-existent packages that the AI model consistently hallucinates and publish malware under those exact names. When developers rely on AI-generated code and install these packages, the attacker’s payload could be executed on users’ systems, enabling data theft, backdoor installation, or other malicious actions.
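As an illustration of how that repetition could be measured, whether by researchers auditing a model or by a team monitoring its own AI tooling, the hedged sketch below repeatedly generates code for the same prompt, extracts top-level imports, and tallies names that do not resolve in the registry. `generate_code_sample` and `package_exists` are hypothetical callables standing in for an LLM call and a registry lookup (such as the PyPI check above), and Python import names do not always match registry package names, so a real pipeline would need a mapping step.

```python
# Hedged sketch: tally how often unresolvable package names recur across
# repeated generations for the same prompt. `generate_code_sample` and
# `package_exists` are hypothetical stand-ins (an LLM call and a registry
# lookup); they are not part of the study's published tooling.
import ast
from collections import Counter
from typing import Callable


def imported_names(source: str) -> set[str]:
    """Collect top-level module names imported by a Python source string."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names


def tally_unresolved(generate_code_sample: Callable[[str], str],
                     package_exists: Callable[[str], bool],
                     prompt: str,
                     iterations: int = 10) -> Counter:
    """Count how often each unresolvable import name recurs across runs."""
    counts: Counter = Counter()
    for _ in range(iterations):
        try:
            names = imported_names(generate_code_sample(prompt))
        except SyntaxError:
            continue  # skip samples that are not valid Python
        for name in names:
            if not package_exists(name):
                counts[name] += 1
    return counts
```

Names that keep reappearing in such a tally are exactly the kind of repeatable targets the attack described above depends on, which is why they merit priority review.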
The study also points to a well-known 2021 demonstration of dependency-confusion exploits, which involved high-profile organizations and showed the feasibility of delivering counterfeit code through cleverly named packages. While that demonstration involved proactive testing with real companies, the current study reframes the risk in the context of AI-generated code and the broader software supply chain.
Model and language variation: what drives the differences
A central finding concerns how different models and languages influence hallucination rates. The researchers observed a notable disparity between open-source and commercial models: on average, open-source LLMs hallucinated packages at a rate of nearly 22 percent, while commercial models came in just above five percent. The researchers attributed much of this gap to model scale and breadth of training, noting that commercial variants are believed to have substantially larger parameter counts, though they caution that the exact architectures and training specifics of proprietary models remain undisclosed.
Among open-source models, there wasn’t a clear, consistent link between the model’s size and its hallucination rate, suggesting other factors—such as training data quality, fine-tuning, instruction-following, and safety tuning—play critical roles. These processes are designed to improve general usability and reduce certain errors but may inadvertently influence the propensity for package hallucinations.
Language differences also emerged. JavaScript generated a higher hallucination rate than Python, a difference the researchers linked to ecosystem size and namespace complexity. JavaScript has a far larger package ecosystem with more nuanced naming conventions, making it harder for models to recall accurate package names. In contrast, Python’s more compact ecosystem and namespace structure contribute to relatively fewer hallucinations.
Implications for the software supply chain
The findings underscore a broader, systemic risk: AI-generated code could introduce non-existent dependencies at a scale that threatens software integrity, especially in environments that rely on automated code-generation workflows. In practical terms, developers integrating AI-assisted code into repositories may inadvertently introduce phantom dependencies. If an attacker later registers a package under one of those phantom names, a reference that previously resolved to nothing suddenly resolves to attacker-controlled code at install or deployment time, opening the door to malicious payloads and backdoors.
The implications extend beyond individual projects. If widely adopted, AI-generated code could seed downstream supply chains with repeated, non-existent dependencies that attackers can target. The presence of persistent hallucinations amplifies risk because it provides attackers with recognizable, repeatable names to exploit, and it complicates automated package management and security scanning systems that rely on conventional dependency graphs.
Practical guidance: what developers and organizations can do
Organizations using AI-generated code should implement layered defenses to minimize exposure to package hallucinations and dependency-confusion risks. Key measures include:
- Strengthened dependency verification: Require explicit validation of all dependencies suggested by AI tools, including cross-checking with official registries and package metadata before installation (a CI-style sketch of such a check follows this list).
- Strict provenance and SBOMs: Generate and maintain thorough software bills of materials (SBOMs) that enumerate every dependency, including any introduced via AI-generated code, to support traceability and quick remediation.
- Automated security scanning: Integrate dependency-scanning tools that can identify non-existent or suspicious package names and flag them for human review.
- Version pinning and registry hygiene: Pin dependency versions to known-good releases and implement registry controls to prevent tampering or confusion between legitimate and counterfeit packages.
- Human-in-the-loop validation: Ensure that developers review AI-suggested dependencies, especially for critical or high-privilege components, rather than accepting them verbatim.
- Environment segmentation and least privilege: Run AI-generated code in isolated environments to limit potential damage from any compromised dependencies, and enforce strict access controls to minimize lateral movement.
- Ongoing model governance: Monitor and adjust AI tooling usage, selecting models with robust safety tuning, better provenance tracking, and clearer documentation of training data and behavior patterns.
- Education and awareness: Train development teams to recognize the risks of dependency confusion and to follow best practices for dependency management when AI assistance is involved.
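As one concrete and deliberately simplified way to wire the first few measures into a pipeline, the CI-style sketch below assumes dependencies are declared in a plain requirements.txt and that PyPI is the registry of record: it fails the build when an entry is unpinned or does not resolve at all. It handles no extras, environment markers, or hash lines, and it is a sketch of the idea rather than a replacement for established dependency scanners and lockfile tooling.

```python
# CI-style sketch: flag unpinned requirements and names that do not resolve
# on PyPI (possible hallucinations). Assumes a plain requirements.txt; real
# pipelines should prefer established scanners and lockfile tooling.
import re
import sys
import urllib.error
import urllib.request

PINNED = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*\S+")


def exists_on_pypi(name: str) -> bool:
    try:
        with urllib.request.urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def check_requirements(path: str) -> list[str]:
    """Return human-readable problems found in a requirements file."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith(("#", "-")):
                continue  # skip blanks, comments, and pip options
            match = PINNED.match(line)
            if not match:
                problems.append(f"unpinned or unparsable requirement: {line}")
                continue
            name = match.group(1)
            if not exists_on_pypi(name):
                problems.append(f"not found on PyPI (possible hallucination): {name}")
    return problems


if __name__ == "__main__":
    issues = check_requirements(sys.argv[1] if len(sys.argv) > 1 else "requirements.txt")
    for issue in issues:
        print(issue, file=sys.stderr)
    sys.exit(1 if issues else 0)
```

Run as a pipeline step, the nonzero exit code blocks the merge or deployment so a human can review any flagged dependency before it reaches production.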
Industry context and forward-looking considerations
The study’s conclusions reinforce a broader concern about the trustworthiness of AI-generated outputs for code. As AI adoption accelerates, researchers and industry professionals alike are increasingly focused on the reliability and safety of model outputs in real-world software development contexts. The researchers highlighted that the phenomenon of package hallucination represents a concrete, measurable risk that can be studied, monitored, and mitigated with disciplined practices.
There is also a broader industry sentiment about the future of AI-generated code. Some experts project substantial growth in AI-assisted coding, raising expectations that AI will play a central role in software creation. However, the same experts caution that without robust safeguards and rigorous validation, the increase in automated code could also magnify security vulnerabilities if left unchecked. The alignment between AI capabilities and secure software engineering practices will be a critical determinant of how effectively AI tools can contribute to safe, scalable software development.
Future research directions and open questions
Researchers acknowledge that fully disentangling the causes of package hallucinations is complex. Several factors likely influence the rate and nature of hallucinations, including model size, data quality, training objectives, fine-tuning practices, and safety constraints. Further investigation is needed to determine:
- How training data composition and licensing affect dependency recall accuracy.
- The impact of different instruction-following strategies and safety policies on the prevalence of hallucinations.
- Whether enhancements in retrieval-augmented generation or external knowledge integration reduce erroneous package references.
- The role of ecosystem characteristics (e.g., package volume, naming conventions, namespace design) in shaping hallucination rates.
- Practical, scalable defenses that can be integrated into CI/CD pipelines to detect and prevent dependency misalignment before deployment.
Researchers emphasize that the goal is not to discourage AI-assisted coding but to advance methods that ensure reliability, traceability, and security when AI tools contribute to software development.
Conclusion
The study provides a critical, data-backed view of a tangible risk in AI-assisted software development: AI-generated code frequently references non-existent dependencies, creating a viable pathway for supply-chain attacks through dependency confusion. The findings show not only that hallucinations occur at notable rates, but also that many hallucinated dependencies recur across multiple queries, offering attackers repeatable targets. The gap between open-source and commercial models, along with language-specific differences, underscores the complexity of mitigating these risks in a diverse, rapidly evolving ecosystem.
For developers and organizations, the takeaway is clear: integrate AI-assisted coding with rigorous validation, dependency verification, and robust security practices. By combining human oversight, automated scanning, and disciplined governance of AI tools, teams can harness the benefits of AI-generated code while mitigating the associated security risks. As AI continues to reshape software development, proactive, defense-forward strategies will be essential to protect the integrity of the software supply chain and the trust of downstream users.