LLM05: Supply Chain Vulnerabilities
The supply chain in LLMs can be vulnerable, impacting the integrity of training data, ML models, and deployment platforms. These vulnerabilities can lead to biased outcomes, security breaches, or even complete system failures. Traditional vulnerability analysis focuses on software components, but machine learning extends this to the pre-trained models and training data supplied by third parties, which are susceptible to tampering and poisoning attacks.
Finally, LLM plugin extensions can bring their own vulnerabilities. These are described in LLM07: Insecure Plugin Design, which covers writing LLM plugins and provides information useful for evaluating third-party plugins.
Common Examples of Vulnerability
Traditional third-party package vulnerabilities, including outdated or deprecated components.
Using a vulnerable pre-trained model for fine-tuning.
Use of poisoned crowd-sourced data for training.
Use of outdated or deprecated models that are no longer maintained, leading to security issues.
Unclear T&Cs and data privacy policies from model operators, leading to the application's sensitive data being used for model training and subsequent sensitive information exposure. This may also apply to risks from the model supplier's use of copyrighted material.
How to Prevent
Carefully vet data sources and suppliers, including T&Cs and their privacy policies, only using trusted suppliers. Ensure adequate and independently-audited security is in place and that model operator policies align with your data protection policies.
Only use reputable plugins, and ensure they have been tested against your application's requirements. LLM07: Insecure Plugin Design provides information on the LLM aspects of insecure plugin design you should test against to mitigate risks from third-party plugins.
Understand and apply the mitigations found in the OWASP Top Ten’s A06:2021 — Vulnerable and Outdated Components.
Maintain an up-to-date inventory of components using a Software Bill of Materials (SBOM) to ensure an accurate, signed inventory that prevents tampering with deployed packages; see the SBOM verification sketch after this list.
Use MLOps best practices and platforms offering secure model repositories with data, model, and experiment tracking if your LLM application uses its own model.
Use model and code signing when using external models and suppliers; a checksum verification sketch follows this list.
Implement anomaly detection and adversarial robustness tests on supplied models and data; a simple backdoor-probe sketch also follows this list.
Implement sufficient monitoring to cover component and environment vulnerability scanning, use of unauthorized plugins, and out-of-date components, including the model and its artifacts.
Implement a patching policy to mitigate vulnerable or outdated components.
Regularly review and audit supplier security and access, ensuring no changes in their security posture or T&Cs.
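As a minimal sketch of the SBOM check above, the following snippet parses a CycloneDX-style SBOM and flags any component that falls outside a pinned inventory. The file name sbom.json and the APPROVED mapping are illustrative assumptions, not part of any specific toolchain.

```python
import json

# Hypothetical pinned inventory: approved package name -> exact version.
APPROVED = {
    "requests": "2.31.0",
    "transformers": "4.38.2",
}

with open("sbom.json") as f:  # SBOM emitted by your build pipeline
    sbom = json.load(f)

# CycloneDX JSON lists dependencies under "components".
for component in sbom.get("components", []):
    name = component.get("name")
    version = component.get("version")
    if APPROVED.get(name) != version:
        print(f"ALERT: {name} {version} is not in the approved inventory")
```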
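For the model and code signing point, a minimal integrity check is to verify a downloaded artifact against a digest the supplier publishes out-of-band. Full signing (e.g., Sigstore or GPG) additionally binds the digest to the supplier's identity; this sketch only checks integrity, and model.safetensors and PUBLISHED_SHA256 are placeholders.

```python
import hashlib

# Placeholder: digest published out-of-band by the model supplier.
PUBLISHED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of("model.safetensors") != PUBLISHED_SHA256:
    raise RuntimeError("Model artifact does not match the published digest")
```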
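For anomaly detection on supplied models, one lightweight probe is to compare a candidate model's answers on a fixed prompt set against a trusted baseline and flag sharp divergence, which can indicate backdoored behavior. This is a sketch under assumptions: baseline_generate and candidate_generate are hypothetical callables wrapping your inference API, and the probe set and threshold are illustrative.

```python
from difflib import SequenceMatcher

# Illustrative probe prompts; a real suite would be larger and domain-specific.
PROBES = [
    "Summarize the outlook for global GDP growth next year.",
    "List three well-documented causes of consumer price inflation.",
]

def agreement(a: str, b: str) -> float:
    """Crude text-similarity score between two model outputs (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()

def audit_model(baseline_generate, candidate_generate, threshold: float = 0.5):
    """Flag probes where the supplied model diverges sharply from the baseline."""
    for prompt in PROBES:
        score = agreement(baseline_generate(prompt), candidate_generate(prompt))
        if score < threshold:  # illustrative threshold, tune per application
            print(f"FLAG: divergent output on probe {prompt!r} (score {score:.2f})")
```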
Example Attack Scenarios
An attacker exploits a vulnerable Python library to compromise a system.
An attacker provides an LLM plugin for searching flights that generates fake links used to scam plugin users.
An attacker exploits the PyPI package registry to trick model developers into downloading a compromised package and exfiltrating data or escalating privileges in a model development environment; a typosquat-detection sketch follows these scenarios.
An attacker poisons a publicly available pre-trained model specializing in economic analysis and social research to create a backdoor which generates misinformation and fake news.
An attacker poisons a publicly available dataset to help create a backdoor when fine-tuning models.
A compromised employee of a supplier exfiltrates data, models, or code, stealing IP.
An LLM operator changes its T&Cs and Privacy Policy so that it requires an explicit opt-out from using application data for model training, leading to memorization of sensitive data.
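One lightweight defense against the registry attack above is to check installed distributions against an approved allowlist and flag near-miss names, a common typosquatting tell. This is a minimal sketch using only the standard library; the APPROVED set and the 0.85 similarity cutoff are illustrative assumptions.

```python
import difflib
from importlib import metadata

# Hypothetical allowlist of packages approved for the development environment.
APPROVED = {"requests", "numpy", "transformers"}

for dist in metadata.distributions():
    name = (dist.metadata["Name"] or "").lower()
    if name in APPROVED:
        continue
    # A near-miss of an approved name is a common typosquatting tell.
    near = difflib.get_close_matches(name, APPROVED, n=1, cutoff=0.85)
    if near:
        print(f"ALERT: '{name}' closely resembles approved package '{near[0]}'")
```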