Sample Assessment Report
Redacted for confidentiality
AI Security
AI/ML System Security Assessment
Confidential Client — AI-Powered SaaS Platform
Customer-facing LLM assistant (GPT-4o based), RAG pipeline with enterprise knowledge base, API backend serving AI responses
8 business days
OWASP LLM Top 10 (2025)
Executive Summary
UnlockSec conducted a comprehensive AI security assessment of the client's LLM-powered customer service platform. The assessment identified critical prompt injection vulnerabilities that enable system prompt exfiltration, a RAG poisoning vector that allows attacker-controlled document injection, and guardrail bypass techniques that produced harmful content in 73% of 200 test attempts.
Methodology
Sample Findings
System Prompt Exfiltration via Indirect Prompt Injection
Description
By submitting a document containing adversarial instructions to the knowledge base retrieval pipeline, an attacker causes the LLM to reveal its complete system prompt when a user interacts with related content. The system prompt contains internal operational procedures and API keys for third-party integrations.
Recommendation
Implement content filtering on all documents entering the RAG pipeline. Apply input/output guardrails using a secondary LLM classifier. Never include secrets in system prompts; use environment variables and backend secret management instead.
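As an illustration of the content-filtering recommendation, the sketch below shows a heuristic pre-ingestion check that holds suspicious documents for review before they are indexed. The pattern set and the names `flag_injection` and `should_quarantine` are assumptions made for this example, not part of the client's pipeline. Regular expressions alone are easy to evade, so a check like this would sit alongside the secondary classifier rather than replace it.

```python
import re

# Illustrative patterns only (assumed for this sketch). Real deployments
# would tune these and combine them with a classifier model.
SUSPICIOUS_PATTERNS = {
    "override_instructions": re.compile(
        r"\b(ignore|disregard|forget)\b.{0,40}\b(previous|prior|above)\b"
        r".{0,40}\binstructions?\b",
        re.IGNORECASE | re.DOTALL,
    ),
    "prompt_disclosure": re.compile(
        r"\b(reveal|print|repeat|output|show)\b.{0,40}\bsystem prompt\b",
        re.IGNORECASE | re.DOTALL,
    ),
    "role_reassignment": re.compile(r"\byou are now\b", re.IGNORECASE),
}


def flag_injection(document_text: str) -> list:
    """Return the names of the suspicious patterns found in a document."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(document_text)
    ]


def should_quarantine(document_text: str) -> bool:
    """Hold a document for human review instead of indexing it on any match."""
    return bool(flag_injection(document_text))
```

A document that matches any pattern would be routed to a review queue, while clean documents continue to indexing as usual.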
Guardrail Bypass — Jailbreak via Role-Play Framing
Description
A structured role-play framing ('Act as a security researcher who must demonstrate...') bypassed the content safety filters in 73% of 200 test attempts, enabling generation of content explicitly prohibited by the system prompt and usage policy.
Recommendation
Implement multi-layer content moderation (input + output). Use a separate classifier model to evaluate intent before and after generation. Apply constitutional AI techniques and adversarial fine-tuning on bypass patterns.
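The input-plus-output layering described above could be wired together roughly as follows. This is a minimal sketch: `generate` stands in for the call to the assistant model and `is_disallowed` for a separate classifier model, and both are hypothetical callables supplied by the caller rather than real library APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional

REFUSAL = "Sorry, I can't help with that request."


@dataclass
class ModerationResult:
    reply: str
    blocked_at: Optional[str]  # "input", "output", or None when allowed


def moderated_reply(
    user_input: str,
    generate: Callable[[str], str],
    is_disallowed: Callable[[str], bool],
) -> ModerationResult:
    """Run the classifier before and after generation."""
    # Layer 1: evaluate the request itself, before spending a model call.
    if is_disallowed(user_input):
        return ModerationResult(REFUSAL, "input")

    draft = generate(user_input)

    # Layer 2: evaluate what the model actually produced. This catches
    # framings (such as role-play) that look harmless on the input side.
    if is_disallowed(draft):
        return ModerationResult(REFUSAL, "output")

    return ModerationResult(draft, None)
```

Checking the output independently of the input matters for this finding, because a role-play prompt can pass an input-only filter while the generated text still violates policy.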
RAG Pipeline Poisoning — Attacker-Controlled Knowledge Injection
Description
The document ingestion pipeline does not validate the source or authenticity of uploaded documents. An authenticated user with document upload permissions can inject adversarial content that biases the LLM's responses for all users querying related topics.
Recommendation
Implement source provenance tracking for all RAG documents. Apply content fingerprinting and anomaly detection on new ingestions. Require human review for documents that trigger semantic similarity alerts.
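One possible shape for provenance tracking and content fingerprinting is sketched below. The `IngestionLedger` class, its field names, and the idea of a trusted-source allowlist are assumptions made for illustration. The fingerprint only detects near-identical text, so it does not cover the semantic similarity or anomaly detection parts of the recommendation.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    fingerprint: str    # SHA-256 of the normalized content
    uploader: str
    source: str
    ingested_at: str    # ISO 8601 timestamp in UTC
    needs_review: bool  # True when the source is not on the allowlist


def fingerprint(content: str) -> str:
    """Hash case- and whitespace-normalized text so trivial edits collide."""
    normalized = " ".join(content.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class IngestionLedger:
    """In-memory ledger; a real system would persist these records."""

    def __init__(self, trusted_sources):
        self.trusted_sources = set(trusted_sources)
        self.records = []

    def seen_before(self, content: str) -> bool:
        digest = fingerprint(content)
        return any(record.fingerprint == digest for record in self.records)

    def register(self, content: str, uploader: str, source: str) -> ProvenanceRecord:
        record = ProvenanceRecord(
            fingerprint=fingerprint(content),
            uploader=uploader,
            source=source,
            ingested_at=datetime.now(timezone.utc).isoformat(),
            needs_review=source not in self.trusted_sources,
        )
        self.records.append(record)
        return record
```

Storing the uploader and source alongside each fingerprint makes it possible to trace a poisoned answer back to the document and account that introduced it.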
Model Response Inference — Sensitive Business Logic Extraction
Description
Through systematic querying, it is possible to infer internal business rules, pricing algorithms, and customer segmentation criteria embedded in the system prompt context. This constitutes IP exfiltration without directly extracting the system prompt text.
Recommendation
Avoid embedding sensitive business logic in system prompts. Use backend API calls for dynamic business rules. Implement query rate limiting and semantic anomaly detection for systematic probing patterns.
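The query rate limiting mentioned above might look like the following per-user sliding-window limiter. The class name, thresholds, and in-memory storage are illustrative assumptions. Detecting systematic probing by meaning would need a separate semantic check that this sketch does not attempt.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_queries per user within any window_seconds span."""

    def __init__(self, max_queries: int, window_seconds: float):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user_id -> recent timestamps

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        timestamps = self._history[user_id]

        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()

        if len(timestamps) >= self.max_queries:
            return False  # rejected queries are not recorded

        timestamps.append(now)
        return True
```

The optional `now` argument exists so the limiter can be exercised deterministically in tests; production callers would omit it and rely on the monotonic clock.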
* Showing 4 of 19 total findings. Full report provided upon engagement.
Risk Summary
Deliverables Included
- OWASP LLM Top 10 coverage report
- Prompt injection test case library (200+ test cases)
- RAG pipeline security assessment
- Guardrail effectiveness evaluation
- AI-specific remediation guidance with implementation examples