implementing-llm-guardrails-for-security
FeaturedImplements input and output validation guardrails for LLM-powered applications to prevent prompt injection, data leakage, toxic content generation, and hallucinated outputs. Builds a security validation pipeline using NVIDIA NeMo Guardrails Colang definitions, custom Python validators for PII detection and content policy enforcement, and the Guardrails AI framework for structured output validation. The guardrails system intercepts both user inputs (blocking injection attempts, stripping PII, enforcing topic boundaries) and model outputs (detecting hallucinations, filtering toxic content, validating JSON schema compliance). Activates for requests involving LLM output validation, AI content filtering, guardrail implementation, or LLM safety enforcement.
Install
Quality Score: 99/100
Skill Content
Details
- Author
- mukul975
- Repository
- mukul975/Anthropic-Cybersecurity-Skills
- Created
- 3 months ago
- Last Updated
- today
- Language
- Python
- License
- Apache-2.0
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
guardrails-ai-setup
Guardrails AI validation framework setup for LLM applications. Implement input/output validation, safety checks, and structured output enforcement.
nemo-guardrails
NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on T4 GPU.
nemo-guardrails
NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses Colang 2.0 DSL for programmable rails. Production-ready, runs on T4 GPU.
nemo-guardrails
NVIDIA NeMo Guardrails configuration for conversational safety and control
llamaguard
Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, HuggingFace, Sagemaker. Integrates with NeMo Guardrails.