constitutional-ai-prompts

Solid

Constitutional AI and safety guardrail prompts for aligned LLM behavior

AI & Automation 1,160 stars 71 forks Updated today MIT

Quality Score: 94/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 54
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Constitutional AI Prompts Skill

## Capabilities

- Design constitutional AI principles
- Implement self-critique and revision prompts
- Create harmlessness guidelines
- Design refusal patterns for unsafe requests
- Implement red-team testing prompts
- Create ethics-aware response frameworks

## Target Processes

- system-prompt-guardrails
- content-moderation-safety

## Implementation Details

### Constitutional Patterns

1. **Critique-Revision**: Self-evaluate and improve responses
2. **Principle Adherence**: Follow defined ethical principles
3. **Harmlessness Focus**: Prioritize safe responses
4. **Helpfulness Balance**: Balance helpfulness with safety
5. **Transparency**: Acknowledge limitations

### Configuration Options

- Constitutional principles list
- Critique prompts
- Revision guidelines
- Refusal templates
- Escalation triggers

### Best Practices

- Define clear constitutional principles
- Balance helpfulness and safety
- Test with adversarial inputs
- Document refusal patterns
- Regular principle review

### Dependencies

- langchain-core
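
The Critique-Revision pattern and the configuration options listed above could be sketched roughly as follows. This is an illustrative Python sketch only, not the skill's actual implementation: it does not call langchain-core or any real model API, and `ConstitutionConfig`, `critique_and_revise`, `stub_model`, and all prompt wording are hypothetical names and text chosen for this example. The substring check used for escalation triggers is likewise an assumption; a real deployment would probably need something more robust.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: a "model" is any callable mapping a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class ConstitutionConfig:
    """Hypothetical container for the configuration options listed above."""
    principles: List[str]
    critique_prompt: str = (
        "CRITIQUE: Does the response violate this principle?\n"
        "Principle: {principle}\nResponse: {response}\n"
        "Answer OK if it does not."
    )
    revision_prompt: str = (
        "REVISE: Rewrite the response to address the critique.\n"
        "Principle: {principle}\nCritique: {critique}\nResponse: {response}"
    )
    refusal_template: str = "I can't help with that request."
    escalation_triggers: List[str] = field(default_factory=list)
    max_rounds: int = 2


def critique_and_revise(model: Model, config: ConstitutionConfig, request: str) -> str:
    """Draft a response, then critique and revise it against each principle."""
    # Escalation triggers short-circuit to the refusal template (naive substring match).
    lowered = request.lower()
    if any(trigger.lower() in lowered for trigger in config.escalation_triggers):
        return config.refusal_template

    response = model(request)
    for _ in range(config.max_rounds):
        revised = False
        for principle in config.principles:
            critique = model(
                config.critique_prompt.format(principle=principle, response=response)
            )
            if critique.strip().upper() != "OK":
                response = model(
                    config.revision_prompt.format(
                        principle=principle, critique=critique, response=response
                    )
                )
                revised = True
        if not revised:
            break  # every principle passed in this round
    return response


def stub_model(prompt: str) -> str:
    """Toy stand-in for an LLM so the sketch runs offline."""
    if prompt.startswith("CRITIQUE:"):
        return "OK" if "[revised]" in prompt else "The tone is too blunt."
    if prompt.startswith("REVISE:"):
        return "[revised] Here is a gentler answer."
    return "Here is a blunt answer."


config = ConstitutionConfig(
    principles=["Be respectful", "Be honest"],
    escalation_triggers=["build a weapon"],
)
print(critique_and_revise(stub_model, config, "Explain recursion"))
```

Passing the model in as a plain callable keeps the loop independent of any particular client library, so the same function can be exercised with a stub for adversarial tests and with a real model wrapper in production. `max_rounds` bounds the number of critique passes so that a model which never answers `OK` cannot loop indefinitely.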

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

constitutional-ai

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

27,705 Updated today
davila7
AI & Automation Solid

constitutional-ai

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety alignment, reducing harmful outputs without human labels. Powers Claude's safety system.

9,182 Updated 1 month ago
Orchestra-Research
Web & Frontend Listed

constitutional-reasoning

Self-critique and Constitutional AI reasoning skill. Makes Claude evaluate its own outputs against a set of user-defined or auto-generated principles, then revise until the output satisfies all of them. Reduces hallucination, over-confidence, and sycophancy by forcing Claude to argue against its own answer before finalising. Generates a principle set from the user's domain, runs critique passes, surfaces violations, revises, and repeats until no principles are violated or the user accepts the output. Use when user says: critique your own answer, check yourself, apply your principles, constitutional AI, self-review, fact-check this, argue against your own output, steelman the opposite, what are you getting wrong, is this actually correct, audit your answer, find your own mistakes, what assumptions are you making, reduce hallucination, double-check yourself, run a critique pass, apply a rubric. Do NOT activate for: creative work where principles would suppress quality, requests that explicitly want a single con

2 Updated 6 days ago
Sandeeprdy1729
AI & Automation Solid

ai-prompt-engineering-safety-review

Comprehensive AI prompt engineering safety review and improvement prompt. Analyzes prompts for safety, bias, security vulnerabilities, and effectiveness while providing detailed improvement recommendations with extensive frameworks, testing methodologies, and educational content.

34,233 Updated today
github
AI & Automation Listed

ai-llm-safety

This skill should be used when designing, planning, implementing, or reviewing any system that involves LLM agents, tool use, prompt construction, or agentic workflows, or when the user asks to "add guardrails", "prevent prompt injection", "sanitize LLM output" — enforces prompt injection defense, tool safety, and context integrity

5 Updated today
alo-exp