Luis247911
User
Runnable, domain-neutral starter harness for AI projects (Claude Code first). Two layers: governance (Markdown rules, state, and a knowledge graph in .ai-workspace/) and an executable layer of 12 Claude Code skills on top of a pip-installable Python engine (src/harness/). Stdlib-first, with anti-sprawl rules.
Indexed Skills (12)
agent-pattern-selector
Use this when you have an agent problem but are not sure which approach or skill applies — to triage from a one-line symptom to the right harness skill, and to be reminded that the simplest thing that works usually wins. Triggers on "where do I start", "which pattern", "what should I use", "how do I build this agent", "is this the right approach".
cost-latency-optimizer
Use this when an agent is too slow or too expensive — to apply the four levers (prompt caching, batching, streaming, model routing) in the right order and lint a cache-breakpoint plan for the mistakes that silently bust the cache. Triggers on "too expensive", "latency", "speed up", "cut cost", "prompt caching", "which model", "routing".
eval-judge
Use this when you need to grade a single open-ended output against a rubric (LLM-as-judge) — to get a PASS/FAIL with a score, deterministically offline and with a real model when live. Triggers on "LLM as judge", "grade this", "rubric", "score the answer", "is this output good", "judge".
eval-loop-builder
Use this when you need to build or extend an evaluation for an agent or prompt — to turn a vague "it seems better" into a dataset, weighted assertions, and a threshold gate that fails CI. Triggers on "eval", "test the prompt", "regression", "is the new prompt better", "scorecard".
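As a rough illustration of the "dataset, weighted assertions, threshold gate" idea, here is a minimal stdlib-only sketch. It is not the harness's actual API: `Assertion`, `score_output`, `gate`, and the sample assertions are hypothetical names chosen for this example.

```python
from typing import Callable

# Hypothetical shape for illustration: (name, weight, check function).
Assertion = tuple[str, float, Callable[[str], bool]]


def score_output(output: str, assertions: list[Assertion]) -> float:
    """Return the weighted fraction of assertions that pass (0.0 to 1.0)."""
    total = sum(weight for _, weight, _ in assertions)
    if total <= 0:
        raise ValueError("assertion weights must sum to a positive number")
    passed = sum(weight for _, weight, check in assertions if check(output))
    return passed / total


def gate(outputs: list[str], assertions: list[Assertion], threshold: float) -> bool:
    """Pass only if the mean score over the dataset meets the threshold."""
    if not outputs:
        raise ValueError("dataset is empty")
    mean = sum(score_output(o, assertions) for o in outputs) / len(outputs)
    return mean >= threshold


# Sample assertions (assumed for the example, not taken from the skill).
ASSERTIONS: list[Assertion] = [
    ("mentions_refund", 2.0, lambda out: "refund" in out.lower()),
    ("under_200_chars", 1.0, lambda out: len(out) < 200),
]
```

In CI, a script could call `gate(...)` on the model outputs for the dataset and exit non-zero when it returns `False`, which is what turns "it seems better" into a failing build.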
guardrail-designer
Use this when you need to validate or sanitize what crosses the agent boundary at runtime — to block PII, enforce a format, cap length, or refuse unsafe output, with an explicit on-fail action. Triggers on "guardrail", "validation", "PII", "block", "sanitize", "filter output", "content policy".
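A minimal sketch of a guardrail with an explicit on-fail action might look like the following. The names (`GuardrailResult`, `apply_guardrail`, the `on_fail` values) and the email regex are assumptions for illustration, not the skill's real interface, and a single email pattern is far from complete PII detection.

```python
import re
from dataclasses import dataclass

# Simplified email pattern, assumed for the example; real PII detection needs more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class GuardrailResult:
    ok: bool          # whether the text may cross the boundary
    text: str         # the (possibly sanitized) text; empty when blocked
    reason: str = ""  # why the guardrail fired, if it did


def apply_guardrail(text: str, max_len: int = 500, on_fail: str = "redact") -> GuardrailResult:
    """Cap length and handle emails according to the on_fail action."""
    if on_fail not in ("redact", "block"):
        raise ValueError(f"unknown on_fail action: {on_fail!r}")
    if len(text) > max_len:
        return GuardrailResult(False, "", "too_long")
    if EMAIL_RE.search(text):
        if on_fail == "block":
            return GuardrailResult(False, "", "pii_email")
        redacted = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
        return GuardrailResult(True, redacted, "pii_email_redacted")
    return GuardrailResult(True, text)
```

Making `on_fail` a required decision at the call site keeps the failure behavior visible instead of burying it in the validator.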
hitl-gate
Use this when an agent must pause for human approval before a risky or irreversible action — to persist an interrupt, wait across process restarts, then resume on approve or loop back with feedback on deny. Triggers on "human in the loop", "approval", "confirm before", "pause for review", "gate the action".
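The persist, wait, and resume cycle can be sketched with a JSON file as the durable store, so the pending interrupt survives a process restart. The function names and the state layout below are hypothetical, not the skill's actual contract.

```python
import json
from pathlib import Path


def request_approval(store: Path, action: str) -> None:
    """Persist a pending interrupt before the risky action runs."""
    state = {"action": action, "status": "pending", "feedback": ""}
    store.write_text(json.dumps(state), encoding="utf-8")


def decide(store: Path, approved: bool, feedback: str = "") -> None:
    """Record the human decision; may be called from a different process."""
    state = json.loads(store.read_text(encoding="utf-8"))
    state["status"] = "approved" if approved else "denied"
    state["feedback"] = feedback
    store.write_text(json.dumps(state), encoding="utf-8")


def resume(store: Path) -> str:
    """Return the next step: keep waiting, run the action, or revise with feedback."""
    state = json.loads(store.read_text(encoding="utf-8"))
    if state["status"] == "pending":
        return "wait"
    if state["status"] == "approved":
        return f"run:{state['action']}"
    return f"revise:{state['feedback']}"
```

Because all state lives in the file, the agent process can exit after `request_approval` and a fresh process can call `resume` later.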
memory-architect
Use this when an agent needs to remember things across turns or sessions — to choose the right memory type (working, factual, episodic, semantic) and scope (conversation, session, user, org), and to move items between in-context and archival storage. Triggers on "memory", "remember", "long-term", "persist context", "recall", "what should the agent store".
multi-agent-topology
Use this when you are tempted to build a multi-agent system — to first decide whether you need more than one agent at all, and if so which topology (supervisor, hierarchical, network, swarm) fits, with the failure modes of each. Triggers on "multi-agent", "multiple agents", "swarm", "supervisor agent", "agents talking to each other", "how many agents".
observability-tracer
Use this when you need to see what an agent did — to emit a tree of typed spans (llm_call, tool_call, retrieval, agent, chain) with gen_ai.* attributes and export them as JSONL, with message content captured only when you opt in. Triggers on "tracing", "observability", "spans", "what did the agent do", "token usage", "debug the agent run".
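To make the span tree and the opt-in content capture concrete, here is a small stdlib-only sketch. The `Span` class, `to_jsonl`, and the `content` attribute key are assumptions for this example; only the five span kinds and the `gen_ai.*` prefix come from the description above.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional

SPAN_KINDS = {"llm_call", "tool_call", "retrieval", "agent", "chain"}


@dataclass
class Span:
    kind: str
    name: str
    parent_id: Optional[str] = None  # None marks the root of the tree
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.time)
    attributes: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in SPAN_KINDS:
            raise ValueError(f"unknown span kind: {self.kind!r}")


def to_jsonl(spans: list, capture_content: bool = False) -> str:
    """Serialize one span per line; drop the 'content' attribute unless opted in."""
    lines = []
    for span in spans:
        record = asdict(span)
        if not capture_content:
            record["attributes"] = {
                k: v for k, v in record["attributes"].items() if k != "content"
            }
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)
```

A child span points at its parent through `parent_id`, so a reader of the JSONL file can rebuild the tree without any nesting in the file itself.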
orchestrator-patterns
Use this when you need to structure an agent workflow — to pick among the proven shapes (prompt chaining, routing, parallel sectioning, voting, orchestrator-workers, evaluator-optimizer) plus the ReAct loop, and run each as working code. Triggers on "workflow", "chain steps", "orchestrate", "parallelize", "evaluator", "agent loop", "ReAct".
skill-author
Use this when you are creating or editing a skill for this harness — to follow the frontmatter contract, write an evals-first SKILL.md, and lint it before committing so it loads cleanly and stays domain-neutral. Triggers on "write a skill", "new skill", "SKILL.md", "author a skill", "skill frontmatter", "lint my skill".
skill-supply-chain-check
Use this before running a third-party or unfamiliar skill — to scan its executable scripts for supply-chain risk (shelling out, outbound network, installs, embedded secrets, dynamic exec) and get a severity-ranked report. Triggers on "is this skill safe", "audit this skill", "supply chain", "vet the skill", "before I install".
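A severity-ranked scan of this kind can be approximated with a few regular expressions. The rule set, labels, and severities below are hypothetical and intentionally small; a line-based regex scan is a heuristic that can miss obfuscated code and flag harmless matches, so treat it as a first pass only.

```python
import re

# Hypothetical rules for illustration: (severity, label, pattern).
RULES = [
    ("high", "dynamic exec", re.compile(r"\b(eval|exec)\s*\(")),
    ("high", "shelling out", re.compile(r"\b(subprocess\.|os\.system\s*\()")),
    ("medium", "outbound network", re.compile(r"\b(urllib\.request|requests\.|socket\.)")),
    ("medium", "install", re.compile(r"\bpip\s+install\b")),
]
SEVERITY_ORDER = {"high": 0, "medium": 1}


def scan_source(source: str) -> list:
    """Return (severity, label, line_number) findings, most severe first."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for severity, label, pattern in RULES:
            if pattern.search(line):
                findings.append((severity, label, lineno))
    # Sort by severity first, then by line number for a stable report.
    return sorted(findings, key=lambda f: (SEVERITY_ORDER[f[0]], f[2]))
```

The function only reads text and never imports or executes the script under review, which matters when the input is untrusted.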
Bio shown is the top-scored skill's repo description as a fallback — real GitHub bios land in a future update.