evidence-calibration-review

Solid

Use when you want a per-claim evidence-tier audit on a text artifact before it ships — assign T1-T6 tiers to every load-bearing claim, surface calibration mismatches (high confidence on weak evidence, or honesty-theater under-claiming), and flag P11 (citation-as-decoration), P17 (pile-of-anecdotes-as-evidence), P54 (unverifiable single-source) patterns. Encodes the Evidence & Calibration deliberator role from the agent-council 5-perspective quality gate. Use standalone for fast evidence audit, or compose with the other 4 deliberator skills.
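The three named patterns can be made concrete as checks on a per-claim evidence record. This is a minimal sketch under stated assumptions, not the skill's implementation. The description names P11, P17, and P54 but does not define their detection criteria. The `Source` and `Claim` records, the `flag_patterns` function, and every rule inside it are therefore hypothetical heuristics.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    """One cited source for a claim (all fields are hypothetical)."""
    ref: str
    verifiable: bool      # can a reader independently check it?
    anecdotal: bool       # a single story or example, not systematic data
    supports_claim: bool  # does the source actually say what the claim says?


@dataclass
class Claim:
    text: str
    sources: List[Source] = field(default_factory=list)


def flag_patterns(claim: Claim) -> List[str]:
    """Return illustrative P11/P17/P54 flags for one load-bearing claim."""
    flags: List[str] = []
    # P11 (citation-as-decoration): a citation is present but does not back the claim.
    if any(not s.supports_claim for s in claim.sources):
        flags.append("P11")
    # P17 (pile-of-anecdotes-as-evidence): several sources, every one of them anecdotal.
    if len(claim.sources) > 1 and all(s.anecdotal for s in claim.sources):
        flags.append("P17")
    # P54 (unverifiable single-source): exactly one source, and it cannot be checked.
    if len(claim.sources) == 1 and not claim.sources[0].verifiable:
        flags.append("P54")
    return flags


if __name__ == "__main__":
    anecdotes = Claim(
        "Teams ship faster after adopting tool X",
        [Source("blog post A", True, True, True), Source("forum thread B", True, True, True)],
    )
    single = Claim("Vendor Y plans to exit the market", [Source("unnamed insider", False, True, True)])
    print(flag_patterns(anecdotes))  # ['P17']
    print(flag_patterns(single))     # ['P54']
```

A claim can carry more than one flag. For example, a lone unverifiable source that also fails to support the claim yields both P11 and P54.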

Code & Development · 9 stars · 1 fork · Updated 1 week ago · MIT

Quality Score: 83/100

Stars (20% weight)
Recency (20% weight)
Frontmatter (20% weight)
Documentation (15% weight): 100
Issue Health (10% weight)
License (10% weight): 100
Description (5% weight): 100

Skill Content

## Purpose

Run a per-claim evidence-tier audit on a text artifact before it ships. The Evidence & Calibration role reads the artifact claim by claim and asks one question per claim: **what tier of evidence supports it, and is the artifact's stated confidence consistent with that tier?**

A claim asserted with high confidence on Tier 6 (inferred) evidence is a calibration failure. A claim hedged with "perhaps" when the evidence is Tier 1 (primary source, verified) is also a calibration failure: under-claiming is its own kind of honesty failure. The skill catches both directions.

This is the boring role on the panel, and the load-bearing one. "Where is the source for X?" is the question that ends careers, so Evidence & Calibration surfaces every unsourced claim before the artifact ships.

The skill encodes the Evidence & Calibration role from the `agent-council` 5-deliberator quality gate. Use it standalone for a fast evidence audit, or compose it with the other 4 deliberator skills for fuller coverage.

## When to Use / When NOT to Use

**Use this skill when:**

- A claim-dense artifact (analysis, memo, public pitch) is about to ship and you want every load-bearing claim tiered
- You suspect over-claiming (high confidence on weak evidence) or under-claiming (hedging what is actually verified) and want both directions surfaced
- A piece relies on attributions ("X said Y" / "Microsoft did Z") and you want each one verified or hedged appropriately
- You need to catch P11 (citation-as-decoration), P17 (pile-of-anecd...
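The two-direction calibration check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the skill's code. The skill text defines only T1 (primary source, verified) and T6 (inferred). The `STRONG_TIERS`/`WEAK_TIERS` cut-offs, the three-level `Confidence` scale, and the `Claim`/`check_calibration` names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Confidence(Enum):
    """Hypothetical scale for how strongly the artifact states a claim."""
    HEDGED = "hedged"    # e.g. "perhaps", "may"
    NEUTRAL = "neutral"  # plain assertion
    HIGH = "high"        # e.g. "clearly", "certainly"


# Assumed cut-offs: the skill text defines only T1 (primary source, verified)
# and T6 (inferred); treating T1-T2 as strong and T5-T6 as weak is a guess.
STRONG_TIERS = {1, 2}
WEAK_TIERS = {5, 6}


@dataclass
class Claim:
    text: str
    tier: int               # evidence tier, 1 (strongest) to 6 (weakest)
    confidence: Confidence  # confidence as stated in the artifact


def check_calibration(claim: Claim) -> Optional[str]:
    """Return a finding if stated confidence and evidence tier disagree, else None."""
    if not 1 <= claim.tier <= 6:
        raise ValueError(f"tier must be 1-6, got {claim.tier}")
    # Over-claim: strong wording resting on weak evidence.
    if claim.confidence is Confidence.HIGH and claim.tier in WEAK_TIERS:
        return f"over-claim: high confidence on T{claim.tier} evidence"
    # Under-claim: hedged wording despite strong evidence.
    if claim.confidence is Confidence.HEDGED and claim.tier in STRONG_TIERS:
        return f"under-claim: hedged despite T{claim.tier} evidence"
    return None


if __name__ == "__main__":
    sample = [
        Claim("Competitor Z will clearly lose share", tier=6, confidence=Confidence.HIGH),
        Claim("The endpoint perhaps returns JSON", tier=1, confidence=Confidence.HEDGED),
        Claim("Latency may have regressed", tier=5, confidence=Confidence.HEDGED),
    ]
    for c in sample:
        print(f"{c.text!r}: {check_calibration(c) or 'calibrated'}")
```

Under these assumed cut-offs, middle-tier claims (T3-T4) never trigger a finding. A fuller review would need explicit rules for them.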

Details

Author: Avyayalaya
Repository: Avyayalaya/agent-council
Created: 2 months ago
Last Updated: 1 week ago
Language: Python
License: MIT

Bundled in these plugins

agent-council

Similar Skills

Semantically similar based on skill content — not just same category

Code & Development Listed

evidence-auditor

Power skill that confirms every claim tagged as verified in a lifecycle artifact cites a resolvable evidence source.

0 Updated today

JeelVankhede

Code & Development Solid

docs-claim-check

Check whether the claims in public-facing documentation (README, release notes, install/usage docs) are supported by the evidence the user provides — files, manifests, logs, and command outputs supplied in the conversation. Produces per-claim findings with a confidence label (verified / unsupported / stale-suspected / needs-human) and an explicit "input scope reviewed" statement. Advisory only. Use when the user asks to fact-check docs, verify a README against a repo, audit release notes, or find stale or overstated documentation claims. Do NOT use for standalone code review or bug hunting, security audits, fix/patch generation, or pure command-execution tasks. When such requests are mixed with an eligible claim-check, still use this skill for the claim-check portion and decline only the out-of-scope part — by contract it does not execute commands or edit files.

66 Updated today

kyungseo

Code & Development Listed

evidence-and-claims-standard

Adjudicate a material or disputed factual, comparative, SOTA/frontier, status, completion, causality, or delivery claim against current primary evidence. Use when deciding whether work is actually done, a claimed improvement or leading position is supported, a stated cause is proven, or source, merge, release, deploy, and live behavior are established. Do not load merely because routine implementation reporting should remain truthful.

1 Updated today

SylphxAI