backtest-comparator

Compare strategy variants across multiple years or folds and flag the ones that only look good on average. The comparator ranks by mean fold score but also surfaces dispersion and the worst fold, so a variant that overfits one lucky period does not win silently. Use for: compare backtest variants, flag overfit strategy, per-fold consistency, year-by-year backtest, which parameter set is robust.
barobaonguyen/ai-automation-skills · ★ 0 · Testing & QA · score 75

Install: claude install-skill barobaonguyen/ai-automation-skills

# Backtest Comparator

Use this skill when several strategy variants or parameter sets each have per-year (or per-fold) results and you need to pick the robust one, not the one with the highest average. A variant can win on the mean while quietly losing on an individual fold; this comparator ranks by mean but also reports standard deviation and the worst fold, then flags overfit and unstable variants so they can't slip through.

## When to invoke

- User says: "compare these backtest variants" / "which parameter set is robust" / "flag the overfit one" / "year-by-year results"
- Code in the conversation uses: a sweep that produced per-fold or per-year scores for multiple variants.

## When NOT to invoke

- You only have a single aggregate number per variant (no per-fold breakdown to judge consistency).
- The task is to build the validation split itself (use [[walk-forward-runner]] first, then compare its folds here).

## Concrete example

User input:

```text
Three variants, four years of returns each. Tell me which is actually robust vs which just got lucky one year.
```

Output:

```text
variant           mean    std   worst  verdict
------------------------------------------------------
cross_only       0.188  0.027   0.150  robust
macd_filter      0.095  0.011   0.080  robust
pullback_ema21   0.075  0.205  -0.120  OVERFIT (positive mean, loses on a fold)
```

Code:

```python
# Copy assets/compare.py into your project, then:
from compare import comp
```
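
The ranking-and-flagging rule described above can be sketched in a few lines. This is a minimal illustration, not the contents of `assets/compare.py`: the function name `summarize_variants`, the `unstable_ratio` threshold, the `UNSTABLE` and `LOSING` labels, the choice of population standard deviation, and the sample fold scores are all assumptions made for this example.

```python
from statistics import mean, pstdev


def summarize_variants(folds_by_variant, unstable_ratio=0.5):
    """Rank variants by mean fold score and attach std, worst fold, and a verdict.

    Hypothetical helper: names and thresholds are not the skill's real API.
    """
    rows = []
    for name, scores in folds_by_variant.items():
        avg = mean(scores)
        std = pstdev(scores)  # population std across folds (an assumption)
        worst = min(scores)
        if avg <= 0:
            verdict = "LOSING"    # assumed label: no edge even on average
        elif worst < 0:
            verdict = "OVERFIT"   # positive mean, but loses on at least one fold
        elif std > unstable_ratio * avg:
            verdict = "UNSTABLE"  # assumed rule: dispersion large relative to mean
        else:
            verdict = "robust"
        rows.append({"variant": name, "mean": avg, "std": std,
                     "worst": worst, "verdict": verdict})
    # Rank by mean, highest first; the verdict keeps a lucky winner visible.
    return sorted(rows, key=lambda r: r["mean"], reverse=True)


# Made-up per-year scores, purely for illustration.
sample = {
    "steady": [0.10, 0.12, 0.11, 0.09],
    "lucky": [0.60, -0.05, -0.02, -0.03],
    "choppy": [0.01, 0.20, 0.02, 0.17],
}

for row in summarize_variants(sample):
    print(f"{row['variant']:<8} {row['mean']:>6.3f} {row['std']:>6.3f} "
          f"{row['worst']:>7.3f}  {row['verdict']}")
```

With these sample scores, `lucky` ranks first on mean yet is flagged `OVERFIT` because three of its four folds lose money, which is the case the comparator is meant to expose.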