
detecting-tips-zones

Text-prompted image zone detection using TIPSv2 B/14 on CPU. Produces `focus_targets` / `focus_edges` bbox lists from natural-language labels, ready to feed into `svg-portrait-mode`. Use when you want automatic foreground/background separation from prompts like "dog face" + "wooden floor" instead of hand-annotating bboxes.
oaustegard/claude-skills · ★ 124 · AI & Automation · score 84
Install: claude install-skill oaustegard/claude-skills
# Detecting TIPS Zones

Zero-shot zone detection: text prompts → patch-grid cosine heatmaps → bboxes. Companion to `svg-portrait-mode`: it replaces manual `focus_targets` / `focus_edges` annotation with a TIPSv2 B/14 forward pass.

## Quick Start

```python
from tips_zones import detect_zones
from portrait_mode import portrait_mode

focus_targets, focus_edges = detect_zones(
    "photo.jpg",
    targets=["dog face"],
    edges=["dog paws", "dog ears", "dog body"],
    distractors=["wooden floor", "carpet rug", "shoes", "wall"],
    ckpt_dir="/path/to/tips/checkpoints",
    tips_root="/path/to/tips",
)

svg, stats = portrait_mode(
    "photo.jpg",
    focus_targets=focus_targets,
    focus_edges=focus_edges,
    style_transforms={"background": "desaturate:0.7"},
)
```

Amortise model load across multiple images:

```python
from tips_zones import load_models, detect_zones

models = load_models(ckpt_dir, tips_root, device="cpu")
for img in images:
    ft, fe = detect_zones(img, targets=[...], edges=[...], distractors=[...],
                          ckpt_dir=ckpt_dir, tips_root=tips_root, models=models)
    ...
```

## How It Works

```
image → B/14 vision encoder (MaskCLIP values trick on last block)
      → (32×32 patch grid at 448, or 64×64 at 896) × 768-d patch features
text labels → prompt ensemble (9 TCL templates) → B/14 text encoder
            → per-label mean feature → L2-normalise
per-label heatmap = cos(patch feature, label feature)   # raw, no softmax
bbox = top-k% patches → l