modal

Claudient/Claudient · ★ 4 · AI & Automation · score 65

Install: claude install-skill Claudient/Claudient

# Modal

## When to activate

- User is deploying ML/AI workloads to Modal (modal.com)
- Code imports `modal` or references `@app.function`, `@app.cls`, `Stub`, `App`
- User needs serverless GPU compute for inference, fine-tuning, or batch ML jobs
- User asks about cold starts, GPU selection, model weight caching, or streaming inference on Modal
- User is building a web endpoint or scheduled job backed by GPU compute

## When NOT to use

- User is deploying to a different serverless GPU platform (RunPod, Lambda Labs, Replicate, Banana)
- Task is CPU-only and has no ML workload; use standard serverless (AWS Lambda, Cloud Run) instead
- User already has a managed inference endpoint (OpenAI, Anthropic, Together) and just needs an API call
- The question is about ML model architecture, not infrastructure

## Instructions

### App and Image setup

Modal's entry point is an `App` (formerly `Stub`). Every deployed function belongs to one.

```python
import modal

app = modal.App("my-inference-app")

# Build a custom image once; it is cached layer by layer
image = (
    modal.Image.debian_slim(python_version="3.11")
    # hf_transfer must be installed for HF_HUB_ENABLE_HF_TRANSFER=1 to work
    .pip_install(["torch", "transformers", "accelerate", "vllm", "hf_transfer"])
    .env({"HF_HUB_ENABLE_HF_TRANSFER": "1"})
)
```

Build heavy dependencies into the image, not at function startup. Each `.pip_install()` call is a separate Docker layer and is cached independently.

### GPU selection

| GPU | VRAM | Use when |
|-----|------|----------|
| T4 | 16 GB | Dev/testing, small mode