stencil-convolution

Expert skill for optimized stencil and convolution pattern implementations on GPU. Design tiled stencil algorithms with halos, implement 2D/3D convolution kernels, optimize boundary condition handling, apply temporal blocking techniques, generate separable filter implementations, and profile stencil memory bandwidth.

Skill Content

# stencil-convolution

You are **stencil-convolution** - a specialized skill for optimized stencil and convolution pattern implementations on GPU. This skill provides expert capabilities for scientific computing, image processing, and numerical simulations requiring neighborhood computations.

## Overview

This skill enables AI-powered stencil and convolution operations including:

- Designing tiled stencil algorithms with halos
- Implementing 2D/3D convolution kernels
- Optimizing boundary condition handling
- Applying temporal blocking techniques
- Generating separable filter implementations
- Configuring shared memory tiling strategies
- Profiling stencil memory bandwidth
- Supporting multi-resolution stencils

## Prerequisites

- NVIDIA CUDA Toolkit 11.0+
- GPU with compute capability 3.5+
- Understanding of memory coalescing patterns
- Nsight Compute for memory analysis
- Optional: cuDNN for optimized convolutions

## Capabilities

### 1. Basic 2D Stencil (5-Point Laplacian)

```cuda
// Naive 5-point stencil (for comparison)
__global__ void laplacian2D_naive(
    float* out,
    const float* in,
    int width, int height
) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x >= 1 && x < width - 1 && y >= 1 && y < height - 1) {
        int idx = y * width + x;
        out[idx] = -4.0f * in[idx]
                 + in[idx - 1]      // left
                 + in[idx + 1]      // right
                 + in[idx - width]...
```
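The Overview lists tiled stencil algorithms with halos and shared memory tiling. As a rough illustration of that idea (not the skill's own code, which is cut off in this listing), the sketch below stages each block's tile plus a one-cell halo in shared memory before applying the standard 5-point Laplacian. The names `laplacian2D_tiled`, `TILE`, and `laplacian_max_error` are hypothetical, and the tile size of 16 is an assumption rather than a tuned value.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <vector>

#define TILE 16  // assumed tile edge; each block is TILE x TILE threads

// Tiled 5-point Laplacian (sketch): each block loads a (TILE+2) x (TILE+2)
// patch, i.e. its interior plus a one-cell halo, into shared memory.
__global__ void laplacian2D_tiled(float* out, const float* in,
                                  int width, int height) {
    __shared__ float tile[TILE + 2][TILE + 2];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int tx = threadIdx.x + 1;  // shifted by one to leave room for the halo
    int ty = threadIdx.y + 1;

    // Clamp so threads outside the domain still read a valid address.
    int cx = min(x, width - 1);
    int cy = min(y, height - 1);
    tile[ty][tx] = in[cy * width + cx];

    // Threads on the block edge also load the adjacent halo cell.
    if (threadIdx.x == 0)        tile[ty][0]        = in[cy * width + max(cx - 1, 0)];
    if (threadIdx.x == TILE - 1) tile[ty][TILE + 1] = in[cy * width + min(cx + 1, width - 1)];
    if (threadIdx.y == 0)        tile[0][tx]        = in[max(cy - 1, 0) * width + cx];
    if (threadIdx.y == TILE - 1) tile[TILE + 1][tx] = in[min(cy + 1, height - 1) * width + cx];
    __syncthreads();

    // Only interior points are written, matching the naive kernel's guard.
    if (x >= 1 && x < width - 1 && y >= 1 && y < height - 1) {
        out[y * width + x] = -4.0f * tile[ty][tx]
                           + tile[ty][tx - 1] + tile[ty][tx + 1]
                           + tile[ty - 1][tx] + tile[ty + 1][tx];
    }
}

// Hypothetical host helper: runs the kernel on synthetic data and returns the
// largest absolute difference from a CPU reference over interior points.
float laplacian_max_error(int width, int height) {
    assert(width >= 3 && height >= 3);
    size_t n = (size_t)width * height;
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (size_t i = 0; i < n; ++i) h_in[i] = (float)((i * 37) % 101) * 0.01f;

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float));

    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (height + TILE - 1) / TILE);
    laplacian2D_tiled<<<grid, block>>>(d_out, d_in, width, height);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    float max_err = 0.0f;
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int i = y * width + x;
            float ref = -4.0f * h_in[i] + h_in[i - 1] + h_in[i + 1]
                      + h_in[i - width] + h_in[i + width];
            max_err = fmaxf(max_err, fabsf(ref - h_out[i]));
        }
    }
    return max_err;
}
```

A 5-point stencil never reads the corner cells of the halo, so the sketch skips them; a 9-point or wider stencil would need those loaded as well. Whether tiling actually beats the naive kernel depends on the GPU's cache behavior, which is something to confirm with a profiler such as Nsight Compute rather than assume.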

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
