stencil-convolution

Expert skill for optimized stencil and convolution pattern implementations on GPU. Design tiled stencil algorithms with halos, implement 2D/3D convolution kernels, optimize boundary condition handling, apply temporal blocking techniques, generate separable filter implementations, and profile stencil memory bandwidth.

AI & Automation | 1,160 stars | 71 forks | Updated today | MIT

Skill Content

# stencil-convolution

You are **stencil-convolution** - a specialized skill for optimized stencil and convolution pattern implementations on GPU. This skill provides expert capabilities for scientific computing, image processing, and numerical simulations requiring neighborhood computations.

## Overview

This skill enables AI-powered stencil and convolution operations including:

- Designing tiled stencil algorithms with halos
- Implementing 2D/3D convolution kernels
- Optimizing boundary condition handling
- Applying temporal blocking techniques
- Generating separable filter implementations
- Configuring shared memory tiling strategies
- Profiling stencil memory bandwidth
- Supporting multi-resolution stencils

## Prerequisites

- NVIDIA CUDA Toolkit 11.0+
- GPU with compute capability 3.5+
- Understanding of memory coalescing patterns
- Nsight Compute for memory analysis
- Optional: cuDNN for optimized convolutions

## Capabilities

### 1. Basic 2D Stencil (5-Point Laplacian)

```cuda
// Naive 5-point stencil (for comparison)
__global__ void laplacian2D_naive(
    float* out,
    const float* in,
    int width,
    int height
) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    if (x >= 1 && x < width - 1 && y >= 1 && y < height - 1) {
        int idx = y * width + x;
        out[idx] = -4.0f * in[idx]
                 + in[idx - 1]      // left
                 + in[idx + 1]      // right
                 + in[idx - width]...
```
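In the naive 5-point kernel, each input value is needed by up to five threads, so it relies on the caches for reuse. The overview's "tiled stencil algorithms with halos" item can be sketched as follows, assuming a fixed 16×16 thread block and a one-cell halo. The names `TILE`, `RADIUS`, `clampi` and `laplacian2D_tiled` are hypothetical and not taken from the skill itself. Each block stages its tile plus the halo in shared memory, synchronizes, and then computes every output from shared memory.

```cuda
#include <cuda_runtime.h>

#define TILE 16    // hypothetical: block is TILE x TILE threads
#define RADIUS 1   // halo width needed by a 5-point stencil

// Clamp v into [lo, hi] so halo loads at the image edge stay in bounds.
__device__ inline int clampi(int v, int lo, int hi) {
    return min(max(v, lo), hi);
}

// Tiled 5-point Laplacian. Each block stages its TILE x TILE tile plus a
// RADIUS-wide halo on all four sides in shared memory. Corners are not
// loaded because the 5-point stencil never reads them.
// Launch with dim3 block(TILE, TILE) and a grid covering width x height.
__global__ void laplacian2D_tiled(float* out, const float* in,
                                  int width, int height)
{
    __shared__ float tile[TILE + 2 * RADIUS][TILE + 2 * RADIUS];

    int x  = blockIdx.x * TILE + threadIdx.x;   // global coordinates
    int y  = blockIdx.y * TILE + threadIdx.y;
    int lx = threadIdx.x + RADIUS;              // coordinates inside the tile
    int ly = threadIdx.y + RADIUS;
    int cx = clampi(x, 0, width - 1);
    int cy = clampi(y, 0, height - 1);

    // Every thread loads its own cell (clamped for partial edge tiles).
    tile[ly][lx] = in[cy * width + cx];

    // The first RADIUS columns/rows of threads also load the halo cells.
    if (threadIdx.x < RADIUS) {
        tile[ly][lx - RADIUS] = in[cy * width + clampi(x - RADIUS, 0, width - 1)];
        tile[ly][lx + TILE]   = in[cy * width + clampi(x + TILE, 0, width - 1)];
    }
    if (threadIdx.y < RADIUS) {
        tile[ly - RADIUS][lx] = in[clampi(y - RADIUS, 0, height - 1) * width + cx];
        tile[ly + TILE][lx]   = in[clampi(y + TILE, 0, height - 1) * width + cx];
    }
    __syncthreads();   // the whole tile must be loaded before any thread reads it

    // Same interior-only condition as the naive kernel.
    if (x >= 1 && x < width - 1 && y >= 1 && y < height - 1) {
        out[y * width + x] = -4.0f * tile[ly][lx]
                           + tile[ly][lx - 1] + tile[ly][lx + 1]
                           + tile[ly - 1][lx] + tile[ly + 1][lx];
    }
}
```

With `RADIUS = 1`, each block reads 18 × 18 − 4 = 320 input values to produce 256 outputs. That is 1.25 global loads per output instead of five, and none of the reuse depends on the cache. Note that no thread returns before `__syncthreads()`, because an early exit there would leave the barrier incomplete.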
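The overview also lists separable filters and boundary-condition handling. The sketch below combines the two under stated assumptions: a Gaussian kernel, clamp-to-edge boundaries, and a radius no larger than a hypothetical `MAX_RADIUS`. A (2R+1)×(2R+1) 2D convolution is replaced by a row pass followed by a column pass, which cuts the per-pixel multiply-adds from (2R+1)² to 2(2R+1). All names (`convolveRows`, `convolveCols`, `gaussianWeights`, `separableBlur`) are illustrative rather than the skill's own API. For clarity the passes read global memory directly and do not use shared-memory tiling.

```cuda
#include <cuda_runtime.h>
#include <cmath>

#define MAX_RADIUS 8   // hypothetical upper bound on the filter radius

// 1D filter taps, shared by every thread, so constant memory broadcasts them.
__constant__ float c_weights[2 * MAX_RADIUS + 1];

// Clamp-to-edge boundary: out-of-range taps reuse the nearest edge pixel.
__host__ __device__ inline int clampIndex(int i, int n) {
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Horizontal pass: weighted sum of 2*radius+1 neighbors along the row.
__global__ void convolveRows(float* out, const float* in,
                             int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int k = -radius; k <= radius; ++k)
        sum += c_weights[k + radius] * in[y * width + clampIndex(x + k, width)];
    out[y * width + x] = sum;
}

// Vertical pass: the same 1D filter applied along the column.
__global__ void convolveCols(float* out, const float* in,
                             int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float sum = 0.0f;
    for (int k = -radius; k <= radius; ++k)
        sum += c_weights[k + radius] * in[clampIndex(y + k, height) * width + x];
    out[y * width + x] = sum;
}

// Fills w[0 .. 2*radius] with a 1D Gaussian normalized to sum to 1.
void gaussianWeights(float* w, int radius, float sigma) {
    float total = 0.0f;
    for (int k = -radius; k <= radius; ++k) {
        w[k + radius] = std::exp(-(k * k) / (2.0f * sigma * sigma));
        total += w[k + radius];
    }
    for (int i = 0; i < 2 * radius + 1; ++i) w[i] /= total;
}

// Row pass into d_tmp, then column pass into d_out (assumes radius <= MAX_RADIUS).
void separableBlur(float* d_out, float* d_tmp, const float* d_in,
                   int width, int height, int radius, float sigma)
{
    float w[2 * MAX_RADIUS + 1];
    gaussianWeights(w, radius, sigma);
    cudaMemcpyToSymbol(c_weights, w, (2 * radius + 1) * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    convolveRows<<<grid, block>>>(d_tmp, d_in, width, height, radius);
    convolveCols<<<grid, block>>>(d_out, d_tmp, width, height, radius);
}
```

Only `clampIndex` encodes the boundary policy. Mirror or periodic boundaries would replace that one function, and the kernels would stay unchanged. The intermediate buffer `d_tmp` must be the same size as the image.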
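The "profiling stencil memory bandwidth" item usually starts from an effective-bandwidth figure. The sketch below assumes the ideal DRAM traffic for a single-precision 5-point stencil, with each cell read once and written once. The helper names are hypothetical. Comparing the result with the device's peak bandwidth, or with the DRAM throughput that Nsight Compute reports, shows how close a kernel gets to the memory-bound limit. Timing uses the standard CUDA event API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Ideal DRAM traffic for a float 5-point stencil: one read + one write per cell.
inline size_t stencilIdealBytes(int width, int height) {
    return 2ull * sizeof(float) * (size_t)width * (size_t)height;
}

// Effective bandwidth in GB/s: bytes / (ms * 1e-3 s) / 1e9 = bytes / (ms * 1e6).
inline double effectiveBandwidthGBs(size_t bytes, float ms) {
    return (double)bytes / ((double)ms * 1.0e6);
}

// Times `reps` launches with CUDA events; returns mean milliseconds per launch.
template <typename Launch>
float timeKernelMs(Launch launch, int reps) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    launch();                        // warm-up launch, not timed
    cudaEventRecord(start);
    for (int i = 0; i < reps; ++i) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);      // wait until all timed launches finish

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}
```

A typical use wraps a launch of `laplacian2D_naive` in a host lambda passed to `timeKernelMs`, then feeds the result and `stencilIdealBytes(width, height)` to `effectiveBandwidthGBs`. Because the ideal byte count ignores redundant neighbor reads, a kernel that reloads neighbors from DRAM does extra work that this figure does not credit.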

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
