parallel-patterns


GPU parallel algorithm design patterns and implementations. Implement parallel reduction, scan/prefix sum, histogram, parallel sort algorithms, stream compaction, and work-efficient patterns optimized for specific GPU architectures.

AI & Automation 1,160 stars 71 forks Updated today MIT


Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# parallel-patterns

You are **parallel-patterns** - a specialized skill for GPU parallel algorithm design patterns and implementations. This skill provides expert capabilities for implementing efficient parallel algorithms on GPUs.

## Overview

This skill enables AI-powered parallel algorithm development including:

- Implement parallel reduction algorithms (tree-based, warp)
- Generate scan (prefix sum) implementations
- Design histogram and binning algorithms
- Implement parallel sort algorithms (radix, merge)
- Generate stream compaction code
- Design work-efficient parallel patterns
- Handle multi-pass large-data algorithms
- Optimize for specific GPU architectures

## Prerequisites

- CUDA Toolkit 11.0+
- CUB library (included with CUDA)
- Thrust library (included with CUDA)

## Capabilities

### 1. Parallel Reduction

Implement efficient reductions:

```cuda
// Warp-level reduction (no shared memory needed for single warp)
__device__ float warpReduce(float val) {
    for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
        val += __shfl_down_sync(0xffffffff, val, offset);
    }
    return val;
}

// Block-level reduction with shared memory
template<int BLOCK_SIZE>
__device__ float blockReduce(float val) {
    __shared__ float shared[32];  // One slot per warp
    int lane = threadIdx.x % warpSize;
    int wid = threadIdx.x / warpSize;

    // Warp-level reduction
    val = warpReduce(val);

    // Write warp results to shared memory
    if (lane == 0) share...
```
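The block-level reduction in this listing is cut off partway through. As a rough illustration only, here is a self-contained sketch of how a warp-shuffle block reduction of this shape is commonly finished. It is not the skill's actual code: the names `warpReduceSum`, `blockReduceSum`, `sumKernel`, and `sumOnes` are hypothetical, and the sketch assumes `blockDim.x` is a multiple of 32 and no larger than 1024.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Sum within one warp using shuffles; lane 0 ends up holding the warp total.
__device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
        val += __shfl_down_sync(0xffffffff, val, offset);
    }
    return val;
}

// Sum across a block. Assumes blockDim.x is a multiple of 32 and <= 1024,
// and that every thread of the block calls this function.
// The returned value is only meaningful in thread 0.
__device__ float blockReduceSum(float val) {
    __shared__ float warpSums[32];  // one slot per warp
    int lane = threadIdx.x % warpSize;
    int wid = threadIdx.x / warpSize;

    val = warpReduceSum(val);            // step 1: reduce inside each warp
    if (lane == 0) warpSums[wid] = val;  // step 2: publish per-warp totals
    __syncthreads();

    // Step 3: the first warp reduces the per-warp totals.
    int numWarps = blockDim.x / warpSize;
    val = (threadIdx.x < numWarps) ? warpSums[lane] : 0.0f;
    if (wid == 0) val = warpReduceSum(val);
    return val;
}

// Grid-stride sum: each block reduces its share, then thread 0 adds the
// block total into *out with one atomic.
__global__ void sumKernel(const float* in, float* out, int n) {
    float local = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        local += in[i];
    }
    float total = blockReduceSum(local);
    if (threadIdx.x == 0) atomicAdd(out, total);
}

// Hypothetical host helper: sums n ones on the GPU and returns the result.
float sumOnes(int n) {
    float* in = nullptr;
    float* out = nullptr;
    cudaError_t err = cudaMallocManaged(&in, n * sizeof(float));
    assert(err == cudaSuccess);
    err = cudaMallocManaged(&out, sizeof(float));
    assert(err == cudaSuccess);

    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;

    sumKernel<<<64, 256>>>(in, out, n);
    err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    float result = *out;
    cudaFree(in);
    cudaFree(out);
    return result;
}
```

Calling `sumOnes(1 << 20)` from a host `main` should return `1048576.0f`, since every partial sum stays below 2^24 and is therefore exact in single precision. If `blockReduceSum` were called more than once in the same kernel, an extra `__syncthreads()` before reusing `warpSums` would be needed. For production code, `cub::BlockReduce` from the CUB library named in the prerequisites is the more usual choice.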

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
