warp-primitives

Warp-level programming and SIMD optimization. Use warp shuffle instructions, voting functions, cooperative groups, warp-synchronous algorithms, and minimize warp divergence for optimal GPU performance.

AI & Automation 1,160 stars 71 forks Updated today MIT

Skill Content

# warp-primitives

You are **warp-primitives** - a specialized skill for warp-level programming and SIMD optimization on GPUs. This skill provides expert capabilities for low-level GPU performance optimization.

## Overview

This skill enables AI-powered warp-level programming including:

- Use warp shuffle instructions (`__shfl_*_sync`)
- Implement warp voting functions (`__ballot_sync`, `__any_sync`, `__all_sync`)
- Design warp-synchronous algorithms
- Optimize warp divergence patterns
- Use cooperative groups for flexible sync
- Implement warp-level reductions
- Analyze and minimize warp stalls
- Support CUDA 11+ warp intrinsics

## Prerequisites

- CUDA Toolkit 11.0+
- GPU with compute capability 3.5+ (the minimum that CUDA 11 supports)
- Understanding of SIMT execution model

## Capabilities

### 1. Warp Shuffle Instructions

Data exchange within a warp:

```cuda
// __shfl_sync: Broadcast from any lane
__device__ float warpBroadcast(float val, int srcLane) {
    return __shfl_sync(0xffffffff, val, srcLane);
}

// __shfl_up_sync: Shift up (for inclusive scan)
__device__ float shflUp(float val, int delta) {
    return __shfl_up_sync(0xffffffff, val, delta);
}

// __shfl_down_sync: Shift down (for reduction)
__device__ float shflDown(float val, int delta) {
    return __shfl_down_sync(0xffffffff, val, delta);
}

// __shfl_xor_sync: Butterfly pattern (for reduction)
__device__ float shflXor(float val, int laneMask) {
    return __shfl_xor_sync(0xffffffff, val, laneMask);
}

// Warp-level reduction using shuffle
__device__ float warpReduce...
```
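The shuffle and vote intrinsics named in this section can be combined in a small end-to-end program. The listing here is cut off, so the following is an independent minimal sketch and not the skill's own code. `warpSumSketch`, `warpDemoKernel`, and `runWarpDemo` are hypothetical names introduced for illustration. It assumes a 32-lane warp and a launch of exactly one full warp, so the full mask `0xffffffff` is valid.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the skill text): sums `val` across a full
// 32-lane warp. After the loop, lane 0 holds the total; the other lanes
// hold partial sums and should not be used.
__device__ float warpSumSketch(float val) {
    for (int offset = 16; offset > 0; offset >>= 1) {
        // Each lane adds the value held by the lane `offset` positions above it.
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;
}

// Hypothetical kernel: one warp sums 32 inputs and counts the lanes whose
// input is positive using a warp vote.
__global__ void warpDemoKernel(const float* in, float* sum, int* positives) {
    int lane = threadIdx.x & 31;
    float v = in[threadIdx.x];

    float total = warpSumSketch(v);
    // Bit i of `ballot` is set when lane i's predicate is true.
    unsigned ballot = __ballot_sync(0xffffffffu, v > 0.0f);

    if (lane == 0) {
        *sum = total;
        *positives = __popc(ballot);
    }
}

// Host driver (illustrative): fills the input with -15, -14, ..., 16,
// runs one warp, and copies both results back. Returns false on any CUDA error.
bool runWarpDemo(float* h_sum, int* h_positives) {
    float h_in[32];
    for (int i = 0; i < 32; ++i) h_in[i] = static_cast<float>(i) - 15.0f;

    float* d_in = nullptr;
    float* d_sum = nullptr;
    int* d_pos = nullptr;
    bool ok = cudaMalloc(&d_in, sizeof(h_in)) == cudaSuccess
           && cudaMalloc(&d_sum, sizeof(float)) == cudaSuccess
           && cudaMalloc(&d_pos, sizeof(int)) == cudaSuccess
           && cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        warpDemoKernel<<<1, 32>>>(d_in, d_sum, d_pos);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(h_sum, d_sum, sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(h_positives, d_pos, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_sum);
    cudaFree(d_pos);
    return ok;
}

int main() {
    float sum = 0.0f;
    int positives = 0;
    if (!runWarpDemo(&sum, &positives)) {
        fprintf(stderr, "CUDA error\n");
        return 1;
    }
    printf("sum = %.1f, positives = %d\n", sum, positives);
    return 0;
}
```

Built with `nvcc` and run on a supported GPU, the inputs -15 through 16 sum to 16, and 16 of the 32 lanes hold a positive value. If every lane needs the total rather than only lane 0, replacing `__shfl_down_sync` with `__shfl_xor_sync` (the butterfly pattern) is a common alternative.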

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
