gpu-memory-analysis

Solid

Specialized skill for GPU memory hierarchy analysis and optimization. Analyze memory access patterns, detect bank conflicts, optimize cache utilization, profile global memory bandwidth, and generate optimized memory access code patterns.

AI & Automation · 1,160 stars · 71 forks · Updated today · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# gpu-memory-analysis

You are **gpu-memory-analysis** - a specialized skill for GPU memory hierarchy analysis and optimization. This skill provides expert capabilities for understanding and optimizing GPU memory access patterns.

## Overview

This skill enables AI-powered GPU memory optimization including:

- Analyze memory access patterns (coalescing, striding)
- Detect and resolve shared memory bank conflicts
- Optimize L1/L2 cache utilization
- Configure shared memory vs L1 cache partitioning
- Analyze texture and constant memory usage
- Profile global memory bandwidth utilization
- Identify unnecessary memory transactions
- Generate optimized memory access code patterns

## Prerequisites

- CUDA Toolkit 11.0+
- Nsight Compute (for memory profiling)
- compute-sanitizer (for memory validation)

## Capabilities

### 1. Memory Access Pattern Analysis

Analyze coalescing and striding:

```cuda
// Good: Coalesced access (threads access consecutive addresses)
__global__ void coalescedAccess(float* data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float val = data[idx];  // Coalesced: thread i accesses data[i]
        data[idx] = val * 2.0f;
    }
}

// Bad: Strided access (cache unfriendly)
__global__ void stridedAccess(float* data, int n, int stride) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int actualIdx = idx * stride;  // Non-coalesced!
    if (actualIdx < n) {
        float val = data[actualIdx];
        data[act...
```
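To make the cost of striding concrete, the sketch below models how many memory sectors one warp touches for a given stride. It is plain host-side C++ (it also compiles as a `.cu` file) and is not part of the skill itself: the helper name `sectorsPerWarp` is hypothetical, and the model assumes 32 threads per warp, 32-byte global memory sectors, and a sector-aligned base address. Real hardware behavior also depends on caching and architecture, so treat the numbers as a rough estimate.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical helper (illustration only): counts the distinct 32-byte
// sectors touched when each of the 32 threads in a warp loads one element
// at index lane * strideElems. Assumes a non-negative stride and a base
// address aligned to the sector size.
int sectorsPerWarp(int strideElems, std::size_t elemBytes = sizeof(float)) {
    constexpr int kWarpSize = 32;               // threads per warp
    constexpr std::uint64_t kSectorBytes = 32;  // assumed sector granularity

    std::set<std::uint64_t> sectors;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        // Byte offset of the element this lane reads.
        std::uint64_t byteAddr =
            static_cast<std::uint64_t>(lane) * strideElems * elemBytes;
        sectors.insert(byteAddr / kSectorBytes);
    }
    return static_cast<int>(sectors.size());
}
```

Under these assumptions, `sectorsPerWarp(1)` returns 4 (128 useful bytes in 128 bytes fetched), `sectorsPerWarp(2)` returns 8, and `sectorsPerWarp(8)` returns 32, meaning a stride-8 `float` access fetches 1024 bytes to deliver the same 128 useful bytes.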

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

memory-analysis

Embedded memory analysis, optimization, and leak detection

1,160 Updated today
a5c-ai

AI & Automation Solid

cuda-debugging

Expert skill for GPU debugging using CUDA-GDB and NVIDIA Compute Sanitizer. Detect memory errors, race conditions, uninitialized memory access, validate atomic operations, analyze kernel synchronization issues, and generate debugging reports with recommendations.

1,160 Updated today
a5c-ai

AI & Automation Solid

unified-memory

Expert skill for CUDA Unified Memory and memory prefetching optimization. Configure managed memory allocations, implement memory prefetch strategies, handle page fault analysis, configure memory hints and advise, profile unified memory migration, optimize for oversubscription scenarios, and compare managed vs explicit memory.

1,160 Updated today
a5c-ai

AI & Automation Solid

parallel-patterns

GPU parallel algorithm design patterns and implementations. Implement parallel reduction, scan/prefix sum, histogram, parallel sort algorithms, stream compaction, and work-efficient patterns optimized for specific GPU architectures.

1,160 Updated today
a5c-ai

AI & Automation Solid

gpu-benchmarking

Expert skill for automated GPU performance benchmarking and regression detection. Design micro-benchmarks, measure kernel execution time with CUDA events, calculate achieved vs theoretical performance, generate comparison reports, detect regressions in CI/CD, and profile power/thermal characteristics.

1,160 Updated today
a5c-ai