cuda-toolkit

Solid

Deep integration with NVIDIA CUDA toolkit for kernel development, compilation, and debugging. Execute nvcc compilation with optimization flags analysis, generate and validate CUDA kernel code, analyze PTX/SASS assembly output, and configure execution parameters.

AI & Automation | 1,160 stars | 71 forks | Updated today | MIT

Quality Score: 96/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# cuda-toolkit

You are **cuda-toolkit** - a specialized skill for NVIDIA CUDA toolkit integration, providing expert capabilities for kernel development, compilation, and debugging workflows.

## Overview

This skill enables AI-powered CUDA development operations including:

- Execute nvcc compilation with optimization flags analysis
- Generate and validate CUDA kernel code with proper thread indexing
- Analyze PTX/SASS assembly output for optimization insights
- Configure execution parameters (grid/block dimensions)
- Handle CUDA error codes and diagnostic messages
- Generate host-device memory management code
- Support multiple CUDA compute capabilities (sm_XX)
- Validate kernel launch bounds and resource usage

## Prerequisites

- NVIDIA CUDA Toolkit 11.0+
- nvcc compiler
- GPU with compute capability 3.5+
- Optional: cuobjdump for binary analysis

## Capabilities

### 1. NVCC Compilation

Compile CUDA programs with various optimization flags:

```bash
# Basic compilation
nvcc -o program program.cu

# Optimized release build
nvcc -O3 -use_fast_math -o program program.cu

# Debug build (host and device debug info; -G disables device
# optimizations and overrides -lineinfo, so the two are not combined)
nvcc -g -G -o program_debug program.cu

# Specify compute capability
nvcc -arch=sm_80 -o program program.cu

# Build for multiple architectures (embeds SASS for sm_70 and sm_80)
nvcc -gencode arch=compute_70,code=sm_70 \
     -gencode arch=compute_80,code=sm_80 \
     -o program program.cu

# Verbose compilation (also prints per-kernel register and memory usage)
nvcc -v --ptxas-options=-v -o program program.cu
```

### 2. Kernel Code Generation

Ge...
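The overview lists configuring execution parameters (grid/block dimensions) among the skill's capabilities. As a rough, host-only illustration of what that involves, the sketch below computes a 1D launch configuration by ceiling division. `LaunchConfig` and `make_launch_config` are hypothetical names introduced here, not part of the skill, and the limits of 1024 threads per block and 2^31 - 1 blocks in the x dimension are assumptions that hold for compute capability 3.0 and later.

```cpp
#include <stdexcept>

// Hypothetical helper type (not from the skill): a 1D launch configuration.
struct LaunchConfig {
    unsigned int grid;   // number of blocks in x
    unsigned int block;  // threads per block in x
};

// Choose enough blocks of `threads_per_block` threads to cover n elements.
// Assumed hardware limits: at most 1024 threads per block and at most
// 2^31 - 1 blocks in x (compute capability 3.0 and later).
LaunchConfig make_launch_config(unsigned long long n,
                                unsigned int threads_per_block = 256) {
    if (threads_per_block == 0 || threads_per_block > 1024) {
        throw std::invalid_argument("threads_per_block must be in [1, 1024]");
    }
    // Ceiling division written so that it cannot overflow for large n.
    unsigned long long blocks =
        n / threads_per_block + (n % threads_per_block != 0 ? 1 : 0);
    if (blocks > 2147483647ULL) {
        throw std::out_of_range("n needs more blocks than a 1D grid allows");
    }
    // For n == 0 this returns grid == 0; the caller should skip the launch,
    // because a zero-sized grid is an invalid configuration.
    return LaunchConfig{static_cast<unsigned int>(blocks), threads_per_block};
}
```

For n = 1000 with 256 threads per block this yields 4 blocks, so 24 threads in the last block fall past the end of the data; the kernel still needs a bounds check such as `if (i < n)` before touching memory.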

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

cuda-debugging

Expert skill for GPU debugging using CUDA-GDB and NVIDIA Compute Sanitizer. Detect memory errors, race conditions, uninitialized memory access, validate atomic operations, analyze kernel synchronization issues, and generate debugging reports with recommendations.

1,160 Updated today
a5c-ai
AI & Automation Solid

nsight-profiler

Expert skill for NVIDIA Nsight Systems and Nsight Compute profiling tools. Configure profiling sessions, analyze kernel reports, interpret occupancy metrics, roofline model data, memory bandwidth bottlenecks, and warp execution efficiency.

1,160 Updated today
a5c-ai
AI & Automation Solid

gpu-benchmarking

Expert skill for automated GPU performance benchmarking and regression detection. Design micro-benchmarks, measure kernel execution time with CUDA events, calculate achieved vs theoretical performance, generate comparison reports, detect regressions in CI/CD, and profile power/thermal characteristics.

1,160 Updated today
a5c-ai
AI & Automation Solid

cuda-graphs

Expert skill for CUDA Graph capture and optimization for reduced launch overhead. Capture CUDA operations into graphs, instantiate and execute graph instances, update graph node parameters, profile graph vs stream execution, design graph-friendly kernel patterns, and optimize launch latency for inference.

1,160 Updated today
a5c-ai
AI & Automation Listed

mission-control-cuda-kernel-generation

Route CUDA kernel generation or repair through Mission Control with GPU-aware validation and infrastructure checks.

1 Updated today
MN755