cublas-cudnn
SolidExpert integration with NVIDIA GPU-accelerated math libraries. Configure cuBLAS tensor core operations, generate cuBLAS GEMM calls, integrate cuDNN layers, handle algorithm selection, and support mixed-precision operations.
Install
Quality Score: 96/100
Skill Content
Details
- Author
- a5c-ai
- Repository
- a5c-ai/babysitter
- Created
- 4 months ago
- Last Updated
- today
- Language
- JavaScript
- License
- MIT
Similar Skills
Semantically similar based on skill content — not just same category
cutlass-triton
High-performance kernel template libraries and DSLs. Generate CUTLASS GEMM configurations, implement Triton kernel definitions, configure epilogue operations, tune tile sizes and warp arrangements, and benchmark against cuBLAS.
cuda-toolkit
Deep integration with NVIDIA CUDA toolkit for kernel development, compilation, and debugging. Execute nvcc compilation with optimization flags analysis, generate and validate CUDA kernel code, analyze PTX/SASS assembly output, and configure execution parameters.
unified-memory
Expert skill for CUDA Unified Memory and memory prefetching optimization. Configure managed memory allocations, implement memory prefetch strategies, handle page fault analysis, configure memory hints and advise, profile unified memory migration, optimize for oversubscription scenarios, and compare managed vs explicit memory.
numerical-linear-algebra-toolkit
High-performance numerical linear algebra operations
nccl-communication
NVIDIA Collective Communications Library integration for multi-GPU operations. Initialize NCCL communicators, execute collective operations, configure communication topologies, profile collective performance, and support RCCL for AMD compatibility.