nccl-communication

NVIDIA Collective Communications Library integration for multi-GPU operations. Initialize NCCL communicators, execute collective operations, configure communication topologies, profile collective performance, and support RCCL for AMD compatibility.

AI & Automation 1,160 stars 71 forks Updated today MIT

Skill Content

# nccl-communication

You are **nccl-communication** - a specialized skill for NVIDIA Collective Communications Library (NCCL) integration. This skill provides expert capabilities for multi-GPU collective operations.

## Overview

This skill enables AI-powered multi-GPU communication including:

- Initialize NCCL communicators
- Execute all-reduce, all-gather, reduce-scatter operations
- Configure ring and tree communication topologies
- Handle multi-node NCCL communication
- Profile collective operation performance
- Optimize for NVLink vs PCIe topology
- Integrate with CUDA streams for async collectives
- Support RCCL for AMD GPU compatibility

## Prerequisites

- CUDA Toolkit 11.0+
- NCCL 2.10+
- Multiple GPUs (for meaningful use)
- MPI (for multi-node, optional)

## Capabilities

### 1. NCCL Initialization

Initialize communicators:

```c
#include <nccl.h>

// Single-node multi-GPU initialization
int numGPUs = 4;
ncclComm_t comms[4];
int devs[4] = {0, 1, 2, 3};
ncclCommInitAll(comms, numGPUs, devs);

// Per-rank initialization for MPI integration
ncclUniqueId id;
ncclComm_t comm;
if (rank == 0) {
    ncclGetUniqueId(&id);
}
MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);
cudaSetDevice(localRank);
ncclCommInitRank(&comm, worldSize, id, rank);

// Cleanup
ncclCommDestroy(comm);
```

### 2. All-Reduce Operations

Reduce across all GPUs:

```c
// Synchronous all-reduce
ncclAllReduce(sendbuff, recvbuff, count, ncclFloat, ncclSum, comm, stream);
cudaStreamSynchro...
```
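
To make the all-reduce and ring topology mentioned in this skill more concrete, here is a minimal host-only sketch of a ring all-reduce with a sum reduction. It is a CPU simulation for illustration, not NCCL's actual implementation and not a replacement for `ncclAllReduce`. The function name `ring_allreduce_sum`, the flat buffer layout, and the requirement that `count` be divisible by `ranks` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Host-only simulation of a ring all-reduce (sum).
 * ring_allreduce_sum is a hypothetical helper, not an NCCL function.
 * buf holds `ranks` contiguous buffers of `count` floats each, one per
 * simulated rank. On return, every rank's buffer holds the element-wise
 * sum over all ranks, which is the result ncclSum produces. */
void ring_allreduce_sum(float *buf, int ranks, int count) {
    /* Assumption for this sketch: buffers split evenly into one chunk per rank. */
    assert(ranks > 0 && count % ranks == 0);
    const int chunk = count / ranks;

    /* Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod ranks
     * to its right neighbour, which adds it to its own copy. Within a step the
     * chunk a rank sends differs from the chunk it receives, so processing the
     * ranks one after another gives the same result as simultaneous sends. */
    for (int s = 0; s < ranks - 1; ++s) {
        for (int r = 0; r < ranks; ++r) {
            int dst = (r + 1) % ranks;
            int c = (r - s + ranks) % ranks;
            const float *src_chunk = buf + (size_t)r * count + (size_t)c * chunk;
            float *dst_chunk = buf + (size_t)dst * count + (size_t)c * chunk;
            for (int i = 0; i < chunk; ++i) {
                dst_chunk[i] += src_chunk[i];
            }
        }
    }

    /* After phase 1, rank r holds the fully reduced chunk (r + 1) mod ranks.
     * Phase 2: all-gather. In step s, rank r forwards chunk (r + 1 - s) mod ranks
     * to its right neighbour, which overwrites its own copy. */
    for (int s = 0; s < ranks - 1; ++s) {
        for (int r = 0; r < ranks; ++r) {
            int dst = (r + 1) % ranks;
            int c = (r + 1 - s + ranks) % ranks;
            const float *src_chunk = buf + (size_t)r * count + (size_t)c * chunk;
            float *dst_chunk = buf + (size_t)dst * count + (size_t)c * chunk;
            for (int i = 0; i < chunk; ++i) {
                dst_chunk[i] = src_chunk[i];
            }
        }
    }
}
```

Calling `ring_allreduce_sum(buf, 4, 8)` on four simulated ranks leaves all four 8-element buffers identical. Each rank sends and receives `2 * (ranks - 1)` chunks of `count / ranks` elements, so the data moved per rank stays close to twice the buffer size regardless of how many ranks participate.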

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT