dask

Solid

Parallel/distributed computing. Scale pandas/NumPy beyond memory, parallel DataFrames/Arrays, multi-file processing, task graphs, for larger-than-RAM datasets and parallel workflows.

AI & Automation 2,210 stars 164 forks Updated 1 week ago Apache-2.0

Quality Score: 91/100

Stars (20%): 100
Recency (20%): 90
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Dask

## Overview

Dask is a Python library for parallel and distributed computing that enables three critical capabilities:

- **Larger-than-memory execution** on single machines for data exceeding available RAM
- **Parallel processing** for improved computational speed across multiple cores
- **Distributed computation** supporting terabyte-scale datasets across multiple machines

Dask scales from laptops (processing ~100 GiB) to clusters (processing ~100 TiB) while maintaining familiar Python APIs.

## When to Use This Skill

Use this skill to:

- Process datasets that exceed available RAM
- Scale pandas or NumPy operations to larger datasets
- Parallelize computations for performance improvements
- Process multiple files efficiently (CSVs, Parquet, JSON, text logs)
- Build custom parallel workflows with task dependencies
- Distribute workloads across multiple cores or machines

## Core Capabilities

Dask provides five main components, each suited to different use cases:

### 1. DataFrames - Parallel Pandas Operations

**Purpose**: Scale pandas operations to larger datasets through parallel processing.

**When to Use**:
- Tabular data exceeds available RAM
- Multiple CSV/Parquet files need to be processed together
- Pandas operations are slow and need parallelization
- A pandas prototype is being scaled to production

**Reference Documentation**: For comprehensive guidance on Dask DataFrames, refer to `references/dataframes.md`, which includes:
- Reading data (single file...

Details

Author: foryourhealth111-pixel
Repository: foryourhealth111-pixel/Vibe-Skills
Created: 3 months ago
Last Updated: 1 week ago
Language: Python
License: Apache-2.0

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Solid

dask

Parallel/distributed computing. Scale pandas/NumPy beyond memory, parallel DataFrames/Arrays, multi-file processing, task graphs, for larger-than-RAM datasets and parallel workflows.

27,705 Updated today
davila7
Data & Documents Listed

dask

Parallel/distributed computing. Scale pandas/NumPy beyond memory, parallel DataFrames/Arrays, multi-file processing, task graphs, for larger-than-RAM datasets and parallel workflows.

335 Updated today
aiskillstore
AI & Automation Solid

dask

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.

26,817 Updated today
K-Dense-AI
AI & Automation Listed

dask

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.

15 Updated 2 days ago
charlieviettq
Data & Documents Listed

bigdata-processing

Core big data processing toolkit for data teams. Includes Polars, Dask, Vaex for large-scale data processing, ETL pipelines, and distributed computing. Use when working with datasets larger than memory, building data pipelines, or optimizing data processing performance.

1 Updated 2 days ago
MARUCIE