cloud-infra-data

AWS/GCP/Azure data infrastructure: S3/GCS/ADLS partitioning, BigQuery slot management, Redshift Spectrum, Snowflake warehouses, IAM roles for data access, cost optimization, and managed service selection. Use this skill whenever the user is deploying a pipeline to the cloud, choosing between managed data services, configuring storage for a data lake, setting up IAM/permissions for pipelines, or asking about BigQuery pricing, Redshift vs. BigQuery vs. Snowflake, S3 bucket layout, or cloud-specific performance tuning. Also trigger when the user mentions cloud costs, slow BigQuery queries, Redshift concurrency scaling, storage formats in the cloud, or cross-account data access. If it touches cloud + data together, this skill should be active.
Methasit-Pun/data_engineer_claude_skills · ★ 1 · DevOps & Infrastructure · score 62

Install: claude install-skill Methasit-Pun/data_engineer_claude_skills

# Cloud Infrastructure for Data Pipelines

## Service Selection Guide

### Compute (query engines)

| Service | Best fit | Cost model |
|---|---|---|
| **BigQuery** | Variable/spiky workloads, serverless preference | Per-TB scanned (on-demand) or slot reservations |
| **Snowflake** | Multi-cloud, strong SQL, virtual warehouse isolation | Per-credit (compute time) |
| **Redshift** | AWS-native, predictable workloads, RA3 storage separation | Per-node/hour or serverless per-RPU |
| **Databricks** | Spark workloads, ML/data science teams | DBU per hour |
| **Athena** | Ad-hoc queries on S3, minimal ops | Per-TB scanned |

The biggest practical difference: BigQuery and Athena are serverless (no cluster to manage); Snowflake and Redshift require you to think about concurrency and warehouse sizing.

### Storage

| Service | Use for |
|---|---|
| **S3 (AWS)** | Data lake, staging area, Parquet/Delta/Iceberg tables |
| **GCS (GCP)** | Same as S3 in the GCP ecosystem |
| **ADLS Gen2 (Azure)** | Azure data lake, hierarchical namespace for Hadoop compatibility |

All three are object stores: they look like key-value stores, not filesystems. The "folder" structure in the key name is just a naming convention.

---

## Storage Layout and Partitioning

### S3 / GCS bucket layout

```
s3://my-data-lake/
  raw/
    source=salesforce/
      year=2024/month=01/day=15/
        events_20240115_001.parquet
  processed/
    domain=churn/
      year=2024/month=01/
        churn_features_20240101.pa