extracting-keywords

Extract keywords from documents using the YAKE algorithm, with support for 34 languages (Arabic to Chinese). Use when users request keyword extraction, key terms, topic identification, content summarization, or document analysis. Includes domain-specific stopwords for AI/ML and life sciences. Optional deeper extraction mode (n=2 and n=3 combined) for comprehensive coverage.
oaustegard/claude-skills · ★ 124 · Data & Documents · score 84
Install: claude install-skill oaustegard/claude-skills
# Extracting Keywords

Extract keywords from text using YAKE (Yet Another Keyword Extractor), an unsupervised statistical keyword extraction algorithm.

## Installation

**First time only:** Install YAKE with optimized dependencies to avoid unnecessary downloads.

```bash
cd /home/claude
uv venv yake-venv --system-site-packages
uv pip install yake --python yake-venv/bin/python --no-deps
uv pip install jellyfish segtok regex --python yake-venv/bin/python
```

This reuses system packages (numpy, networkx) instead of downloading them (~0.08s vs ~5s).

## Stopwords Configuration

**Built-in YAKE stopwords (34 languages):** Use `lan="<code>"` parameter
- See Parameters section below for all 34 supported language codes
- English (`lan="en"`) is the default

**Custom domain stopwords (bundled in `assets/`):**

**AI/ML:** `stopwords_ai.txt`
- English stopwords + 783 AI/ML domain-specific terms (1357 total)
- Filters AI/ML methodology noise (model, training, network, algorithm, parameter)
- Filters ML boilerplate (dataset, baseline, benchmark, experiment, evaluation)
- Filters technical terms (transformer, embedding, attention, optimization, inference)
- Includes full lemmatization (train/trains/trained/training/trainer)
- Use for AI/ML papers, technical reports, machine learning literature
- **Performance impact:** +4-5% runtime vs English stopwords

**Life Sciences:** `stopwords_ls.txt`
- English stopwords + 719 life sciences domain-specific terms (1293 total)
- Filters research met