colab-video-pipeline

Use this skill when running or maintaining the Jiang Lens Google Colab video pipeline for YouTube download, diarization, transcription, Drive sync, or Playwright-based Colab automation. The skill requires the project Drive folder named jianglens and never commits cookies, browser profiles, tokens, or downloaded media.
apresmoi/jianglens · ★ 7 · Data & Documents · score 68
Install: claude install-skill apresmoi/jianglens
# Colab Video Pipeline

Use this skill to process Jiang Lens video sources through the Colab notebooks in `ops/notebooks/colab/`.

## Safety Boundary

- Do not commit Google cookies, YouTube cookies, HuggingFace tokens, Colab browser profiles, audio, or video.
- Local browser auth belongs under `ops/secrets/browser-profiles/colab/`.
- YouTube `yt-dlp` cookies belong in Google Drive at `/content/drive/MyDrive/jianglens/cookies.txt`.
- HuggingFace auth should use Colab userdata key `HF_TOKEN` when possible.
- Stop and ask the maintainer on Google login, 2FA, CAPTCHA, account chooser ambiguity, quota exhaustion, or unexpected paid-credit prompts.

## Drive Layout

The canonical Drive root is:

```text
/content/drive/MyDrive/jianglens/
  _colab_envs/
  _hf_home/
  cookies.txt
  youtube/
    _config.json
    <channel-or-handle>/
      _channel.json
      <video-id>/
        audio.wav
        metadata.youtube.json  # optional; local import can create this by video id
        dump.json
        grouped.json
        transcription.json
```

Local text artifact sync uses:

```bash
./ops/notebooks/colab/sync-drive.sh --dry-run
./ops/notebooks/colab/sync-drive.sh
```

## Notebook Order

1. `YouTube_Manager.ipynb`: register channels, filter, download `audio.wav` into Drive.
2. `Pyannote_4_Pipeline-GPT-5.3.ipynb`: produce `dump.json` and `grouped.json`.
3. `Whisper_Transcription.ipynb`: produce `transcription.json`.
4. `sync-drive.sh`: copy text artifacts from Drive to `content/sources/raw