ingest-youtube — YouTube-to-vault connector
Pulls YouTube transcripts into a markdown vault as queryable typed-memory entries that downstream skills (knowledge graph extraction, voice-fingerprint training, content repurposing, action-item extraction) can act on.
Same pattern as ingest-slack, ingest-whatsapp, ingest-notion, ingest-linear, ingest-github, ingest-gmail. Adding YouTube means a new normalizer, not a new architecture.
When to use
- User pastes a YouTube URL and asks for a transcript or summary
- User says /ingest-youtube <url> for a single video
- User asks to capture, sync, ingest, transcribe, or pull a talk/podcast/keynote into the vault
Do NOT use for:
- Downloading the actual video file (use yt-dlp directly with -f best)
- Channel-wide ingestion or --days windows; this script ingests one video URL at a time
- Live streams (transcripts are not stable)
- Non-YouTube sources (Vimeo, Twitch, Twitter Spaces have their own connectors)
- One-off transcript reads where the user does not want a vault file (run yt-dlp --write-auto-sub directly and pipe to stdout)
How it works
- Parse the input as one YouTube video URL.
- Verify yt-dlp is installed. If not, the script exits with install instructions: brew install yt-dlp (macOS) or pip3 install --user yt-dlp.
- Validate the URL as a single http(s) YouTube video and call yt-dlp --ignore-config --list-subs -- <url> to enumerate available subtitles.
- Subtitle priority: manual subs > auto-generated captions. Manual subs preserve creator-provided punctuation and speaker labels; auto-generated captions typically lack punctuation and consistent capitalization.
- Download the highest-priority subtitle as VTT via yt-dlp --write-sub --sub-lang <lang> --skip-download. Default language preference: en,es (English first, Spanish second).
- Strip VTT timing markers and merge into clean prose paragraphs. Deduplicate repeated lines (auto-generated VTTs are line-doubled). Preserve speaker labels if the source had them.
- Pull video metadata (title, channel, upload date, duration, video_id, URL) via yt-dlp --print-json --skip-download.
- Slugify the channel name and video title. Write to External Inputs/YouTube/<channel-slug>/<YYYY-MM-DD>-<video-slug>.md.
- Scan the transcript for trigger keywords (decision, framework, model, principle, "the lesson is", playbook, anti-pattern, case study). For each match, create a writing-seed stub at Meta/Captures/<YYYY-MM-DD>-youtube-<channel-slug>-<video-id>.md so the seed lands in the captures aggregator.
- Print summary: file path, transcript word count, language, seeds detected.
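The transcript-cleaning and keyword-scan steps above can be sketched roughly as follows. This is a minimal illustration, not the code in ingest.py: the function names (vtt_to_lines, lines_to_paragraphs, find_triggers), the fixed paragraph size, and whole-word keyword matching are all assumptions made for the example.

```python
import re

# A WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:04.000 align:start"
TIMING = re.compile(r"^(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")
# A voice tag carrying a speaker name, e.g. "<v Roger Bingham>"
VOICE = re.compile(r"<v(?:\.[^\s>]+)?\s+([^>]+)>")
# Any remaining inline tag, e.g. "<c>", "</c>", "</v>", "<00:00:01.500>"
TAG = re.compile(r"<[^>]+>")

# Trigger keywords listed in the steps above
TRIGGERS = ("decision", "framework", "model", "principle",
            "the lesson is", "playbook", "anti-pattern", "case study")


def vtt_to_lines(vtt_text: str) -> list[str]:
    """Return caption lines without header, timings, tags, or consecutive repeats."""
    lines: list[str] = []
    in_cues = False  # everything before the first timing line is header
    for raw in vtt_text.splitlines():
        line = raw.strip()
        if TIMING.match(line):
            in_cues = True
            continue
        if not in_cues or not line:
            continue
        # Turn voice tags into bold speaker labels, then drop other tags.
        text = TAG.sub("", VOICE.sub(r"**\1:** ", line)).strip()
        # Rolling auto-captions repeat the previous line; keep one copy.
        if text and (not lines or lines[-1] != text):
            lines.append(text)
    return lines


def lines_to_paragraphs(lines: list[str], per_paragraph: int = 6) -> str:
    """Join caption lines into paragraphs of a fixed number of lines (assumed size)."""
    chunks = [lines[i:i + per_paragraph] for i in range(0, len(lines), per_paragraph)]
    return "\n\n".join(" ".join(chunk) for chunk in chunks)


def find_triggers(text: str) -> list[str]:
    """Return the trigger keywords that occur as whole words or phrases (assumed rule)."""
    lowered = text.lower()
    return [t for t in TRIGGERS if re.search(rf"\b{re.escape(t)}\b", lowered)]
```

Removing only consecutive repeats is enough for the rolling pattern where each cue restates the last line of the previous cue. It does not remove a line that legitimately recurs later in the talk.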
Invocation
python3 ingest.py <youtube-url> [--vault <path>] [--lang <code>]
Defaults:
- --vault: $VAULT_ROOT env var or current directory
- --lang: en,es (English first, Spanish second; matches a common bilingual default)
- --whisper: accepted as a future fallback flag, but this version writes a stub when no subtitles are available
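One way these defaults could be resolved is with argparse, as in the sketch below. The helper names build_parser and resolve_args are hypothetical, and splitting --lang into a list is an assumption about how the preference order is consumed.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface mirroring the invocation line above."""
    parser = argparse.ArgumentParser(prog="ingest.py")
    parser.add_argument("url", help="a single YouTube video URL")
    parser.add_argument("--vault", default=None,
                        help="vault root; falls back to $VAULT_ROOT, then the current directory")
    parser.add_argument("--lang", default="en,es",
                        help="comma-separated language preference, first match wins")
    parser.add_argument("--whisper", action="store_true",
                        help="reserved for a future fallback; no transcription is performed")
    return parser


def resolve_args(argv, env=None):
    """Parse argv and fill in the documented defaults."""
    env = os.environ if env is None else env
    args = build_parser().parse_args(argv)
    # Explicit flag first, then the environment variable, then the current directory.
    args.vault = args.vault or env.get("VAULT_ROOT") or os.getcwd()
    # Keep the preference order: "en,es" becomes ["en", "es"].
    args.langs = [code.strip() for code in args.lang.split(",") if code.strip()]
    return args
```

Passing env explicitly keeps the function easy to test without touching the real environment.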
Output contract
The vault file at External Inputs/YouTube/<channel-slug>/<YYYY-MM-DD>-<video-slug>.md has frontmatter:
---
type: external-input
source: youtube
video_id: <11-char ID>
url: https://www.youtube.com/watch?v=<id>
channel: <channel-name>
channel_url: https://www.youtube.com/<handle>
title: <video title>
upload_date: <YYYY-MM-DD>
duration_seconds: <int>
language: <ISO code>
subtitle_source: manual | auto | whisper
word_count: <int>
ingested_at: <ISO 8601 timestamp>
---
Body is the cleaned transcript as paragraph prose. If the source had speaker labels, format as **<speaker>:** <text> per turn.
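As a rough sketch of the contract above, the path and frontmatter could be produced like this. The helpers slugify, iso_date, note_path, and render_frontmatter are illustrative names, the ASCII-only slug rule is an assumption, and iso_date assumes the metadata carries the upload date in compact YYYYMMDD form.

```python
import re
import unicodedata
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase ASCII slug: runs of other characters collapse to single hyphens."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def iso_date(upload_date: str) -> str:
    """Convert a compact YYYYMMDD date to YYYY-MM-DD."""
    return f"{upload_date[:4]}-{upload_date[4:6]}-{upload_date[6:8]}"


def note_path(vault: Path, channel: str, upload_date: str, title: str) -> Path:
    """Build External Inputs/YouTube/<channel-slug>/<YYYY-MM-DD>-<video-slug>.md."""
    name = f"{upload_date}-{slugify(title)}.md"
    return vault / "External Inputs" / "YouTube" / slugify(channel) / name


def render_frontmatter(fields: dict) -> str:
    """Render key/value pairs between '---' fences, in insertion order."""
    lines = ["---"] + [f"{key}: {value}" for key, value in fields.items()] + ["---"]
    return "\n".join(lines) + "\n"
```

This renderer writes values verbatim, so a title containing a colon or a leading quote would need quoting before a strict YAML parser accepts it.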
Idempotency
Re-ingesting the same video URL overwrites the same vault file. Seed stub filenames embed the video_id, so the same source video produces the same stub filename across re-runs. Re-runs refresh, never duplicate.
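The overwrite behavior can be illustrated with a small sketch. seed_stub_path and write_note are hypothetical helpers, and the date component is passed in as-is because the text does not say which date the script uses; a date that changes between runs would change the filename.

```python
import tempfile
from pathlib import Path


def seed_stub_path(vault: Path, date: str, channel_slug: str, video_id: str) -> Path:
    """Build Meta/Captures/<YYYY-MM-DD>-youtube-<channel-slug>-<video-id>.md."""
    return vault / "Meta" / "Captures" / f"{date}-youtube-{channel_slug}-{video_id}.md"


def write_note(path: Path, text: str) -> Path:
    """Create parent folders and overwrite the file if it already exists."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


# Two runs against the same video land on the same file.
with tempfile.TemporaryDirectory() as tmp:
    target = seed_stub_path(Path(tmp), "2005-04-24", "jawed", "jNQXAC9IVRw")
    write_note(target, "first run\n")
    write_note(target, "second run\n")
    files = sorted(p.name for p in target.parent.iterdir())
    content = target.read_text(encoding="utf-8")

print(files)
print(content, end="")
```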
Missing subtitles
If yt-dlp --list-subs returns no manual or auto subtitles, the script writes a stub vault note with the video metadata and source URL instead of failing silently. The --whisper flag is reserved for a future local transcription fallback and currently reports that the fallback is not implemented.
For a manual fallback today, download audio with yt-dlp, transcribe it with your local Whisper workflow, and add captions or transcript text before rerunning the ingest.
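The choice between a transcript and a metadata stub could be sketched as below. It assumes the video metadata is a dict with subtitles and automatic_captions maps keyed by language code, as in yt-dlp's JSON output. pick_subtitle is a hypothetical helper, and letting any preferred-language manual track beat every auto track is one reading of the priority rule, not a confirmed detail of the script.

```python
from typing import Optional


def pick_subtitle(info: dict, langs: list[str]) -> Optional[tuple[str, str]]:
    """Return (language, source) for the best available track, or None.

    Manual subtitles are checked for every preferred language before
    auto-generated captions are considered.
    """
    for source, key in (("manual", "subtitles"), ("auto", "automatic_captions")):
        available = info.get(key) or {}
        for lang in langs:
            if available.get(lang):  # non-empty list of formats for this language
                return lang, source
    return None


# Example: no English manual track, so the Spanish manual track wins over English auto.
example = {
    "subtitles": {"es": [{"ext": "vtt"}]},
    "automatic_captions": {"en": [{"ext": "vtt"}]},
}
choice = pick_subtitle(example, ["en", "es"])
print(choice if choice else "no subtitles: write a metadata stub")
```

A None result corresponds to the stub-note path described above.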
Limitations
- Ingests one YouTube video URL per run; channel handles, playlists, and --days windows are out of scope.
- Depends on subtitles returned by yt-dlp; videos without subtitles produce a metadata stub, not a transcript.
- Does not download video files or perform built-in Whisper transcription in this version.
- Network availability, YouTube subtitle access, and local yt-dlp behavior determine whether ingest succeeds.
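The single-video restriction could be enforced with a check along these lines. This is a sketch under assumptions: extract_video_id is a hypothetical helper, the accepted host list is illustrative, and other URL shapes (for example /shorts/ links) are simply rejected here rather than handled.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are 11 characters from this alphabet.
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
# Hosts accepted for /watch URLs (an illustrative list).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for a single-video http(s) URL, else None."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https"):
        return None
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        candidate = parts.path.lstrip("/")
    elif host in WATCH_HOSTS and parts.path == "/watch":
        candidate = (parse_qs(parts.query).get("v") or [""])[0]
    else:
        # Channel handles, /playlist pages, and non-YouTube hosts end up here.
        return None
    return candidate if VIDEO_ID.fullmatch(candidate) else None
```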
Acceptance test
Run against the first YouTube video ever uploaded:
python3 ingest.py "https://www.youtube.com/watch?v=jNQXAC9IVRw" --vault /tmp/test
Expected output:
Wrote 39 words to /tmp/test/External Inputs/YouTube/jawed/2005-04-24-me-at-the-zoo.md. Language: en. Subtitle source: manual.
The output file contains valid frontmatter and a clean prose body.
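A check of "valid frontmatter and a clean prose body" might look like the sketch below. The names split_note and check_note are hypothetical, the parser handles only the flat key: value lines shown in the output contract, and it does not verify that word_count matches the body because the script's counting rule is not stated.

```python
# Keys listed in the output contract
REQUIRED_KEYS = (
    "type", "source", "video_id", "url", "channel", "channel_url", "title",
    "upload_date", "duration_seconds", "language", "subtitle_source",
    "word_count", "ingested_at",
)


def split_note(text: str) -> tuple[dict, str]:
    """Split a note into (frontmatter dict, body). Raises ValueError if unfenced."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("note does not start with a frontmatter fence")
    end = lines.index("---", 1)  # ValueError if the closing fence is missing
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).strip()


def check_note(text: str) -> list[str]:
    """Return a list of problems; an empty list means the note passes."""
    meta, body = split_note(text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    if not body:
        problems.append("empty body")
    return problems
```

Running check_note on the contents of the generated file should return an empty list when the acceptance test passes.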
Dependencies
- yt-dlp (required): install via brew install yt-dlp or pip3 install --user yt-dlp
- whisper-cpp (optional for a manual fallback outside this script)
Source
Bundled in adelaidasofia/ai-brain-starter, a verification harness around an AI agent so that memory compounds instead of corrupting. The skill is part of the ingest-* family of vault connectors.