
Transcribe Media

Produce timestamped transcript sidecars for acquired audio/video, with hashes, source metadata, speaker labels when available, and explicit degraded plans when STT tooling is missing.

Source: jmagly/aiwg
Updated: 2026-05-31

Install (copy and paste into any project):

mkdir -p .claude/skills && curl -fsSL https://raw.githubusercontent.com/jmagly/aiwg/HEAD/agentic/code/frameworks/media-curator/skills/transcribe-media/SKILL.md -o .claude/skills/transcribe-media.md

The command saves SKILL.md as .claude/skills/transcribe-media.md. It works with Claude Code, Cursor, and any agent that loads SKILL.md files from .claude/skills/.

Transcribe Media

Create a research-grade transcript sidecar for a locally acquired audio or video file. This primitive supports the handoff from media-curator to research. It does not claim transcription support unless an actual local STT tool, approved service adapter, human transcript, or diarization sidecar is available.

Inputs

Required:

  • Local acquired media path.

Optional:

  • Source URL, title, creator, acquired-at timestamp, acquisition ID, language.
  • Existing transcript or diarization sidecar.

Output

Write transcript sidecars under .aiwg/media/transcripts/ or beside the acquired media when the collection already stores sidecars locally.

Recommended filename: <media-basename>.transcript.json

Required fields:

  • schema: aiwg.media.transcript.v1
  • source.path, source.url, source.sha256
  • transcript.sha256, transcript.language, transcript.generated_at, transcript.tool, transcript.quality
  • segments[] with stable id, start, end, text, and optional speaker
  • provenance.wasDerivedFrom, provenance.generatedEntity, provenance.activity, provenance.used

Segment IDs MUST be stable. Use zero-padded sequential IDs such as seg-000001 unless the upstream transcript already has durable IDs.
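
As a rough illustration of the required fields, the sketch below assembles a sidecar as a Python dict. The function names (`segment_id`, `build_sidecar`), the use of seconds for `start` and `end`, the shape of `quality`, and the values placed in the `provenance` fields are assumptions made for this example; the skill only fixes the field names and the ID format.

```python
from datetime import datetime, timezone


def segment_id(index: int) -> str:
    """Zero-padded sequential ID: index 1 becomes 'seg-000001'."""
    return f"seg-{index:06d}"


def build_sidecar(source, raw_segments, tool, language, transcript_sha256=None):
    """Assemble a sidecar dict with the required top-level fields.

    `source` is a dict with 'path', 'sha256', and optionally 'url'.
    `raw_segments` is a list of dicts with 'start', 'end', 'text', and
    optionally 'id' and 'speaker' (hypothetical input shape).
    """
    segments = []
    for i, seg in enumerate(raw_segments, start=1):
        entry = {
            # Keep a durable upstream ID when one exists; otherwise number sequentially.
            "id": seg.get("id") or segment_id(i),
            "start": seg["start"],  # assumed to be seconds from media start
            "end": seg["end"],
            "text": seg["text"],
        }
        if seg.get("speaker"):
            entry["speaker"] = seg["speaker"]
        segments.append(entry)

    return {
        "schema": "aiwg.media.transcript.v1",
        "source": {
            "path": source["path"],
            "url": source.get("url"),
            "sha256": source["sha256"],
        },
        "transcript": {
            "sha256": transcript_sha256,
            "language": language,
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "tool": tool,
            "quality": {"limitations": []},
        },
        "segments": segments,
        # Illustrative values only; the provenance-tracking skill defines the real model.
        "provenance": {
            "wasDerivedFrom": source["sha256"],
            "generatedEntity": transcript_sha256,
            "activity": "transcription",
            "used": tool,
        },
    }
```

Serializing the result with `json.dumps(sidecar, indent=2)` and writing it to `<media-basename>.transcript.json` would produce the file described above.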

Hashing

  • source.sha256 is the SHA-256 of the exact local media file bytes.
  • transcript.sha256 is the SHA-256 of the canonical transcript payload used for citation, not the pretty-printed JSON file.
  • The canonical payload is the UTF-8 join of id, start, end, speaker if present, and text for every segment, separated by tabs and newlines.
  • Use the same lowercase sha256:<hex> convention as media-curator integrity manifests.
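
The two hashes could be computed roughly as follows. This is a sketch under stated assumptions: fields within a segment are joined by tabs, segments are joined by newlines with no trailing newline, and `start` and `end` are rendered with Python's `str()`. The skill does not pin down number formatting or the trailing newline, so any real implementation should confirm those details against the integrity-verification conventions before hashes are compared across tools. The function names are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the exact file bytes in chunks so large media does not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def canonical_payload(segments) -> bytes:
    """Join id, start, end, speaker (if present), and text for every segment.

    Assumption: tab between fields, newline between segments, UTF-8 encoded.
    """
    lines = []
    for seg in segments:
        fields = [str(seg["id"]), str(seg["start"]), str(seg["end"])]
        if seg.get("speaker"):
            fields.append(str(seg["speaker"]))
        fields.append(seg["text"])
        lines.append("\t".join(fields))
    return "\n".join(lines).encode("utf-8")


def sha256_transcript(segments) -> str:
    """Hash the canonical payload, not the pretty-printed JSON file."""
    return "sha256:" + hashlib.sha256(canonical_payload(segments)).hexdigest()
```

Because the hash covers only the canonical payload, re-indenting or reordering keys in the JSON file leaves `transcript.sha256` unchanged, while any edit to segment text or timing changes it.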

Speaker Labels

Preserve speaker labels when STT output, a diarization sidecar, or a human transcript provides them. If no diarization is available, emit the documented single-speaker fallback SPEAKER_00 and record the limitation in transcript.quality.limitations.

Do not invent speaker names. Replace SPEAKER_00 with real names only when metadata or human verification proves them.
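
A minimal sketch of the fallback rule follows. It assumes segments are dicts with an optional `speaker` key and that `transcript.quality.limitations` is a list of strings; the function name and the wording of the limitation note are hypothetical. When some segments already carry upstream labels, this sketch preserves them and leaves the rest untouched, which is one possible reading of the rule rather than a documented behavior.

```python
FALLBACK_SPEAKER = "SPEAKER_00"
NO_DIARIZATION_NOTE = (
    "No diarization available; all segments use the single-speaker fallback SPEAKER_00."
)


def apply_speaker_fallback(segments, limitations):
    """Return new (segments, limitations) lists without mutating the inputs.

    Upstream labels are preserved as-is. The fallback label is applied only
    when no segment has any speaker label at all.
    """
    if any(seg.get("speaker") for seg in segments):
        return [dict(seg) for seg in segments], list(limitations)

    labeled = [{**seg, "speaker": FALLBACK_SPEAKER} for seg in segments]
    notes = list(limitations)
    if NO_DIARIZATION_NOTE not in notes:
        notes.append(NO_DIARIZATION_NOTE)
    return labeled, notes
```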

Tooling Detection

Check for an available transcription path before generating text:

command -v whisper-cpp || command -v whisper || command -v vosk-transcriber || true
command -v ffmpeg || true

If no STT tool or approved transcript source is available, do not fabricate transcript text. Write or report an actionable plan with:

  • schema: aiwg.media.transcript-plan.v1
  • status: blocked-tooling-missing
  • source path and source hash when the media file can be read
  • next steps for installing local STT tooling or providing a human transcript
  • quality limits stating that no transcript hash exists until segment text exists
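
The detection step and the blocked plan could be sketched in Python as below. The tool names come from the detection command above; the plan's `source`, `next_steps`, and `quality` key names are assumptions, since the skill lists what the plan must contain but not its exact layout. The `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil

# Same candidates as the shell check above, in the same order.
STT_CANDIDATES = ("whisper-cpp", "whisper", "vosk-transcriber")


def detect_stt_tool(which=shutil.which):
    """Return the first STT executable found on PATH, or None."""
    for name in STT_CANDIDATES:
        if which(name):
            return name
    return None


def blocked_plan(source_path, source_sha256=None):
    """Build a plan document instead of fabricating transcript text.

    `source_sha256` stays None when the media file could not be read.
    """
    return {
        "schema": "aiwg.media.transcript-plan.v1",
        "status": "blocked-tooling-missing",
        "source": {"path": source_path, "sha256": source_sha256},
        "next_steps": [
            "Install a local STT tool such as whisper-cpp, whisper, or vosk-transcriber.",
            "Or provide a human transcript or diarization sidecar for this media file.",
        ],
        "quality": {
            "limitations": ["No transcript hash exists until segment text exists."]
        },
    }
```

A caller would check `detect_stt_tool()` first and fall back to `blocked_plan(path, source_hash)` only when it returns `None` and no approved transcript source was supplied.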

Verification Limits

A generated transcript is evidence of tool output, not proof of exact speech content. Handoff notes MUST state:

  • Machine transcripts can contain word errors, omissions, and hallucinated punctuation.
  • Speaker labels are provisional unless diarization or human review supports them.
  • Research induction should cite the transcript hash and source media hash together.
  • Human verification is required before using quotations in high-stakes or published claims.

Research Handoff

Include the transcript sidecar path, source media hash, transcript hash, source URL, acquisition metadata, quality status, and known limitations.
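
One way to gather these items is a small helper that reads them from the sidecar and refuses to proceed unless both hashes are present, so the two are always cited together. The function name, the output key names, and the optional `quality.status` key are hypothetical; the handoff document referenced below defines the actual expectations.

```python
def build_handoff(sidecar, sidecar_path, acquisition=None):
    """Collect handoff fields from a transcript sidecar dict.

    Raises ValueError when either hash is missing, because the transcript
    hash and the source media hash are meant to be cited as a pair.
    """
    source = sidecar["source"]
    transcript = sidecar["transcript"]
    if not source.get("sha256") or not transcript.get("sha256"):
        raise ValueError(
            "handoff needs both the source media hash and the transcript hash"
        )

    quality = transcript.get("quality", {})
    return {
        "transcript_sidecar": sidecar_path,
        "source_sha256": source["sha256"],
        "transcript_sha256": transcript["sha256"],
        "source_url": source.get("url"),
        "acquisition": dict(acquisition or {}),
        "quality_status": quality.get("status"),  # None when the sidecar has no status
        "limitations": list(quality.get("limitations", [])),
    }
```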

Fixture Example

See examples/sample.transcript.json for a minimal transcript sidecar with timestamps, speaker fallback, source URL, source hash, transcript hash, and provenance fields.

References

  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — SHA-256 manifest and fixity conventions
  • @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — W3C PROV-O derivation model for media artifacts
  • @$AIWG_ROOT/docs/integrations/media-curator-to-research-handoff.md — Research handoff expectations for media-derived artifacts