Transcribe Media
Create a research-grade transcript sidecar for a local acquired audio or video file. This primitive supports the media-curator-to-research handoff. It claims transcription support only when a local speech-to-text (STT) tool, an approved service adapter, a human transcript, or a diarization sidecar is actually available.
Inputs
Required:
- Local acquired media path.
Optional:
- Source URL, title, creator, acquired-at timestamp, acquisition ID, language.
- Existing transcript or diarization sidecar.
Output
Write transcript sidecars under .aiwg/media/transcripts/ or beside the acquired media when the collection already stores sidecars locally.
Recommended filename: <media-basename>.transcript.json
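The naming rule can be expressed as a small shell function. This is a minimal sketch that assumes `<media-basename>` means the media file name with its last extension removed (so `talk.mp4` maps to `talk.transcript.json`). The function name `sidecar_path` and its `beside` mode argument are hypothetical, not part of any published tooling.

```bash
# Hypothetical helper: derive the transcript sidecar path for a media file.
# Assumes <media-basename> is the file name minus its last extension.
# Usage: sidecar_path MEDIA_PATH [beside]
sidecar_path() {
  local media="$1" mode="${2:-}" base
  base=$(basename -- "$media")
  base=${base%.*}   # drop the last extension, e.g. talk.mp4 -> talk
  if [ "$mode" = "beside" ]; then
    # The collection already stores sidecars next to the media.
    printf '%s/%s.transcript.json\n' "$(dirname -- "$media")" "$base"
  else
    printf '.aiwg/media/transcripts/%s.transcript.json\n' "$base"
  fi
}
```

For example, `sidecar_path /data/media/talk.mp4` prints `.aiwg/media/transcripts/talk.transcript.json`, while `sidecar_path /data/media/talk.mp4 beside` prints `/data/media/talk.transcript.json`.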
Required fields:
- `schema`: `aiwg.media.transcript.v1`
- `source.path`, `source.url`, `source.sha256`
- `transcript.sha256`, `transcript.language`, `transcript.generated_at`, `transcript.tool`, `transcript.quality`
- `segments[]`, each with a stable `id`, `start`, `end`, `text`, and an optional `speaker`
- `provenance.wasDerivedFrom`, `provenance.generatedEntity`, `provenance.activity`, `provenance.used`
Segment IDs MUST be stable. Use zero-padded sequential IDs such as seg-000001 unless the upstream transcript already has durable IDs.
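A sketch of the ID rule, assuming six-digit zero padding starting at 1, as the `seg-000001` example suggests. Both helper names are hypothetical; the second keeps a durable upstream ID when one is supplied and otherwise falls back to the sequential form.

```bash
# Hypothetical helper: zero-padded sequential segment ID (seg-000001, ...).
# Pass a plain decimal number; bash printf reads a leading 0 as octal.
segment_id() {
  printf 'seg-%06d\n' "$1"
}

# Hypothetical helper: prefer a durable upstream ID ($2) when present,
# otherwise derive the sequential ID from the segment's position ($1).
stable_segment_id() {
  local n="$1" upstream="${2:-}"
  if [ -n "$upstream" ]; then
    printf '%s\n' "$upstream"
  else
    segment_id "$n"
  fi
}
```

Because IDs are derived from position only when no upstream ID exists, re-running the same tool on the same media yields the same IDs, which is what keeps citations stable.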
Hashing
- `source.sha256` is the SHA-256 of the exact local media file bytes.
- `transcript.sha256` is the SHA-256 of the canonical transcript payload used for citation, not of the pretty-printed JSON file.
- The canonical payload is the UTF-8 join of `id`, `start`, `end`, `speaker` (if present), and `text` for every segment, with fields separated by tabs and segments separated by newlines.
- Use the same lowercase `sha256:<hex>` convention as media-curator integrity manifests.
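The two hashes can be sketched in shell. This assumes GNU coreutils `sha256sum` (on macOS, `shasum -a 256` is the usual substitute) and several details the rules above leave open: segments arrive on stdin as tab-separated rows `id, start, end, speaker, text` with an empty speaker column when absent; `start` and `end` are the exact strings stored in the sidecar; `text` contains no tabs or newlines; and the payload has no trailing newline. All function names are hypothetical.

```bash
# Hypothetical hashing helpers. Assumes GNU coreutils sha256sum.

# sha256:<hex> of the exact media file bytes.
source_sha256() {
  local hex
  [ -r "$1" ] || return 1
  # Read via stdin so sha256sum never escapes unusual file names in its output.
  hex=$(sha256sum < "$1" | cut -d' ' -f1)
  printf 'sha256:%s\n' "$hex"
}

# Build the canonical payload from segment rows on stdin:
#   id<TAB>start<TAB>end<TAB>speaker<TAB>text   (speaker may be empty)
# Fields are joined by tabs and segments by newlines; an empty speaker column
# is dropped and no trailing newline is written (both are assumptions).
canonical_payload() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
    {
      if ($4 == "") {
        line = $1 OFS $2 OFS $3 OFS $5
      } else {
        line = $1 OFS $2 OFS $3 OFS $4 OFS $5
      }
      printf "%s%s", (NR > 1 ? "\n" : ""), line
    }'
}

# sha256:<hex> of the canonical payload, not of the pretty-printed JSON file.
transcript_sha256() {
  local hex
  hex=$(canonical_payload | sha256sum | cut -d' ' -f1)
  printf 'sha256:%s\n' "$hex"
}
```

`awk` is used for splitting because a single-tab `FS` preserves empty fields, whereas bash `read` with `IFS=$'\t'` collapses consecutive tabs and would shift the text into the speaker column.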
Speaker Labels
Preserve speaker labels when STT output, a diarization sidecar, or a human transcript provides them. If no diarization is available, emit the documented single-speaker fallback SPEAKER_00 and record the limitation in transcript.quality.limitations.
Do not invent speaker names. Replace SPEAKER_00 with real names only when metadata or human verification proves them.
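A minimal sketch of the fallback, assuming segments arrive as tab-separated rows `id, start, end, speaker, text` with an empty speaker column when no label exists. It applies `SPEAKER_00` only when no row carries any label, leaves labelled input untouched, and never introduces real names. The function name and the wording of the limitation note are hypothetical; in a real sidecar the note belongs in `transcript.quality.limitations`.

```bash
# Hypothetical helper: apply the single-speaker fallback to tab-separated rows
# on stdin (id, start, end, speaker, text). If no row carries a speaker label,
# every row gets SPEAKER_00 and a limitation note goes to stderr; otherwise
# the rows pass through unchanged.
apply_speaker_fallback() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
    { rows[NR] = $0; if ($4 != "") labelled = 1 }
    END {
      for (i = 1; i <= NR; i++) {
        $0 = rows[i]
        if (!labelled) $4 = "SPEAKER_00"   # rebuilds the row using OFS (tab)
        print
      }
      if (!labelled && NR > 0)
        print "limitation: no diarization available; speaker labels use the SPEAKER_00 fallback" > "/dev/stderr"
    }'
}
```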
Tooling Detection
Check for an available transcription path before generating text:
```bash
command -v whisper-cpp || command -v whisper || command -v vosk-transcriber || true
command -v ffmpeg || true
```
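The same probes can be wrapped so a caller can branch on the result. This sketch checks the three STT tool names in the order listed above and prints the first one found; `ffmpeg` gets its own check because it is not an STT tool, and what it is used for in a given pipeline is not specified here. Both function names are hypothetical.

```bash
# Hypothetical helper: print the first available local STT tool, checking
# whisper-cpp, whisper, then vosk-transcriber; exit status 1 means none found.
detect_stt_tool() {
  local tool
  for tool in whisper-cpp whisper vosk-transcriber; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1
}

# Separate capability probe for ffmpeg; it does not count as an STT tool.
has_ffmpeg() {
  command -v ffmpeg >/dev/null 2>&1
}
```

A caller might use `if tool=$(detect_stt_tool); then ...; else ...; fi`, taking the `else` branch to write the blocked plan instead of any transcript text.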
If no STT tool or approved transcript source is available, do not fabricate transcript text. Write or report an actionable plan with:
- `schema`: `aiwg.media.transcript-plan.v1`
- `status`: `blocked-tooling-missing`
- source path and source hash when the media file can be read
- next steps for installing local STT tooling or providing a human transcript
- quality limits stating that no transcript hash exists until segment text exists
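A sketch of emitting the blocked plan as JSON without any transcript text. Only `schema`, `status`, and the source path and hash come from the list above; the `next_steps` and `quality.limitations` field names are assumptions. The source hash is `null` when the media file cannot be read. The sketch assumes GNU coreutils `sha256sum` and escapes only backslashes and double quotes in the path, so paths containing control characters are out of scope.

```bash
# Hypothetical helper: print a blocked transcript plan as JSON on stdout.
# Field names other than schema, status, and source are assumptions.
transcript_plan_json() {
  local media="$1" path_json sha_json="null" hex
  # Minimal JSON string escaping: backslashes first, then double quotes.
  path_json=$(printf '%s' "$media" | sed 's/\\/\\\\/g; s/"/\\"/g')
  if [ -r "$media" ]; then
    hex=$(sha256sum < "$media" | cut -d' ' -f1)
    sha_json="\"sha256:$hex\""
  fi
  cat <<EOF
{
  "schema": "aiwg.media.transcript-plan.v1",
  "status": "blocked-tooling-missing",
  "source": {"path": "$path_json", "sha256": $sha_json},
  "next_steps": [
    "Install a local STT tool (whisper-cpp, whisper, or vosk-transcriber) and re-run detection.",
    "Or provide a human transcript for this media file."
  ],
  "quality": {
    "limitations": ["No transcript hash exists until segment text exists."]
  }
}
EOF
}
```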
Verification Limits
A generated transcript is evidence of tool output, not proof of exact speech content. Handoff notes MUST state:
- Machine transcripts can contain word errors, omissions, and hallucinated punctuation.
- Speaker labels are provisional unless diarization or human review supports them.
- Research induction should cite the transcript hash and source media hash together.
- Human verification is required before using quotations in high-stakes or published claims.
Research Handoff
Include the transcript sidecar path, source media hash, transcript hash, source URL, acquisition metadata, quality status, and known limitations.
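One way to assemble the handoff is a plain-text note that puts both hashes side by side and repeats the required verification limits. This is an illustrative sketch: the function name, argument order, and layout are hypothetical, and a single acquisition ID stands in for the fuller acquisition metadata.

```bash
# Hypothetical helper: print a plain-text research handoff note.
# Arguments: sidecar path, source media hash, transcript hash, source URL,
# acquisition ID, quality status. The layout is illustrative only.
handoff_note() {
  local sidecar="$1" source_hash="$2" transcript_hash="$3" url="$4" acq="$5" quality="$6"
  cat <<EOF
Transcript sidecar: $sidecar
Source media hash: $source_hash
Transcript hash: $transcript_hash
Source URL: $url
Acquisition ID: $acq
Quality status: $quality
Limitations:
- Machine transcripts can contain word errors, omissions, and hallucinated punctuation.
- Speaker labels are provisional unless diarization or human review supports them.
- Cite the transcript hash and source media hash together.
- Human verification is required before quoting in high-stakes or published claims.
EOF
}
```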
Fixture Example
See examples/sample.transcript.json for a minimal transcript sidecar with timestamps, speaker fallback, source URL, source hash, transcript hash, and provenance fields.
References
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/integrity-verification/SKILL.md — SHA-256 manifest and fixity conventions
- @$AIWG_ROOT/agentic/code/frameworks/media-curator/skills/provenance-tracking/SKILL.md — W3C PROV-O derivation model for media artifacts
- @$AIWG_ROOT/docs/integrations/media-curator-to-research-handoff.md — Research handoff expectations for media-derived artifacts