Captions and subtitles look similar on screen, which is exactly why they get confused in procurement documents, compliance plans, and production budgets. They serve different audiences, follow different conventions, and in some jurisdictions carry different legal obligations. Ordering subtitles when your obligation is captioning can leave deaf viewers without sound cues and leave your organization exposed; ordering captions when you needed translated subtitles leaves international audiences with text they cannot read.
This guide separates closed captions, subtitles, and SDH, explains where FCC and ADA requirements come into play for organizations distributing video in the United States, and sets out the timing and quality standards that distinguish professional output from automated text. We produce all three formats at Emayyam for broadcasters, e-learning providers, and publishers, and the differences below are the ones that actually change how a file is specified, produced, and checked.
Closed Captions: Access to the Full Soundtrack
Closed captions exist for viewers who cannot hear the audio, so they must represent the entire soundtrack, not just the dialogue. That means speaker identification when it is not visually obvious, non-speech information such as door slams or phone ringing, and music description including lyrics where rights allow. The word closed means the viewer can turn them on or off, as opposed to open captions, which are burned permanently into the picture.
Captions are written in the same language as the audio and assume the viewer is reading instead of listening. In broadcast workflows they travel in dedicated technical formats such as the CEA-608 and CEA-708 standards used in North American television, while web video typically uses sidecar files such as WebVTT or SRT. The same transcript can feed all of these containers, but positioning, line breaks, and sound descriptions still need human judgement to render faithfully in each one.
Subtitles: Language Access for Hearing Viewers
Subtitles assume the viewer can hear the soundtrack but cannot understand the language being spoken. They translate dialogue and essential on-screen text, and they deliberately omit sound effects and speaker labels because the hearing viewer already perceives those through the audio. Good subtitling is therefore a translation discipline as much as a timing discipline: lines are condensed to preserve reading speed, idioms are adapted, and the translator constantly trades completeness against the seconds available on screen.
Because subtitles compress speech, a literal transcript translated word for word almost always fails as a subtitle file. In our localization work, the subtitle script is usually shortened noticeably compared with full dialogue, and that editorial judgement, deciding what the viewer can afford to lose, is where experienced subtitlers earn their fee. Quality review should always check the subtitles against the video and the audio together, never against the script alone.
SDH: Subtitles for the Deaf and Hard of Hearing
SDH stands for subtitles for the deaf and hard of hearing, and it deliberately blends the two formats above. Like captions, SDH includes speaker identification and non-speech audio cues; like subtitles, it is delivered as a subtitle track and may be in the original language or translated. SDH emerged partly because some distribution formats and platforms handle subtitle tracks but not broadcast-style closed caption data, so the accessibility information rides inside the subtitle file instead.
For a global streaming release, a typical deliverable set includes an original-language SDH track for accessibility plus translated subtitle tracks for each market, and sometimes translated SDH as well. When clients ask us which one they need, the deciding questions are always the same: can your audience hear the audio, can they understand the language, and what does your distribution platform technically support? The answers map cleanly onto captions, subtitles, or SDH.
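The three deciding questions can be written down as a small decision function. This is a sketch under stated assumptions, not a compliance tool: the function name, parameter names, and returned labels are hypothetical, and a real specification would also account for contractual and regulatory context.

```python
def recommend_track(can_hear: bool,
                    understands_language: bool,
                    platform_supports_caption_data: bool) -> str:
    """Map the three audience/platform questions onto a deliverable.

    All names and return labels are illustrative assumptions.
    """
    if can_hear and understands_language:
        # No access barrier from hearing or language.
        return "no text track required for access"
    if can_hear:
        # Hears the soundtrack but not the language: translation only.
        return "translated subtitles"
    if not understands_language:
        # Cannot hear and cannot read the source language.
        return "translated SDH"
    # Cannot hear, reads the source language: full-soundtrack text.
    if platform_supports_caption_data:
        return "closed captions"
    return "original-language SDH"


print(recommend_track(can_hear=False, understands_language=True,
                      platform_supports_caption_data=False))
```

In practice a release serves several audiences at once, so the function would be called once per audience segment and the results combined into the deliverable set.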
Compliance: FCC, ADA, and Beyond
In the United States, the FCC requires captioning for most television programming and, under the CVAA, for online video that previously aired on television with captions. The FCC also articulates quality expectations built around four factors: accuracy, synchronicity, completeness, and placement. The ADA operates differently: it is a broad accessibility law under which organizations serving the public, including streaming services, universities, and businesses, have faced complaints and litigation over uncaptioned video. Section 508 adds obligations for US federal agencies and many of their suppliers, and WCAG calls for captions on prerecorded multimedia.
If you distribute video professionally, the safe operating assumption is that accessibility captioning is required somewhere in your audience or supply chain, whether by regulation, by contract, or by platform policy. Education and government work make this explicit in procurement language, and we increasingly see commercial buyers writing WCAG conformance into vendor agreements as standard practice. Treating captioning as a default production deliverable, rather than a request-driven extra, is almost always cheaper than retrofitting it under deadline.
- FCC: captioning for broadcast TV and previously aired online video
- CVAA: extends captioning duties to internet-delivered TV content
- ADA: general accessibility law frequently applied to video services
- Section 508: applies to US federal agencies and suppliers
- WCAG: captions for prerecorded audio-visual content on the web
Timing, Reading Speed, and Quality Standards
Professional caption and subtitle work is governed by reading speed and rhythm. Viewers need enough time to read each event, so practitioners control characters per second, keep each line to a comfortable length, cap events at two lines in most house styles, and enforce minimum and maximum durations so text neither flashes past nor lingers awkwardly. Events should respect shot changes where possible, line breaks should follow sense units rather than splitting phrases mid-thought, and timing should land with the audio rather than drifting ahead of or behind it.
These constraints are where automated output most visibly falls short. Speech recognition can produce a usable transcript, but it does not condense dialogue for reading speed, place sound cues, identify speakers reliably, or break lines for sense. Every file we deliver passes a human review against picture and sound, with checks for accuracy, timing, positioning against on-screen text, and consistency of style decisions across the whole programme.
- Control reading speed in characters per second for the target audience
- Maximum two lines per event in most broadcast styles
- Break lines at natural grammatical boundaries
- Respect shot changes and audio timing
- Keep style decisions consistent across the full programme
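Several of these rules are mechanical enough to check automatically before human review. The Python sketch below flags events that break reading-speed, line-count, line-length, or duration limits. The numeric thresholds are placeholder assumptions, not values from this guide or from any particular standard; substitute the figures from your own style guide. The `Limits` and `check_event` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Placeholder house-style limits (assumed values, adjust per style guide)."""
    max_cps: float = 17.0      # characters per second
    max_lines: int = 2         # lines per event
    max_line_chars: int = 42   # characters per line
    min_ms: int = 1000         # shortest allowed event
    max_ms: int = 7000         # longest allowed event


def check_event(text: str, start_ms: int, end_ms: int,
                limits: Limits = Limits()) -> list[str]:
    """Return a list of rule violations for one caption or subtitle event."""
    duration_ms = end_ms - start_ms
    if duration_ms <= 0:
        return ["end time is not after start time"]

    problems = []
    lines = text.split("\n")
    if len(lines) > limits.max_lines:
        problems.append(f"{len(lines)} lines exceeds {limits.max_lines}")
    for line in lines:
        if len(line) > limits.max_line_chars:
            problems.append(f"line of {len(line)} chars exceeds {limits.max_line_chars}")
    if duration_ms < limits.min_ms:
        problems.append(f"duration {duration_ms} ms is below {limits.min_ms} ms")
    if duration_ms > limits.max_ms:
        problems.append(f"duration {duration_ms} ms is above {limits.max_ms} ms")

    # Reading speed counts visible characters, not the line-break itself.
    chars = sum(len(line) for line in lines)
    cps = chars / (duration_ms / 1000)
    if cps > limits.max_cps:
        problems.append(f"reading speed {cps:.1f} cps exceeds {limits.max_cps}")
    return problems


for problem in check_event("This sentence is much too long for the time.", 0, 1000):
    print(problem)
```

A check like this only catches the countable rules. Sense-unit line breaks, shot-change alignment, and style consistency still need a reviewer watching picture and sound together.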
Building a Reliable Workflow
A dependable pipeline runs in stages: accurate transcription or script conformance first, then time-coding, then captioning or subtitle adaptation by trained editors, then a quality pass against the finished video, and finally technical validation of the delivery format for each platform. Style guides matter as much here as in any localization work: decisions about censored language, speaker labels, music notation, and numerals should be written down once and applied everywhere, not reinvented per file.
The practical takeaway: specify the audience before you specify the file. If viewers cannot hear, you need captions or SDH with sound cues and speaker identification; if they cannot understand the language, you need translated subtitles; if both, you need translated SDH. Confirm your compliance context, fix your reading-speed and style rules in writing, and insist on human review against picture. Doing that consistently costs little compared with redoing files after a platform rejection or an accessibility complaint.