Not all captions are the same — and the difference matters more than most people realize. Standard subtitles transcribe dialogue. SDH tells you a door slammed. CART provides human-generated real-time captions accurate enough for legal proceedings. Auto-captions are fast but imperfect. Knowing which type you're working with — and which is available on which platform — helps you set realistic expectations and find the right tool for each situation.

The Main Types of Captioning

PRE-RECORDED

Subtitles vs SDH (Subtitles for the Deaf and Hard of Hearing)

Standard subtitles transcribe spoken dialogue only — they assume the viewer can hear everything else. SDH goes further: it identifies speakers, describes non-speech audio (music, sound effects, ambient sounds), and conveys audio information that would otherwise be inaccessible. Examples of SDH descriptions include [wind howling], [phone ringing in distance], [crowd cheering], [tense music], [door slams].
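In practice, the bracketed descriptions are what set an SDH track apart from a dialogue-only track. The sketch below is a rough heuristic, not a standard: it assumes sound descriptions are written in square brackets as in the examples above, and the function names and sample lines are made up for illustration.

```python
import re

# Matches a bracketed non-speech description such as [door slams].
SOUND_CUE = re.compile(r"\[[^\[\]]+\]")

def sound_cues(caption_lines):
    """Return every bracketed sound description found in the caption text."""
    cues = []
    for line in caption_lines:
        cues.extend(SOUND_CUE.findall(line))
    return cues

def looks_like_sdh(caption_lines):
    """Heuristic (assumption): at least one bracketed cue suggests an SDH track."""
    return len(sound_cues(caption_lines)) > 0

# Hypothetical sample tracks for the same scene.
subtitles = ["I thought you left.", "I came back."]
sdh = ["[door slams]", "MARIA: I thought you left.", "[tense music]", "I came back."]

print(looks_like_sdh(subtitles))  # False
print(sound_cues(sdh))            # ['[door slams]', '[tense music]']
```

Some tracks mark sounds with parentheses or music symbols instead, so a real check would need to handle more than one convention.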

For someone with significant hearing loss, the difference between subtitles and SDH is the difference between following the plot and understanding the mood, tension, and context of a scene. A character's reaction to a sound you didn't know happened only makes sense if the caption told you the sound occurred.

SDH is the standard for physical media (Blu-ray, DVD) in the US and is increasingly available on streaming platforms. Look for "SDH" or the CC symbol specifically — not just "subtitles" — when selecting caption tracks.

Best for: Pre-recorded content (streaming, physical media, downloaded video)
Limitation: Only available on content that has been professionally captioned; live content uses different systems

LIVE / AUTO-GENERATED

Automatic Speech Recognition (ASR) Captions

ASR captions are generated in real time by speech recognition software — the same technology behind Siri, Google Assistant, and voice-to-text. They appear with a short delay (typically 1-3 seconds) and accuracy varies significantly based on speaker clarity, accent, background noise, and technical vocabulary.

Major platforms have built ASR captioning into their core products. Microsoft Teams, Zoom, Google Meet, and Apple's FaceTime all offer live auto-captions. Accuracy on clear speech in quiet environments is generally good. Accuracy drops with multiple simultaneous speakers, strong accents, technical terms, or background noise — exactly the conditions common in real meetings.

ASR captions generally do not describe non-speech audio; they capture words. Tags such as [Laughter] or [applause] appear in some implementations, but environmental sounds are generally not described.

Best for: Live meetings, calls, and real-time conversations where some errors are acceptable
Limitation: Accuracy varies; no sound description; struggles with accents and technical terms

PROFESSIONAL LIVE

CART (Communication Access Realtime Translation)

CART is human-generated real-time captioning provided by a trained stenographer. A CART provider listens to speech and transcribes it on a stenotype machine at speeds that keep pace with natural conversation, typically with 95%+ accuracy even with technical content, accents, and multiple speakers.
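Accuracy percentages like the 95% figure above are commonly derived from word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the caption into what was actually said, divided by the number of words spoken. The source does not say how its figure was measured, so treat this as one plausible reading. A minimal sketch, with made-up sample sentences:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edits needed to match the reference words seen so far
    # against the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical example: one wrong word out of ten.
spoken = "please send the quarterly report to the finance team today"
captioned = "please send the quarterly rapport to the finance team today"
wer = word_error_rate(spoken, captioned)
print(f"WER {wer:.0%}, accuracy {1 - wer:.0%}")  # WER 10%, accuracy 90%
```

By this measure, 95% accuracy still means roughly one wrong word in every twenty, which is why the remaining errors matter most when they land on names, numbers, or technical terms.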

CART is the gold standard for accessibility in educational, legal, medical, and professional settings. It's the system used in courtrooms, at conferences, and in university lecture halls. It can describe non-speech audio when the provider chooses to do so.

The practical limitation is cost and availability — CART requires a trained human provider and is priced accordingly. Remote CART (where the provider works off-site via audio feed) has made it more accessible, but it remains a premium service relative to ASR alternatives.

Best for: High-stakes situations (legal, medical, academic, professional conferences)
Limitation: Cost; requires advance scheduling; not practical for casual daily use

BROADCAST

Closed Captions (CC) — Television Standard

Closed captions on broadcast television in the US are federally mandated under the FCC's closed captioning rules. With limited exemptions, broadcast and cable programming must be captioned. The CC standard includes speaker identification and some non-speech audio description, though quality varies significantly between providers.

Live broadcast captions (news, sports, live events) are typically generated by stenographers or by voice writers, who re-speak the content into speech recognition software trained on their voice. Pre-recorded broadcast content is captioned in post-production and is generally more accurate.

Best for: Television (live and recorded broadcast content)
Limitation: Live broadcast quality varies; streaming services operate under different rules than broadcast

Live Captioning Apps & Tools by Platform

The practical question for most hearing-impaired users isn't which captioning type is best in theory — it's what's available on the device in front of them. Here's what exists across major consumer platforms as of 2026.

Platform | Built-In Tool | Where to Find It | Notes
Windows 11 | Live Captions | Settings → Accessibility → Captions | Captions any audio on the device. Works offline. Reasonable accuracy on clear speech.
macOS (Ventura+) | Live Captions | System Settings → Accessibility → Live Captions | Similar to Windows implementation. Captions FaceTime calls and device audio.
iPhone / iPad (iOS 16+) | Live Captions | Settings → Accessibility → Live Captions | Captions phone calls, FaceTime, and media. Also available in Control Center.
Android | Live Transcribe | Accessibility settings or Google Play | Transcribes speech around you in real time. Requires internet connection.
Android | Sound Amplifier | Accessibility settings | Not captioning; amplifies and filters audio. Useful complement to captions.
Microsoft Teams | Live Captions + Transcript | Meeting controls → More → Turn on live captions | Captions during meeting; full transcript available after. Speaker identification included.
Zoom | Live Transcription | Meeting controls → CC → Enable Auto-Transcription | Must be enabled by host. Third-party CART integration also supported.
Google Meet | Captions | Bottom bar → Turn on captions (CC icon) | Available to all participants. English-primary; other languages expanding.

Third-Party Apps Worth Knowing

Otter.ai

AI-powered transcription that works in real time and produces a searchable, shareable transcript. Useful for meetings, interviews, and lectures. Free tier has monthly minute limits; paid tiers raise them. Can handle technical vocabulary better than most built-in tools if you add your domain terminology to its custom vocabulary.

Google Live Transcribe (Android)

Standalone app separate from the built-in Android feature. Designed specifically for face-to-face conversations: hold the phone between you and the other person and it transcribes speech in real time. Useful in restaurants, at appointments, and anywhere you'd normally struggle to hear someone across a table.

Apple Live Listen

Not captioning, but functionally related: Live Listen uses your iPhone as a remote microphone and streams audio from the phone's mic to your AirPods. Place the phone near a speaker across the room and hear them through your AirPods. Practical for lectures, presentations, and noisy environments. With AirPods, turn it on from the Hearing control in Control Center; with Made for iPhone hearing devices, it is under Settings → Accessibility → Hearing Devices.

Sorenson Communications / ZVRS

Video Relay Services (VRS) for ASL users — connects a hearing-impaired caller with an ASL interpreter who voices the call to the hearing party. Free to qualified users under FCC regulations. Different from captioning but worth knowing for those who use ASL.

A practical note on accuracy: no auto-caption system performs well on proper nouns, technical terms, or names it hasn't seen before. If you're in a specialized field — medical, legal, technical — consider supplementing auto-captions with a custom vocabulary list where the platform allows it, or requesting CART for high-stakes situations where accuracy matters.
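To see why a vocabulary list helps, here is a small sketch that corrects near-miss spellings in a caption after the fact, using Python's standard difflib module. This only illustrates the idea. Platforms that accept custom vocabulary typically use it to guide recognition itself, and nothing here reflects how any named product works. The function name, the sample terms, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib

def apply_vocabulary(caption: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a vocabulary term with that term."""
    corrected = []
    for word in caption.split():
        # get_close_matches returns the best match at or above the cutoff, if any.
        matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)

# Hypothetical domain terms and a hypothetical mis-captioned sentence.
vocabulary = ["warfarin", "metoprolol"]
caption = "patient takes warfrin and metoprolal daily"
fixed = apply_vocabulary(caption, vocabulary)
print(fixed)  # patient takes warfarin and metoprolol daily
```

The sketch ignores punctuation and cannot repair a term that was split into several ordinary words, which is one reason human captioning is still the safer choice when accuracy is critical.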

Choosing the Right Tool for Each Situation