Not all captions are the same — and the difference matters more than most people realize. Standard subtitles transcribe dialogue. SDH tells you a door slammed. CART provides human-generated real-time captions accurate enough for legal proceedings. Auto-captions are fast but imperfect. Knowing which type you're working with — and which is available on which platform — helps you set realistic expectations and find the right tool for each situation.
The Main Types of Captioning
Subtitles vs SDH (Subtitles for the Deaf and Hard of Hearing)
Standard subtitles transcribe spoken dialogue only — they assume the viewer can hear everything else. SDH goes further: it identifies speakers, describes non-speech audio (music, sound effects, ambient sounds), and conveys audio information that would otherwise be inaccessible. Examples of SDH descriptions include [wind howling], [phone ringing in distance], [crowd cheering], [tense music], [door slams].
For someone with significant hearing loss, the difference between subtitles and SDH is the difference between following the plot and understanding the mood, tension, and context of a scene. A character's reaction to a sound you didn't know happened only makes sense if the caption told you the sound occurred.
SDH is widely used on physical media (Blu-ray, DVD) in the US and is increasingly available on streaming platforms. When selecting a caption track, look specifically for "SDH" or the CC symbol, not just "subtitles".
Automatic Speech Recognition (ASR) Captions
ASR captions are generated in real time by speech recognition software — the same technology behind Siri, Google Assistant, and voice-to-text. They appear with a short delay (typically 1-3 seconds) and accuracy varies significantly based on speaker clarity, accent, background noise, and technical vocabulary.
Major platforms have built ASR captioning into their core products. Microsoft Teams, Zoom, Google Meet, and Apple's FaceTime all offer live auto-captions. Accuracy on clear speech in quiet environments is generally good. Accuracy drops with multiple simultaneous speakers, strong accents, technical terms, or background noise — exactly the conditions common in real meetings.
ASR captions generally capture words only. Some implementations occasionally insert tags such as [Laughter] or [applause], but environmental sounds are generally not described.
CART (Communication Access Realtime Translation)
CART is human-generated real-time captioning provided by a trained captioner, usually a stenographer working on a stenotype machine. The provider listens to speech and transcribes it fast enough to keep pace with natural conversation, typically at 95% accuracy or better, even with technical content, accents, and multiple speakers.
CART is the gold standard for accessibility in educational, legal, medical, and professional settings. It's the system used in courtrooms, at conferences, and in university lecture halls. It can describe non-speech audio when the provider chooses to do so.
The practical limitation is cost and availability — CART requires a trained human provider and is priced accordingly. Remote CART (where the provider works off-site via audio feed) has made it more accessible, but it remains a premium service relative to ASR alternatives.
Closed Captions (CC) — Television Standard
Closed captions on US broadcast television are federally mandated under the FCC's closed captioning rules, which require most broadcast and cable programming to be captioned, with limited exemptions. The CC standard includes speaker identification and some non-speech audio description, though quality varies significantly between providers.
Live broadcast captions (news, sports, live events) are typically produced by stenographers or by voice writers, who re-speak the audio into speech recognition software trained on their own voice. Pre-recorded broadcast content is captioned in post-production and is generally more accurate.
Live Captioning Apps & Tools by Platform
The practical question for most hearing-impaired users isn't which captioning type is best in theory — it's what's available on the device in front of them. Here's what exists across major consumer platforms as of 2026.
| Platform | Built-In Tool | Where to Find It | Notes |
|---|---|---|---|
| Windows 11 | Live Captions | Settings → Accessibility → Captions | Captions any audio on the device. Works offline. Reasonable accuracy on clear speech. |
| macOS (Ventura+) | Live Captions | System Settings → Accessibility → Live Captions | Similar to the Windows implementation. Captions FaceTime calls and device audio. Requires a Mac with Apple silicon. |
| iPhone / iPad (iOS 16+) | Live Captions | Settings → Accessibility → Live Captions | Captions phone calls, FaceTime, and media. Also available in Control Center. |
| Android | Live Transcribe | Accessibility settings or Google Play | Transcribes speech around you in real time. Uses an internet connection by default; offline transcription is available on some devices. |
| Android | Sound Amplifier | Accessibility settings | Not captioning — amplifies and filters audio. Useful complement to captions. |
| Microsoft Teams | Live Captions + Transcript | Meeting controls → More → Turn on live captions | Captions during meeting; full transcript available after. Speaker identification included. |
| Zoom | Live Transcription | Meeting controls → CC → Enable Auto-Transcription | Must be enabled by host. Third-party CART integration also supported. |
| Google Meet | Captions | Bottom bar → Turn on captions (CC icon) | Available to all participants. English-primary; other languages expanding. |
Third-Party Apps Worth Knowing
Otter.ai
AI-powered transcription that works in real time and produces a searchable, shareable transcript. Useful for meetings, interviews, and lectures. The free tier has monthly minute limits; paid tiers raise them. Adding your field's terminology to its custom vocabulary can make it handle technical terms better than many built-in tools.
Google Live Transcribe (Android)
A standalone app, separate from Android's built-in Live Caption feature (which captions audio playing on the phone itself). It is designed for face-to-face conversations: you hold or set the phone between you and the other person, and it transcribes their speech in real time. Useful in restaurants, at appointments, and anywhere you'd normally struggle to hear someone across a table.
Apple Live Listen
Not captioning, but functionally related: Live Listen turns your iPhone into a remote microphone and streams what its mic picks up to your AirPods. Place the phone near a speaker across the room and hear them through your AirPods. Practical for lectures, presentations, and noisy environments. Turn it on from the Hearing control in Control Center (add Hearing under Settings → Control Center if it isn't already there).
Sorenson Communications / ZVRS
Video Relay Services (VRS) for ASL users: the caller signs to an ASL interpreter over video, and the interpreter voices the conversation to the hearing party and signs their replies back. Free to qualified users under FCC regulations. Different from captioning, but worth knowing about for those who use ASL.
A practical note on accuracy: auto-caption systems routinely struggle with names, other proper nouns, and technical terms they haven't encountered before. If you work in a specialized field such as medicine, law, or engineering, consider supplementing auto-captions with a custom vocabulary list where the platform allows it, or request CART for high-stakes situations where accuracy matters.
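To make accuracy figures concrete: one common way to score captions is word error rate (WER), the minimum number of substituted, missing, and extra words needed to turn the caption into what was actually said, divided by the number of words actually said. Accuracy is often reported as 1 − WER, so 95% accuracy means roughly one word in twenty is wrong. The Python sketch below computes WER with a word-level edit-distance table. The `word_error_rate` function and the sample sentences are hypothetical illustrations, not taken from any captioning product, and real evaluations normalize punctuation and casing more carefully than this simple lowercase-and-split approach.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration only: words are compared after
    lowercasing and splitting on whitespace, with no punctuation handling.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # caption dropped all i spoken words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # caption added j words nobody said

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion: spoken word missing from caption
                dist[i][j - 1] + 1,         # insertion: extra word in caption
                dist[i - 1][j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    spoken = "start the patient on metformin twice daily"
    captioned = "start the patient on met forming twice daily"
    wer = word_error_rate(spoken, captioned)
    print(f"WER: {wer:.2f}, accuracy: {1 - wer:.0%}")
```

Running it prints `WER: 0.29, accuracy: 71%` for that one sentence: a single misheard drug name, split into two wrong words, costs two errors out of seven spoken words. Because extra words count as errors, WER can exceed 1.0 when a caption contains many insertions.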
Choosing the Right Tool for Each Situation
- Watching TV or streaming content — look for SDH specifically, not just subtitles. CC on broadcast is federally mandated.
- Virtual meetings — Teams and Zoom have the most capable built-in tools. Enable transcription in addition to live captions where available — the transcript is searchable after the fact.
- Face-to-face conversations — Google Live Transcribe (Android) or Live Captions (iOS) with the phone placed between speakers.
- Lectures or presentations — request CART through your institution or employer's accessibility office if ASR accuracy isn't sufficient for the content.
- Phone calls — captioned telephone services (CapTel, CaptionCall, InnoCaption) display real-time captions of the other party's speech. Free to qualifying individuals under FCC relay service rules.
- High-stakes situations — legal, medical, academic — request CART. ASR accuracy is not reliable enough for content where every word matters.