Captions That Convert: Designing Subtitles People Actually Read

https://images.unsplash.com/photo-1535016120720-40c646be5580?w=800&q=80

4 min read

Man talking to a smartphone camera on a tripod.
In 2016, publishers reported that as much as 85 percent of video on Facebook was being watched without sound. Eight years later, that share has grown rather than declined. On Instagram, TikTok, and LinkedIn, silent viewing is now the default behaviour. Captions are no longer only an accessibility feature. They are a primary communication channel, and most brands are still designing them as an afterthought.
Bad captions are small, grey, and positioned in the lower third of the frame, and they display complete sentences that force the viewer to stop scrolling in order to read. They are designed for compliance, not for engagement. Good captions solve a completely different problem: how do you make information readable at scroll speed, on a moving screen, held at arm's length, by a viewer who has not yet committed to watching?
Three to five words per line is the practical maximum for a caption that can be read in a single glance. Centre-of-frame placement keeps the text visible regardless of which interface elements a platform overlays at the bottom of the screen. Font sizes of 55 to 65 points or larger on vertical-format video keep the text readable on a phone at normal viewing distance.
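To make the words-per-line rule concrete, here is a minimal Python sketch that splits a sentence into caption lines of at most five words. The function name and the choice to balance line lengths (so that no line ends up as a single orphaned word) are illustrative assumptions, not a method prescribed by any platform or captioning tool.

```python
import math


def split_into_caption_lines(text: str, max_words: int = 5) -> list[str]:
    """Split text into lines of at most max_words words each.

    Words are spread as evenly as possible across the minimum number of
    lines, which avoids a trailing line that holds a single word.
    (Hypothetical helper written for illustration.)
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    words = text.split()
    if not words:
        return []

    # Fewest lines that can hold every word without exceeding max_words.
    line_count = math.ceil(len(words) / max_words)
    # The first `extra` lines carry one word more than the rest.
    base, extra = divmod(len(words), line_count)

    lines = []
    start = 0
    for index in range(line_count):
        size = base + (1 if index < extra else 0)
        lines.append(" ".join(words[start:start + size]))
        start += size
    return lines


if __name__ == "__main__":
    for line in split_into_caption_lines(
        "Captions are a primary communication channel for silent viewers"
    ):
        print(line)
```

A simple greedy split would also satisfy the five-word limit. Balancing is only a readability preference, and it can be swapped out without changing the rule itself.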
Good captions are not a transcription of speech — they are an edited, improved version of speech optimised for reading. Verbal fillers vanish. Long sentences break at natural pause points rather than at arbitrary character limits. The caption should be the most concise, readable version of what was said, not a word-for-word record of how it was said.
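The editing pass described here can be roughed out in a few lines of Python. This is a sketch under stated assumptions: the filler list is a small hypothetical sample, punctuation stands in for "natural pause points", and capitalisation is not repaired after a filler is removed. A human editor would still make the final call on what to cut.

```python
import re

# Hypothetical sample of verbal fillers; extend or trim for your own speakers.
FILLERS = ("um", "uh", "er", "you know", "i mean")

# Matches a filler as a whole word, plus the commas that usually set it off.
_FILLER_RE = re.compile(
    r"(?:,\s*)?\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?",
    flags=re.IGNORECASE,
)


def clean_transcript(text: str) -> str:
    """Remove verbal fillers and collapse the whitespace left behind."""
    without_fillers = _FILLER_RE.sub("", text)
    return re.sub(r"\s+", " ", without_fillers).strip()


def split_at_pauses(text: str) -> list[str]:
    """Break text after punctuation, used here as a proxy for spoken pauses."""
    parts = re.split(r"(?<=[.,;:!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    raw = "Um, captions are, you know, a primary channel."
    print(clean_transcript(raw))
    print(split_at_pauses("Good captions are edited, not transcribed. Keep them short."))
```

Breaking on punctuation rather than on a character count is the point of the exercise: each caption then ends where the speaker would naturally pause.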
Animated captions, where individual words appear in sequence as they are spoken, consistently outperform static captions for watch time across platform studies. The animation creates visual rhythm that pulls the eye and signals ongoing activity. The key is restraint — a simple word-by-word fade-in outperforms elaborate animations because complexity draws the viewer's attention to the captions themselves rather than the content they are delivering.
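A word-by-word fade-in is simple to express in code. The Python sketch below assumes you already have a start time for each word (for example from a speech-to-text tool) and computes how visible each word should be at a given moment. The `TimedWord` type, the function names, and the 0.15 second default fade are all hypothetical choices for illustration, not values taken from any platform study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float  # seconds into the video when the word is spoken


def word_opacity(now: float, start: float, fade_seconds: float = 0.15) -> float:
    """Return opacity in [0.0, 1.0] for a word that begins fading in at start."""
    if fade_seconds <= 0:
        # No fade requested: the word simply appears at its start time.
        return 1.0 if now >= start else 0.0
    progress = (now - start) / fade_seconds
    return max(0.0, min(1.0, progress))


def caption_state(
    words: list[TimedWord], now: float, fade_seconds: float = 0.15
) -> list[tuple[str, float]]:
    """Pair each word with its opacity at time now."""
    return [(w.text, word_opacity(now, w.start, fade_seconds)) for w in words]


if __name__ == "__main__":
    words = [TimedWord("Design", 0.0), TimedWord("captions", 0.25), TimedWord("first", 1.0)]
    print(caption_state(words, now=0.5, fade_seconds=0.5))
```

Keeping the animation to a single linear opacity ramp matches the restraint argued for above: the only thing that changes on screen is which words are visible.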
Different platforms call for different caption approaches because their viewing contexts differ. TikTok and Reels reward bold, large, centred, animated captions. LinkedIn viewers prefer cleaner typography at a more restrained scale. Viewers on X (formerly Twitter) watch almost exclusively on mute, so every caption word must carry its full weight, with no filler and no padding.

splicify

Address

234 Market Street



San Francisco, CA 94103 United States
