Interpreting WCAG 2.0
Does a 'talking head' video need captions?


When videos of ‘talking heads’ (like news reporters or politicians) are posted on websites conforming to WCAG 2.0, do they always require captions?

According to WCAG 2.0 success criterion 1.2.2 Captions (Prerecorded), all prerecorded videos with sound on the web should have captions in the full sense, i.e., captions that "not only include dialogue, but identify who is speaking and include non-speech information conveyed through sound, including meaningful sound effects" (Understanding SC 1.2.2). Even live video streams should have captions (SC 1.2.4 Captions (Live)), a requirement not easily met, as the WCAG acknowledge.

A loophole for information with alternate presentation?

Looking more closely at "Understanding SC 1.2.2", it appears that the WCAG offer an interesting loophole:

Captions are not needed when the synchronized media is, itself, an alternate presentation of information that is also presented via text on the Web page. For example, if information on a page is accompanied by a synchronized media presentation that presents no more information than is already presented in text, but is easier for people with cognitive, language, or learning disabilities to understand, then it would not need to be captioned since the information is already presented on the page in text or in text alternatives (e.g., for images).

WCAG 2.0, Understanding SC 1.2.2

Script and transcript

The examples given indicate that the loophole applies to video that was created as a secondary resource, for example, to communicate instructional text to people with reading difficulties. Does that mean that for the video to qualify as an alternate presentation that can be exempted from captioning, the primary resource has to be a script? If so, would the loophole be applicable to a broadcaster reading the news off a script or prompter, or even to a speech that has been scripted first and is then recorded as the politician delivers it?

Often, it is not straightforward to tell what came first. What if a politician changes his or her speech slightly and the script is modified, becoming partly a transcript? A transcript, as the name suggests, would always be a secondary presentation.

For people who are deaf or hard of hearing, this kind of hairsplitting is largely irrelevant. For them, the real question is whether the alternate presentation contains as much information as the video, that is, whether the claim of equivalence is justified.

What makes an alternative presentation equivalent?

In many cases where the video image presents action and several speakers, as in movies and television shows, it is clear that a script or transcript can never be a full equivalent. Here, captioning and an audio description of relevant events in the image are clearly required. The question remains whether equivalence can be claimed in the case of 'talking heads', where a speaker is presented in a fixed frame with an unchanging background. Is it true that such a video "presents no more information than is already presented in text"?

The amount of information presented in the image is always a matter of degree and not just down to the technicalities of fixed frame or unchanging background. Watching a news broadcaster routinely reading the news is different from, say, watching a politician confronted with claims that he did not tell the truth. Here, nonverbal signals such as sideward glances or fidgeting will become important qualifiers of the 'message'.

The advantages of captioned video—and of alternate presentations

Even in cases of a precise equivalence between a spoken message and script, a captioned video carries clear advantages over alternate presentations: Gestures and facial expressions are synchronized with the text and usually carry meaning. Therefore, exemptions from captioning should be reserved for cases where the video is intended to compensate for reading difficulties, as in the examples given in 'Understanding SC 1.2.2'. This is not the case in most of the common 'talking head' scenarios.

As an aside, we should acknowledge that having an alternate textual presentation also has its advantages: We can read it at our own pace, copy it, or print it. In the best of worlds, we would have both captions and a script or transcript.

Compliance in WCAG and the rating scheme in BITV-Test

The success criteria of WCAG 2.0 presumably know of only two states: pass or fail. In many cases, however, things are not black and white. Just take SC 1.2.2:

  • Case 1: A video without captions and text alternative: a clear "fail"
  • Case 2: A video without captions and a somewhat incomplete alternate description: a "fail", too, but it is clearly better than case 1
  • Case 3: A video with captions that leave out some information: still a "fail"? Or a "pass" you are not quite happy with?
  • Case 4: A talking head video without captions, but marked as alternative for an equivalent text description: a "pass", using the loophole
  • Case 5: A talking head video with captions: a "pass", without loophole
  • Case 6: A talking head video with captions and a text description: also a "pass" but clearly better than cases 4 and 5

BITV-Test, a German web-based accessibility evaluation tool which in its latest revision is a WCAG 2.0 test in all but name, takes a different approach. Its checkpoints have a graded rating scheme that offers five grades: from a clear "pass" (100%) via intermediate steps to a clear "fail" (0%).

In the BITV-Test scheme, a “talking head” video which is not captioned but has a good alternate textual presentation would at best get grade 4 (or 75% pass); any deficiencies in captioning or lack of equivalence of alternate descriptions can therefore be reflected in the given rating.
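The difference between a two-state verdict and a graded rating can be sketched in a few lines of Python. This is only an illustration, not BITV-Test's actual algorithm: the function names, the numbering of grades from 1 to 5, the 50% and 25% steps, and the grades assigned to incomplete captions or incomplete text are all assumptions. The text above only confirms 100%, 0%, and grade 4 (75%) as the ceiling for an uncaptioned video with a good alternate text.

```python
# Hypothetical sketch. Only 100%, 75% (grade 4) and 0% come from the article;
# the grade numbers 1-5 and the 50%/25% steps are assumed to be evenly spaced.
GRADE_TO_PERCENT = {5: 100, 4: 75, 3: 50, 2: 25, 1: 0}


def binary_verdict(has_captions: bool, equivalent_text: bool) -> str:
    """Two-state result: captions pass, and so does the 'loophole'."""
    return "pass" if has_captions or equivalent_text else "fail"


def graded_rating(has_captions: bool, captions_complete: bool,
                  text_alternative: str) -> int:
    """Return an assumed grade from 1 (clear fail) to 5 (clear pass).

    text_alternative is one of "none", "incomplete", "equivalent".
    """
    if text_alternative not in ("none", "incomplete", "equivalent"):
        raise ValueError(f"unknown text_alternative: {text_alternative!r}")

    # Grade earned by the captions alone (the value 3 is an assumption).
    if has_captions and captions_complete:
        caption_grade = 5
    elif has_captions:
        caption_grade = 3
    else:
        caption_grade = 1

    # Grade earned by the text alone: capped at 4, as stated in the article.
    text_grade = {"equivalent": 4, "incomplete": 2, "none": 1}[text_alternative]

    # The better of the two routes determines the rating.
    return max(caption_grade, text_grade)


# Case 4 from the list: talking head, no captions, equivalent text.
grade = graded_rating(False, False, "equivalent")
print(binary_verdict(False, True), grade, GRADE_TO_PERCENT[grade])
```

Under these assumptions, case 4 is a plain "pass" in the two-state view but only grade 4 in the graded view, which is the kind of distinction a two-state scheme cannot express.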


To recap the headline question and answer it: Yes, even a talking head video should have captions. If a video just offers an alternate presentation of a document to cater for people with reading difficulties, success criterion 1.2.2 would also be met. Whether we should still assign a "pass" to a 'neutral' news broadcaster video with equivalent script is already debatable. It depends on the chosen evaluation methodology whether the many shades of grey that exist in meeting WCAG success criteria such as 1.2.2 can be properly reflected.