What It’s Like to Have Auditory Processing Disorder, As Illustrated By Auto-Generated YouTube Captions

Like a lot of people with ADHD, I also have central auditory processing disorder.

CAPD manifests as a problem understanding speech and other sounds. It isn’t a hearing problem per se: The structures of the ear work just fine to capture sound waves and transmit them as electrical impulses to the brain. The brain, however, struggles to interpret these electrical impulses effectively.

People with CAPD frequently have trouble understanding what’s being said to them, especially if the sound of the speaker is in any way distorted (phone lines, VoIP), interrupted (conversations in noisy restaurants), or intruding upon a preexisting focus (someone trying to talk to you while you’re concentrating).

Most folks with CAPD identify heavily with this exchange:

Them: Can you hand me the remote?
You: What?
Them: Can you hand m-
You: Oh, sure. *passes remote*

It’s not that we didn’t hear the first “can you hand me the remote?”, per se. It’s that our brains lag in translating it into a comprehensible statement. We know we were asked something (hence “What?”), but it takes extra time for us to realize what the something was.

And we very often get it wrong. “Can you hand me the remote?” could just as easily be interpreted by our brains as “Canoe slappy boat,” or as sounds that don’t register as language at all. (“Canoe slappy boat” is very likely because our brains will try to make sense of the sound input we just got, and “canoe” and “boat” are related words.)


When Captions Fail

Like a lot of people with CAPD, I watch television with the captions on. Captions help my brain keep up with what’s being said by giving me an insta-check on what I thought I heard.

Usually.

Auto-generated captions, created by algorithm, are increasingly popular – particularly on sites like YouTube, where captioning everything uploaded in one minute would take over 300 years if done by humans.

The accuracy of YouTube’s auto-generated caption algorithm appears to depend on many of the same factors that affect the accuracy of comprehension in CAPD. For instance, auto-generated captions over a single speaker enunciating clearly into a microphone in an otherwise silent space are generally accurate. Auto-generated captions over a musical track or with significant background noise are often not.

Sometimes, however, the speech seems clear but the auto-generated captions really fail. And in one particular instance, the failures looked almost exactly like what my brain “hears” through the filter of CAPD.

Strength of the Algorithm: Auto-Captioning Fails in BraveStarr

BraveStarr ran for one season, in 1987-88. The show was Filmation’s last animated series (He-Man and She-Ra having both been pulled the year before). It has a lot of the hallmarks of a Filmation piece – especially the presence of nearly-incomprehensible characters and the use of the same five voice actors for nearly every character.

Here’s what it’s like to hear through CAPD. (The first two examples are from the episode “Unsung Hero.” The rest are from “An Older Hand.”)

Unsung Hero


YouTube/My Brain: “…interested in in mining carrion no one wants to be a pot farmer imprudent….”

The Actual Line: “…interested in mining Keriam. No one wants to be a pod farmer, including [my son].”

This was the first screenshot I grabbed. At the time, I was merely amused at the “pot farmer” part of the joke.

Then things got even more inappropriate:


YouTube/My Brain: “you all right I think so oh but your whoreson stolen”

The Actual Line: “You all right? I think so. Oh, but your horse was stolen!”

CAPD makes my brain translate ordinary sentences into potentially offensive ones all the time. I don’t even comment on it anymore.

(Given that the brain tries to fit sound into a pattern with which it’s already familiar, this might say more about the frequency with which I hear and use profanity than it does about CAPD.)

Typically, YouTube’s auto-caption generator doesn’t trip much on human characters’ lines. The human characters’ voice actors tend to deliver these lines straight; they save accents, funny voices, etc. for non-human characters. The algorithm’s ability to handle non-human characters’ lines ranges from “bad” to “nonexistent.”

An Older Hand

All of these examples contain lines delivered by various Prairie People.

Normally, YouTube’s auto-generated captions don’t interpret the Prairie People’s voices as speech at all. Captions simply aren’t generated when Prairie People are speaking.

In this episode, however, the algorithm recognized when Prairie People characters were speaking most of the time. But it struggled with what they were saying – in a way very similar to my own brain’s struggles.


YouTube/My Brain: “bigger you might be you maybe not Maggie what’s your work being a good screen”

The Actual Line: “Whoa. Maybe not magic, but still work pretty good.” [scene change] “BraveStarr….”

Like a lot of things I hear with CAPD, this caption makes no literal sense. Those are words, but they cannot possibly be the words the speaker actually said – can they?

This caption also carries over lines from a previous character/scene into the new one, where it mashes them together with the start of a line delivered by a character in this scene. It’s not unlike having to listen to someone speak in a noisy restaurant or bar: My brain doesn’t always distinguish between “what this person is saying” and “what someone else in the room said.”


YouTube/My Brain: “real that really be young bad guy a riot you’ll never did stop believin”

The Actual Line: “Well, that’s where it belong, by golly wollies.” “You never did stop believing….”

I didn’t expect YouTube to get “by golly wollies” on the first try (or ever). But idiolectal details like “by golly wollies” can make the comprehension process even harder. Until I’m aware that the person will frequently do things like interject “by golly wollies” or pronounce “washing” with an “r,” my brain won’t account for them in processing – so I’ll struggle even more to understand the speaker.


YouTube/My Brain: “you know you lose your hoop boys always believing you but it more potent”

The Actual Line: “No sir, Fuzz always believe in you. But it more important….”

This line almost made me wish YouTube had not started picking up on the Prairie People’s speech as speech. I’ve watched enough Fuzz episodes to understand him (on a 2-3 second delay), but the captions here actually made matters a lot worse.

This is also a good example of how non-spoken sounds will get interpreted by a CAPD brain as speech. “No sir, Fuzz” became “you know you lose your hoop boys” due to background non-speech noises in the actual scene.

It’s not that people with CAPD aren’t listening to you. It’s that what you said + all the sounds around it = “you know you lose your hoop boys.” You’d say “What?!” too.


YouTube/My Brain: “karyam I’m as powerful as she wore brave stars under stick”

The Actual Line: “…Keriam. I’m as powerful as you are, BraveStarr! ThunderStick….”

YouTube’s lack of punctuation in auto-generated captions illustrates another common pitfall for those of us with CAPD: We don’t always “hear” where punctuation fits into spoken language.

For instance, this joke is typically presented in written form:

Let’s eat grandma!

Let’s eat, grandma!

Commas save lives.

When spoken, there’s typically a change in pace and pitch that indicates the relationship between “eat” and “grandma” that the comma encodes in writing. Here’s a bad attempt to draw it:

[drawing: the pitch and pace contour of “let’s eat, grandma”]

People without CAPD can hear the change in pace and pitch that indicates whether “let’s eat” is a comment made to grandma (let’s eat, grandma!) or if grandma is the object to be eaten (let’s eat grandma).

With CAPD, the brain doesn’t always process pace and pitch, either. So even if we understand the words “let’s,” “eat,” and “grandma,” we may not know whether the speaker is proposing to grandma that we eat… or proposing we eat grandma.

This auto-generated caption mistake crams together parts of three separate sentences, each of which includes one name (“Keriam,” “BraveStarr,” “ThunderStick”). The combination of proper names and lack of punctuation further confuses the meaning, both in the caption and in hearing with CAPD.

What’s the Point of All This?

I started collecting auto-caption BraveStarr mishaps because they were funny. I still giggle at “no one wants to be a pot farmer.”

But they also turn out to be great examples of how my brain mishears things.

Living with CAPD can be tough, especially when you go undiagnosed for decades (as I did). With CAPD, people assuming you’re deaf or hard of hearing is the good outcome. They’re more likely to assume you’re rude or lazy, especially if they know you well enough to know you can hear.

To complicate matters, CAPD often rides along with neurodivergences that make people more sensitive to sound, like autism and ADHD. It’s not uncommon for children with CAPD to get hearing tests that report their hearing is, if anything, too good. It’s not enough to test hearing – you also need to test processing, or what happens once the sound gets from the ear into the brain.

Normally, I’m not a fan of disability simulations. These auto-generated captions, however, failed in a way so completely similar to what I hear, and for so many of the same reasons, that they offer the closest thing I’ve yet found to actually having CAPD.

So the next time someone who seems to hear perfectly well asks “What?”, just assume they heard you say “you know you lose your hoop boys” – and that they respect you enough not to write you off as really spouting gibberish.


Was this helpful? Send a link to this post to someone who needs it – or leave me a tip.