Voice-recognition software from tech giants like Apple and Amazon is significantly less accurate for Black speakers, failing at roughly double the rate it does for White speakers. This technical gap means that the virtual assistants, dictation tools, and educational apps many families rely on are fundamentally less reliable for Black users.
Major speech-to-text systems fail to understand Black speakers far more often than White speakers, creating a digital divide where certain families must work twice as hard to use everyday technology.
This isn't just a minor glitch; it is a "technology tax" on Black households. When a smart speaker fails to understand a command or a dictation tool mangles a text message, it requires extra time, effort, and frustration to correct. For parents, this is especially critical in educational contexts. If a child uses a voice-controlled reading assistant or a language-learning app that isn't tuned to their voice, they may receive incorrect feedback that stalls their progress or harms their confidence.
Beyond the home, these systems are increasingly used to transcribe medical appointments, insurance calls, and even police body-cam footage. If the software gets roughly one word in three wrong for Black speakers (the study's average word error rate was 35%), the risk of life-altering misunderstandings, from medical misdiagnoses to legal errors, becomes a systemic danger rather than a mere inconvenience.
AI models are only as good as the data they are "fed" during training. Most commercial speech-recognition systems were likely developed using massive datasets that heavily favor White, middle-class accents and speech patterns. When these systems encounter different pronunciations, pitches, or rhythms, such as those found in African American Vernacular English (AAVE), the software's "acoustic models" are unprepared to process them.
The researchers were concerned that while AI is marketed as a universal tool, the systems behind it might not work equally well for everyone. They wanted to test whether the "progress" in voice tech was actually being shared equally across racial groups. By comparing error rates for speech-recognition systems from five major tech companies, they found that the gap wasn't the result of missing vocabulary, but of a failure to recognize the actual sounds of Black speech.
The study revealed a massive "error gap" across systems built by Amazon, Apple, Google, IBM, and Microsoft. Researchers found that the software was consistently worse at transcribing Black voices regardless of the company providing the service.
- Average error rates were nearly double: Black speakers faced a word error rate of 35%, compared to just 19% for White speakers.
- The "unusable" threshold: More than 20% of the audio clips from Black speakers produced transcripts with a word error rate of at least 50%, rendering them essentially useless. For White speakers, this happened with fewer than 2% of clips.
- Black men faced the steepest hurdle: The accuracy gap was widest for Black men, who saw an average error rate of 41%.
- Linguistic features matter: The more a speaker used specific patterns of AAVE, the more likely the software was to fail. This suggests the software is specifically struggling with dialect and the way words are pronounced.
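For readers who want to see how a figure like "35%" is produced: word error rate counts the words a transcript got wrong (substituted, dropped, or added) and divides by the number of words actually spoken. The short Python sketch below illustrates that standard definition. It is not the researchers' code, and the function name, the lowercase-and-split word matching, and the sample sentence are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: real evaluations usually normalize text more
    carefully (punctuation, numbers, filler words) before comparing.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # spoken word dropped by the software
                curr[j - 1] + 1,     # extra word added by the software
                prev[j - 1] + cost,  # word matched or swapped for another
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: six words spoken, two swapped and one dropped.
spoken = "turn on the kitchen lights please"
heard = "turn on the chicken light"
print(word_error_rate(spoken, heard))  # 0.5

# The averages reported above: 35% for Black speakers, 19% for White speakers.
gap_ratio = 0.35 / 0.19
print(round(gap_ratio, 2))  # 1.84
```

In the example, half the spoken words come out wrong, which is the "unusable" threshold described above. The last two lines show why the gap is described as "nearly double": 35% is about 1.8 times 19%.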
The findings imply that tech companies have treated a specific version of American English—White and suburban—as the "default" for humanity. When companies don't intentionally include diverse voices in their training sets, they aren't just making a technical error; they are deciding who the technology is for.
This forces Black children and parents into a constant state of "code-switching" just to be understood by their own devices. It subtly reinforces a message that their natural way of speaking is "incorrect" or "broken," when in reality, it is the engineering that is deficient.
The study used a relatively small sample of 115 speakers, which cannot represent the massive diversity of accents and dialects within the Black community across the entire U.S. Furthermore, the Black speakers in the study were primarily from the East Coast and the South, while the White speakers were from California. This means some of the errors could be attributed to regional differences in speech rather than race alone.
It is also worth noting that the audio for some Black speakers was recorded in 2004, whereas the White speakers’ audio was more modern. While the researchers attempted to match the audio quality, the difference in recording technology could have slightly inflated the error rates. Finally, while these companies have likely updated their models since 2020, the foundational bias in how AI is trained remains an industry-wide problem.
- If your child uses voice-activated educational tools, sit with them and observe whether the device "ignores" or misinterprets them more often than it does other people. If it makes frequent errors, consider switching to text-based tools to avoid unnecessary frustration.
- If you are using speech-to-text for important documents, such as an email to a teacher or a medical form, never send it without a manual line-by-line review. Assume the AI has missed or changed words that alter your meaning.
- If a device consistently fails to understand your family, avoid telling your child to "speak more clearly." Instead, explain that the machine was poorly designed and doesn't know how to listen to all kinds of voices yet.
- If you are choosing new home tech, look for products that allow for "voice training" or settings that claim to support diverse dialects, though be skeptical of these marketing claims until you test them yourself.
Voice technology is currently optimized for a narrow slice of the population, leaving Black families to deal with a "broken" experience. If the smart speaker in your kitchen or the dictation app on your phone isn't working for you, it is a failure of the data and the developers, not your voice. Until these companies prioritize diverse data, treat automated transcripts as a rough draft rather than a final word.
Koenecke, Allison, Nam, Andrew, Lake, Emily et al. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1915768117 — pnas.org


