Why I Rarely Dictate to AI: The Hidden Risks of Voice-Based Hallucinations

I sometimes use voice to interact with AI tools, but never for anything high-risk. That means I rarely dictate to AI, as convenient as it might be. Why? Because voice-based AI hallucinates more often than text-based AI, and the extra processing steps are to blame.

What Happens When You Talk to Voice AI

When you talk to an AI tool instead of typing, the process is different. In a typical voice pipeline, your speech first goes through a transcription model, which converts the audio into text. The text then goes to the language model, which generates a response. A third model converts that response back into spoken audio. Three discrete steps. Three places where something can go wrong. Compare that to text AI: you type, and the model answers. One step.
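To see why extra stages matter, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each stage independently gets things right 95% of the time. These numbers are hypothetical, not measured rates for any real product, and in practice stage errors are not independent (a transcription error feeds straight into the language model).

```python
from math import prod


def end_to_end_reliability(stage_rates: list[float]) -> float:
    """Chance that every stage succeeds, assuming (hypothetically) that
    errors in one stage are independent of errors in the others."""
    for rate in stage_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"stage rate must be between 0 and 1, got {rate}")
    # Independent stages: multiply the per-stage success probabilities.
    return prod(stage_rates)


# Hypothetical per-stage accuracy, chosen only for illustration.
text_pipeline = [0.95]               # language model only
voice_pipeline = [0.95, 0.95, 0.95]  # transcription, language model, text-to-speech

print(f"text:  {end_to_end_reliability(text_pipeline):.3f}")
print(f"voice: {end_to_end_reliability(voice_pipeline):.3f}")
```

Under these made-up numbers, the text pipeline succeeds 95% of the time and the voice pipeline about 86%. The figures are not the point; the shape is. Because each rate is at most 1, every added stage can only hold end-to-end reliability steady or lower it.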

Text-based AI hallucinates too; the difference lies in how you engage with it. With pure text, you are usually in a more deliberate, review-oriented mode. Voice interactions feel conversational, so people tend to listen to the fluent spoken response and move on, even when a transcript is displayed on screen. A smooth, confident voice combined with natural conversational flow makes it easy to miss errors that a careful reader would catch.

The transcription step is where most of the trouble starts. Researchers have documented OpenAI’s Whisper model inventing entire sentences, especially when the audio contains pauses, fillers like “um,” accented speech, or background noise. This is also where bias can appear: if you speak in a way that differs from the most common patterns in the training data, the system often struggles.

Researchers studying speakers with aphasia found that Whisper can treat long pauses as speech and fabricate sentences to fill the silence. The language model then answers the fabricated prompt as if it were real, and the text-to-speech component reads the wrong answer back in a confident voice. Nothing sounds off, and people tend to believe confident speech no matter how wrong it is. With text AI, you can scan the response and catch errors. With voice AI, the answer arrives fluently and rolls past before you can flag it.

Keep in mind that different tools have different error rates. The clearer and more standard your speech, and the higher the quality of the model, the lower the risk. However, aphasia is not the only challenge. Speakers with a “deaf accent,” strong regional or foreign accents, non-native pronunciation, or other less common speech patterns often face higher error rates and hallucination risks. (“Deaf accent” is a term used by a colleague whose work I’ve covered previously.) If you plan to dictate to AI regularly, research the documented performance of specific tools with your type of speech, and test them with sample recordings of your own voice.
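If you do test a tool with recordings of your own voice, word error rate (WER) is the usual way transcription accuracy is reported: the number of substituted, deleted, and inserted words divided by the number of words in a reference transcript you have checked by hand. Below is a minimal sketch, assuming you already have both transcripts as plain text; the function name and sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference,
    computed with a word-level Levenshtein (edit distance) table."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "the client fell on the stairs"
hypothesis = "the client fell down on the stair"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Here one inserted word (“down”) and one substitution (“stairs” for “stair”) against a six-word reference give a WER of about 0.33. Because insertions are counted, WER can exceed 1.0 when a tool invents many words, which is exactly the hallucination pattern described above. This sketch only lowercases and splits on whitespace; real evaluations usually normalize punctuation as well.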

The Legal Implications

More frequent hallucinations, delivered in a confident voice, are a real problem for lawyers. Consider client intake by voice AI. A client describes a slip and fall, pausing to gather her thoughts between sentences. Whisper interprets one of those pauses as continued speech and inserts a fabricated sentence. The client doesn’t check the summary before sending it to her lawyer, so the intake summary now contains a statement she never made. The associate reading the file takes it as fact and embeds the error into the case.

Similar risks appear in deposition prep tools that summarize recorded testimony or in voice-driven research assistants. If the transcription stage drops or invents words, everything built on top of it is compromised, and the lawyer may never see the original error.

Evaluating Voice AI Tools for Law Practices

If you are evaluating a voice AI tool for your practice, ask the vendor these questions:

  • How many processing steps does the system use?
  • Which transcription model handles the audio?
  • What error rates are documented, and on what kind of speech?
  • Does the tool flag low-confidence transcriptions, or does it pass them through silently?

Unfortunately, the answer to that last question is often “no.” You can instruct the AI to flag uncertainty, but that only works if the system actually recognizes the problem and obeys your instructions.
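Some transcription models do expose signals a tool could use for flagging. The sketch below assumes segment dictionaries shaped like those returned by the open-source `openai-whisper` package, which include `avg_logprob`, `no_speech_prob`, and `compression_ratio` keys. The thresholds mirror the default values of that package’s `logprob_threshold`, `no_speech_threshold`, and `compression_ratio_threshold` transcription parameters, but they are a starting point, not validated cutoffs for legal work. The function name and sample segments are made up.

```python
from typing import Any

# Assumed thresholds, mirroring openai-whisper's default transcription settings.
# They are a starting point for review, not validated cutoffs.
LOGPROB_THRESHOLD = -1.0
NO_SPEECH_THRESHOLD = 0.6
COMPRESSION_RATIO_THRESHOLD = 2.4


def flag_segments(segments: list[dict[str, Any]]) -> list[tuple[dict[str, Any], list[str]]]:
    """Return (segment, reasons) for every segment that looks unreliable."""
    flagged = []
    for seg in segments:
        reasons = []
        if seg["avg_logprob"] < LOGPROB_THRESHOLD:
            reasons.append("low average log-probability")
        if seg["no_speech_prob"] > NO_SPEECH_THRESHOLD and seg["text"].strip():
            # Text was produced where the model itself suspected silence:
            # the pattern behind sentences invented during long pauses.
            reasons.append("text produced over probable silence")
        if seg["compression_ratio"] > COMPRESSION_RATIO_THRESHOLD:
            # Highly repetitive output often signals a decoding loop.
            reasons.append("repetitive output")
        if reasons:
            flagged.append((seg, reasons))
    return flagged


# Made-up segments for illustration only.
segments = [
    {"start": 0.0, "end": 4.2, "text": " I slipped on the wet floor.",
     "avg_logprob": -0.21, "no_speech_prob": 0.02, "compression_ratio": 1.1},
    {"start": 4.2, "end": 9.8, "text": " The manager told me it happens all the time.",
     "avg_logprob": -0.65, "no_speech_prob": 0.71, "compression_ratio": 1.3},
]

for seg, reasons in flag_segments(segments):
    print(f"[{seg['start']:.1f}-{seg['end']:.1f}s] REVIEW:{seg['text']} ({', '.join(reasons)})")
```

A check like this only surfaces segments for a human to compare against the recording; it cannot decide what the speaker actually said, and a segment that passes every check can still be wrong.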

Safeguards for Using Voice AI

There are practical steps you can take to reduce these risks:

  • Record the original audio and have a human review the AI-generated transcript or summary against the raw recording before using it.
  • Use local or on-premises tools for sensitive or confidential matters. This avoids sending data to the cloud and can give you more control over the models.
  • Adopt hybrid human-AI workflows. Let the AI handle the first draft or research pass, but always keep a lawyer or trained staff member in the loop for verification and final judgment. Treat AI output as a starting point, never the final product.
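For the first safeguard, a word-level comparison can point a reviewer straight to words the AI transcript added, dropped, or changed relative to a transcript a person made from the raw recording. Below is a minimal sketch using Python’s standard-library `difflib`; the function name and sample text are hypothetical.

```python
import difflib


def transcript_differences(human: str, ai: str) -> list[tuple[str, str, str]]:
    """Compare a human-verified transcript with an AI transcript word by word.
    Returns (change, human_words, ai_words) for every non-matching span."""
    human_words = human.split()
    ai_words = ai.split()
    matcher = difflib.SequenceMatcher(a=human_words, b=ai_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # tag is "replace", "delete" (words only in the human version),
        # or "insert" (words only in the AI version).
        changes.append((tag, " ".join(human_words[i1:i2]), " ".join(ai_words[j1:j2])))
    return changes


human = "I slipped on the wet floor near the entrance"
ai = "I slipped on the wet floor near the entrance and the manager saw it happen"

for change, said, transcribed in transcript_differences(human, ai):
    print(f"{change}: human={said!r} ai={transcribed!r}")
```

Here the comparison reports a single insertion, “and the manager saw it happen,” which is the kind of invented sentence the intake example describes. The comparison is only as good as the human transcript, so that version should come from the recording itself rather than from editing the AI’s draft.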

These steps might slow your process down, but they are smart protections when accuracy and client confidentiality are at stake.

Final Thoughts

Because of the increased hallucination risk with voice AI, warn your clients that if they use voice dictation, they should double-check the resulting transcript before sending the information to you. Also remind them that anything they tell or write to an AI tool may not be protected by attorney-client privilege. Finally, train all lawyers and staff in your firm on the heightened risks of voice AI.
