What is Text to Speech? Explain It Like I’m Five

by Chris Von Wilpert, BBusMan • Last updated January 19, 2024

Expert Verified by Leandro Langeani, BBA

First-Person Perspective: We buy, test and review software products based on a 3-step rating methodology and first-hand experience. If you buy through our links, we may get a commission. Read our rating methodology and how we make money.

What is text to speech?

Text to speech (TTS) is a technology that turns written text into audio. AI is now widely used for TTS, and the number of applications for both individuals and businesses has grown enormously. Some experts predict that TTS will change how we live and work.

Text to speech fast facts

  • TTS technology offers benefits for individuals and businesses.

  • AI lets anyone easily convert texts to spoken audio.

  • Although text to speech technology is highly advanced, it can still struggle with expressing the right emotions.

  • Choosing TTS software depends on purpose, budget, and technical skills. 

  • Not all text to speech apps produce high-quality output.

Great TTS apps allow users to choose voices from a library and some tools use voices from famous people. Photograph: Speechify

How can I turn text to speech?

The first way to turn text to speech is by using an online tool. These tools let you upload or paste text, which is then converted into audio. Speechify and Natural Reader are among the most popular TTS tools.

You can also install software on your computer and run the application whenever you need it. Murf, Descript, and Balabolka are three programs that deliver good TTS quality.

Finally, you can use features that are built into your software or your device's operating system. In Microsoft Word, for instance, you can use Read Aloud. Android and iOS devices also offer built-in text-to-speech technology that can be used with various apps.
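For readers who are comfortable with a little scripting, the built-in route can also be reached from a short Python program. The sketch below is an illustration only, not an official method: the helper names `build_speak_command` and `speak` are made up for this example. It relies on the `say` command that ships with macOS and on the speech synthesizer that Windows exposes through PowerShell. On Linux it assumes the `espeak` program has been installed separately, which is not always the case.

```python
import platform
import subprocess


def build_speak_command(text, system=None):
    """Return the command that asks the operating system to read text aloud."""
    system = system or platform.system()
    if system == "Darwin":
        # macOS ships with the `say` command.
        return ["say", text]
    if system == "Windows":
        # Windows offers speech synthesis through .NET's System.Speech.
        escaped = text.replace("'", "''")  # escape single quotes for PowerShell
        script = (
            "Add-Type -AssemblyName System.Speech; "
            "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
            f".Speak('{escaped}')"
        )
        return ["powershell", "-Command", script]
    # Assumption: espeak is installed; it is not part of every Linux system.
    return ["espeak", text]


def speak(text):
    """Read text aloud with whatever voice the operating system provides."""
    subprocess.run(build_speak_command(text), check=True)
```

Calling `speak("Hello there")` should play the sentence through the default system voice, provided the underlying command is available on your machine.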

What are some applications of text to speech technology?

Text-to-speech (TTS) technology is used for a wide range of applications, like:

  • TTS can transform books, blog posts, and other written texts into audiobooks, podcasts, or videos.
  • Video producers can use it to create voice overs to make their content more interesting for the audience.
  • TTS lets people be more productive by listening to written material while commuting, for example.
  • Voice cloning is also a popular application of text-to-speech, and it can be used in games, animations, and videos.
  • People with visual impairments use it to access information that is originally only available as text.

Generating voice overs for videos is only one of the many applications of TTS technology. Photograph: Murf

What are the benefits of using text to speech software?

The main benefits of text-to-speech (TTS) software are: 

  • TTS makes written information accessible to people with visual impairments.
  • Audio versions of written texts can help people with dyslexia, ADHD, and other conditions that make reading difficult.
  • For some people, listening to audio helps them better remember study material. 
  • People who study other languages can improve their pronunciation by using text-to-speech technology.
  • For some people, reading on screen is more tiring than listening to an audio version.
  • Listening to converted texts makes multitasking easier.
  • Text to speech technology is much cheaper and faster than hiring professional voice actors.

Are there any limitations to using text to speech?

Depending on the program and the source text, text to speech (TTS) technology can have limitations. Some text to speech tools have trouble generating audio from complex texts or informal language. 

AI voices can have difficulty expressing emotions and using the right intonation. As a result, they can sound robotic. That can put listeners off or distract them, making the message harder to understand.

The speed, tone, pitch, length of pauses, and other aspects of a human voice play a huge role in communication, but not all TTS tools let users control them.
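Where a tool does offer this kind of control, it is often through SSML (Speech Synthesis Markup Language), a W3C standard for marking up text with speaking instructions. The Python sketch below shows how speed, pitch, and pauses can be written in SSML. It is an illustration under assumptions: the helper name `to_ssml` is hypothetical, and which tags and values a given TTS tool accepts varies, so check the documentation of the tool you use.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, rate="medium", pitch="medium", pause_ms=400):
    """Wrap sentences in SSML with a speaking rate, a pitch, and pauses between sentences."""
    # <break> inserts a pause of the given length between two sentences.
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape characters such as & and < so the text stays valid XML.
    body = pause.join(escape(sentence) for sentence in sentences)
    # <prosody> sets the speaking rate and pitch for everything inside it.
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{body}</prosody></speak>'
```

For example, `to_ssml(["Welcome back.", "Let's get started."], rate="slow", pause_ms=600)` produces markup that asks for slower speech with a 0.6-second pause between the two sentences.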

What do I need to take into account when choosing a TTS program?

When you choose a text to speech program, the most important thing is to decide what you are going to use it for. Is it for reading documents out loud? Or do you, for example, want to create content with it and publish or sell the audio files?

The second key aspect is your budget. Some TTS tools are free, but they may be limited to personal use or lack features you actually need, such as downloading the audio file so you can play it on your phone while doing other things.

Finally, consider the technical skills a tool requires. Some tools are user-friendly and make it easy to control the audio output. Others are more complicated and take more time to get the hang of.

Tweaking settings can improve the audio output quality. Photograph: Speechify

Is it legal to use text-to-speech?

The legality of text to speech (TTS) depends on two aspects: the source content and the usage terms set by the software publisher.

First, creating audio from copyrighted text without permission can be copyright infringement. This can apply to any type of protected work, such as books, poems, song lyrics, scripts, and online articles.

Second, TTS apps can forbid commercial use of the generated audio and only allow it for personal use.

Why is text to speech not working?

Here are some troubleshooting tips for when a conversion from text to speech is not working.

  • Contact the TTS app's support team or browse its FAQ or troubleshooting section.
  • Make sure your device's audio is not muted and is not being routed to a Bluetooth audio device.
  • If you are using an online TTS app, make sure you are connected to the internet. 
  • Check if the audio is fully generated. The process may not have finished.
  • Ensure you are not violating any terms that can prevent the audio from being generated. On some sites, for example, you cannot create audio files based on hate speech texts.
  • If applicable, double-check the options you choose for the audio file. It is, for example, possible you picked an AI voice or a text in a language that is not supported by the TTS app.
  • Update the app, or reboot your device.

Can text to speech accurately pronounce different languages?

The ability of text-to-speech (TTS) to pronounce different languages depends on how the AI model was trained. For widely spoken languages, like English, Spanish, or French, there are a lot of good TTS voices.

For less widely used languages and dialects, there is often a limited set of audio training material. That makes it more difficult to train AI to sound human-like. 

The AI model itself also plays a role. Different TTS apps use different models, and some produce more realistic speech than others.

TTS technology needs to be trained to pronounce a language accurately. Photograph: Synthesys

Can text to speech accurately convey emotions in a written text?

Although TTS technology is highly advanced, it can still fail to convey the emotions in a written text. That can happen if the text has an ambiguous meaning, or when it lacks emotionally charged words.

Another reason is that the TTS app may not have been trained to recognize and express emotions. In some apps, you can achieve more realistic speech by changing the settings.

Finally, the AI model can also misinterpret the intended emotions and use an intonation that doesn’t correspond to the written text. It can, for example, sound neutral when the text is meant to convey excitement, sadness, or confusion.

Does text-to-speech have any drawbacks?

Text to speech technology has two major drawbacks.

The main drawback of TTS is that it is hard to generate natural-sounding human speech. This is often the case with free TTS apps, but even premium apps sometimes fail to express the intended emotions.

Another drawback is that people can abuse the technology. Like deepfake images, AI speech technology can be used to create deepfake audio. That could damage someone’s reputation or bring harm to others.
