Using the Web Speech API for Multilingual Translations

Since the early days of science fiction, we have fantasized about machines that talk to us. Today, talking machines are commonplace. Even so, the technology for making websites talk is still fairly new.

We can make our web pages talk using SpeechSynthesis, the text-to-speech part of the Web Speech API. It is still considered an experimental technology, but it is well supported in the latest versions of Chrome, Safari, and Firefox.
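To give a rough sense of what SpeechSynthesis looks like in code, here is a minimal sketch. The helper name speakText and its env parameter are invented for this example (env stands in for the browser's window object so the function can also be tried outside a browser). SpeechSynthesisUtterance and speechSynthesis.speak() are the pieces that come from the Web Speech API itself.

```javascript
// Minimal sketch: speak a string in a given language.
// `speakText` and `env` are hypothetical names used only in this example.
function speakText(text, lang, env = globalThis) {
  // Return false where the Web Speech API is not available.
  if (!env.speechSynthesis || typeof env.SpeechSynthesisUtterance !== "function") {
    return false;
  }
  const utterance = new env.SpeechSynthesisUtterance(text);
  utterance.lang = lang; // language tag such as "en-US" or "es-ES"
  env.speechSynthesis.speak(utterance); // adds the utterance to the speech queue
  return true;
}
```

In a supporting browser, calling speakText("Hello, world", "en-US") from a click handler should read the text aloud, provided a voice for that language is installed.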

The fun part for me is using this technology with foreign languages. macOS supports this well in all of those browsers; on Windows, you have to use Chrome. We’re going to walk through a three-step process to create a page that speaks the same text in multiple languages. Some of the basic code is derived from documentation found here, but the final product adds some fun features and can be viewed at my Polyglot CodePen here.

Screen shot of the completed Polyglot app with a menu of languages.
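Speaking in several languages depends on which voices the browser and operating system provide. The sketch below shows one way a voice might be picked for a requested language. findVoiceForLang is a hypothetical helper written for this example, not part of the Web Speech API; it expects the array returned by speechSynthesis.getVoices(), whose entries carry name and lang properties.

```javascript
// Sketch: pick a voice whose language matches a requested tag.
// `findVoiceForLang` is a hypothetical helper for illustration only.
function findVoiceForLang(voices, lang) {
  // Defensively treat "en_US" and "en-US" as the same tag, ignoring case.
  const normalize = (tag) => tag.replace("_", "-").toLowerCase();
  const wanted = normalize(lang);
  // Prefer an exact match ("fr-FR"), then fall back to the base language ("fr").
  const exact = voices.find((v) => normalize(v.lang) === wanted);
  if (exact) return exact;
  const base = wanted.split("-")[0];
  return voices.find((v) => normalize(v.lang).split("-")[0] === base) || null;
}

// Browser-only usage; skipped where speechSynthesis is unavailable.
if (typeof speechSynthesis !== "undefined") {
  // The voice list can load asynchronously, so it may be empty on the first call.
  const voice = findVoiceForLang(speechSynthesis.getVoices(), "fr-FR");
  console.log(voice ? voice.name : "No French voice found yet");
}
```

Falling back to the base language is a design choice made for this sketch: a Canadian French voice is usually a better substitute for a missing fr-FR voice than the default English one.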

Step 1: Start Simple

Let’s create a basic page with a textarea for the text to be spoken and a button that triggers the speech.


Note: For best results on a Mac, use the latest version of Chrome, Safari, or Firefox. On Windows, use Chrome.

The paragraph with ID warning will be shown only if the JavaScript detects no support for the Web Speech API. Also, note the ID values for the textarea and the button as we will use those in our JavaScript.
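The support check described above could look something like the following sketch. The function names hasSpeechSupport and applySupportState are made up for this example, and the button is looked up with a generic selector because its ID is an assumption left open here; only the warning ID comes from the text.

```javascript
// Returns true when the environment exposes the speech synthesis API.
// `hasSpeechSupport` is a hypothetical helper name.
function hasSpeechSupport(env) {
  return "speechSynthesis" in env && "SpeechSynthesisUtterance" in env;
}

// Reveals the warning and disables the button when support is missing.
// `warningEl` and `buttonEl` are DOM elements supplied by the caller.
function applySupportState(supported, warningEl, buttonEl) {
  warningEl.style.display = supported ? "none" : "block";
  buttonEl.disabled = !supported;
}

// Browser-only wiring; assumes this script runs after the markup has loaded.
if (typeof document !== "undefined") {
  applySupportState(
    hasSpeechSupport(window),
    document.getElementById("warning"), // ID mentioned in the article
    document.querySelector("button") // assumes the page has a single button
  );
}
```

Keeping the check and the DOM update in separate functions is just one way to arrange it; it makes each piece easy to try on its own.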

Feel free to style the HTML any way you’d like. You’re also free to work off the demo I created:

See the Pen Text-To-Speech Part 1 by Steven Estrella (@sgestrella) on CodePen.

Adding a style rule for the disabled state of the button is a good idea to avoid confusion for the few people who still use incompatible browsers, like the now-quaint Internet Explorer. Also, let’s use a style rule to hide the warning by default so we can control when it’s actually needed.

button:disabled {
  cursor: not-allowed;
  opacity: 0.3;
}

#warning {
  color: red;
  display: none;
  font-size: 1.4rem;
}

Now on to the JavaScript! First, we add two variables to serve as references to the “Speak” button that triggers the speech and to the