Node.js - Google Cloud Text-to-Speech API Examples

There are many reasons you might need to convert text into an audio file, and text-to-speech technology lets you do exactly that. Building your own text-to-speech engine is difficult and time-consuming, so the easiest option is to use an existing service, with the drawback that you have to pay for it.

One such service is provided by Google. It is known for producing good results, and it exposes an API that makes it easy to integrate into your application. In this tutorial, I'm going to show you basic examples of using the Google Text-to-Speech API in Node.js, from preparation through to the code.

Preparation

1. Create or select a Google Cloud project

A Google Cloud project is required to use this service. Open the Google Cloud console, then create a new project or select an existing one.

2. Enable billing for the project

Like other cloud platforms, Google requires you to enable billing for your project. If you haven't set up billing yet, open the billing page in the console and set it up.

3. Enable Google Text-to-Speech API

To use an API, you must enable it first. Open the Text-to-Speech API page in the Google Cloud console and enable it for your project.

4. Set up a service account for authentication

For authentication, you need a service account. Create a new one on the service account management page and download its credentials file, or use a service account you have already created.

In your .env file, add a new variable that points to the credentials file:

GOOGLE_APPLICATION_CREDENTIALS=/path/to/the/credentials

The .env file has to be loaded when the application starts, so use a module that reads .env files, such as dotenv.
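A missing or mistyped credentials path is an easy mistake to make, so it can help to check the variable before calling the API. Below is a minimal sketch of such a check. The checkCredentials helper is hypothetical (it is not part of dotenv or the Google client library), and it assumes that require('dotenv').config() has already run so that the variable is present in process.env.

```javascript
const fs = require('fs');

// Hypothetical helper, not part of any library.
// Assumes the .env file has already been loaded (e.g. by dotenv),
// so GOOGLE_APPLICATION_CREDENTIALS is available in the given env object.
function checkCredentials(env = process.env) {
  const credentialsPath = env.GOOGLE_APPLICATION_CREDENTIALS;

  if (!credentialsPath) {
    return { ok: false, reason: 'GOOGLE_APPLICATION_CREDENTIALS is not set' };
  }

  if (!fs.existsSync(credentialsPath)) {
    return { ok: false, reason: `Credentials file not found: ${credentialsPath}` };
  }

  return { ok: true, reason: null };
}

// Warn early instead of waiting for the API call to fail.
const result = checkCredentials();
if (!result.ok) {
  console.warn(result.reason);
}
```

This only verifies that the file exists. Whether the service account is actually allowed to use the API is still decided by Google when the request is sent.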

Dependencies

This tutorial uses @google-cloud/text-to-speech, along with dotenv and lodash. Add the following dependencies to your package.json and run npm install.

  "@google-cloud/text-to-speech": "~0.3.0",
  "dotenv": "~4.0.0",
  "lodash": "~4.17.10"

1. Synthesize Speech

The example below shows the basic usage of speech synthesis. You need to provide the text to synthesize, the voice selection (a language code and, optionally, an SSML voice gender), and the audio encoding. If the request succeeds, the response contains an audioContent field, which you can then write to a file.

  require('dotenv').config();
  
  const _ = require('lodash');
  const fs = require('fs');
  
  const textToSpeech = require('@google-cloud/text-to-speech');
  
  const client = new textToSpeech.TextToSpeechClient();
  
  const request = {
    // The text to synthesize
    input: { text: 'This is an example' },
  
    // The language code and SSML Voice Gender
    voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
  
    // The audio encoding type
    audioConfig: { audioEncoding: 'MP3' },
  };
  
  const outputFileName = 'output.mp3';
  
  client.synthesizeSpeech(request)
    .then((response) => {
      // The first element of the response array holds the API response
      const audioContent = _.get(response[0], 'audioContent');
  
      if (audioContent) {
        // audioContent is the binary audio data, so write it as-is
        fs.writeFileSync(outputFileName, audioContent, 'binary');
  
        console.log(`Audio content successfully written to file: ${outputFileName}`);
      } else {
        console.error('Failed to get audio content');
      }
    })
    .catch((err) => {
      console.error('ERROR:', err);
    });
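If you synthesize speech in more than one place, building the request by hand each time gets repetitive. Here is a rough sketch of two helpers that assemble the same request shape used above and pick an output file name that matches the encoding. Both buildSynthesisRequest and outputFileNameFor are hypothetical names of my own, and the extension mapping is an assumption based on the MP3, LINEAR16 and OGG_OPUS encodings of the API. When a specific voice name from the voice list is given, it is sent in place of the gender.

```javascript
// Assumed mapping from audio encoding to a suitable file extension.
const EXTENSIONS = { MP3: 'mp3', LINEAR16: 'wav', OGG_OPUS: 'ogg' };

// Hypothetical helper: builds a request object for client.synthesizeSpeech().
function buildSynthesisRequest(text, options = {}) {
  const {
    languageCode = 'en-US',
    voiceName,
    ssmlGender = 'NEUTRAL',
    audioEncoding = 'MP3',
  } = options;

  // Use the specific voice if one is named, otherwise let the
  // service pick a voice by language and gender.
  const voice = voiceName
    ? { languageCode, name: voiceName }
    : { languageCode, ssmlGender };

  return {
    input: { text },
    voice,
    audioConfig: { audioEncoding },
  };
}

// Hypothetical helper: derives the output file name from the encoding.
function outputFileNameFor(baseName, audioEncoding) {
  const extension = EXTENSIONS[audioEncoding];
  if (!extension) {
    throw new Error(`Unsupported audio encoding: ${audioEncoding}`);
  }
  return `${baseName}.${extension}`;
}

const exampleRequest = buildSynthesisRequest('This is an example', {
  voiceName: 'en-US-Wavenet-D',
});
console.log(exampleRequest, outputFileNameFor('output', 'MP3'));
```

The returned object can be passed to client.synthesizeSpeech() exactly like the request in the example above.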

2. List Voices

The example below retrieves the list of voices supported by the Google Text-to-Speech service. Because the list changes over time, run it yourself to get the latest one.

  require('dotenv').config();
  
  const textToSpeech = require('@google-cloud/text-to-speech');
  
  const client = new textToSpeech.TextToSpeechClient();
  
  client.listVoices({})
    .then((response) => {
      // The first element of the response array holds the list of voices
      console.log(JSON.stringify(response[0]));
    })
    .catch((err) => {
      console.error('ERROR:', err);
    });

Below is the list of supported voices at the time this post was written.

Language Code	Name	SSML Gender	Natural Sample Rate (Hz)
es-ES	es-ES-Standard-A	FEMALE	24000
it-IT	it-IT-Standard-A	FEMALE	24000
ja-JP	ja-JP-Standard-A	FEMALE	22050
ko-KR	ko-KR-Standard-A	FEMALE	22050
pt-BR	pt-BR-Standard-A	FEMALE	24000
tr-TR	tr-TR-Standard-A	FEMALE	22050
sv-SE	sv-SE-Standard-A	FEMALE	22050
nl-NL	nl-NL-Standard-A	FEMALE	24000
en-US	en-US-Wavenet-D	MALE	24000
de-DE	de-DE-Wavenet-A	FEMALE	24000
de-DE	de-DE-Wavenet-B	MALE	24000
de-DE	de-DE-Wavenet-C	FEMALE	24000
de-DE	de-DE-Wavenet-D	MALE	24000
en-AU	en-AU-Wavenet-A	FEMALE	24000
en-AU	en-AU-Wavenet-B	MALE	24000
en-AU	en-AU-Wavenet-C	FEMALE	24000
en-AU	en-AU-Wavenet-D	MALE	24000
en-GB	en-GB-Wavenet-A	FEMALE	24000
en-GB	en-GB-Wavenet-B	MALE	24000
en-GB	en-GB-Wavenet-C	FEMALE	24000
en-GB	en-GB-Wavenet-D	MALE	24000
en-US	en-US-Wavenet-A	MALE	24000
en-US	en-US-Wavenet-B	MALE	24000
en-US	en-US-Wavenet-C	FEMALE	24000
en-US	en-US-Wavenet-E	FEMALE	24000
en-US	en-US-Wavenet-F	FEMALE	24000
fr-FR	fr-FR-Wavenet-A	FEMALE	24000
fr-FR	fr-FR-Wavenet-B	MALE	24000
fr-FR	fr-FR-Wavenet-C	FEMALE	24000
fr-FR	fr-FR-Wavenet-D	MALE	24000
it-IT	it-IT-Wavenet-A	FEMALE	24000
ja-JP	ja-JP-Wavenet-A	FEMALE	24000
nl-NL	nl-NL-Wavenet-A	FEMALE	24000
en-GB	en-GB-Standard-A	FEMALE	24000
en-GB	en-GB-Standard-B	MALE	24000
en-GB	en-GB-Standard-C	FEMALE	24000
en-GB	en-GB-Standard-D	MALE	24000
en-US	en-US-Standard-B	MALE	24000
en-US	en-US-Standard-C	FEMALE	24000
en-US	en-US-Standard-D	MALE	24000
en-US	en-US-Standard-E	FEMALE	24000
de-DE	de-DE-Standard-A	FEMALE	24000
de-DE	de-DE-Standard-B	MALE	24000
en-AU	en-AU-Standard-A	FEMALE	24000
en-AU	en-AU-Standard-B	MALE	24000
en-AU	en-AU-Standard-C	FEMALE	24000
en-AU	en-AU-Standard-D	MALE	24000
fr-CA	fr-CA-Standard-A	FEMALE	24000
fr-CA	fr-CA-Standard-B	MALE	24000
fr-CA	fr-CA-Standard-C	FEMALE	24000
fr-CA	fr-CA-Standard-D	MALE	24000
fr-FR	fr-FR-Standard-A	FEMALE	24000
fr-FR	fr-FR-Standard-B	MALE	24000
fr-FR	fr-FR-Standard-C	FEMALE	24000
fr-FR	fr-FR-Standard-D	MALE	24000
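The raw JSON printed by the List Voices example is hard to read, so you may want to turn it into rows like the ones in the table above. The sketch below assumes each voice in the response has the languageCodes, name, ssmlGender and naturalSampleRateHertz fields, and that ssmlGender comes back as a string. The voicesToRows function is a hypothetical helper, and the sample array is hand-written from the table rather than a real API response.

```javascript
// Hypothetical helper: converts a list of voices into tab-separated rows,
// optionally keeping only the voices for one language code.
function voicesToRows(voices, languageCode) {
  const rows = [];

  for (const voice of voices) {
    // A voice may be listed under more than one language code.
    for (const code of voice.languageCodes) {
      if (!languageCode || code === languageCode) {
        rows.push({ code, voice });
      }
    }
  }

  // Sort by voice name so the output is stable.
  rows.sort((a, b) => {
    if (a.voice.name < b.voice.name) return -1;
    if (a.voice.name > b.voice.name) return 1;
    return 0;
  });

  return rows.map(({ code, voice }) =>
    [code, voice.name, voice.ssmlGender, voice.naturalSampleRateHertz].join('\t'));
}

// Hand-written sample data taken from the table, not a real response.
const sample = [
  { languageCodes: ['en-US'], name: 'en-US-Wavenet-D', ssmlGender: 'MALE', naturalSampleRateHertz: 24000 },
  { languageCodes: ['ja-JP'], name: 'ja-JP-Standard-A', ssmlGender: 'FEMALE', naturalSampleRateHertz: 22050 },
  { languageCodes: ['en-US'], name: 'en-US-Standard-B', ssmlGender: 'MALE', naturalSampleRateHertz: 24000 },
];

console.log(voicesToRows(sample, 'en-US').join('\n'));
```

With a real response, you would pass response[0].voices as the first argument inside the then callback.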

That's all on how to use the Google Text-to-Speech API in Node.js. Thank you for reading this post.

Ivan Andrianto