You may sometimes need to convert text into an audio file. Text-to-speech technology makes this possible. Building your own text-to-speech engine is difficult and takes a long time, so the easiest solution is to use an existing service, with the drawback that you have to pay for it.
One such Text-to-Speech service is provided by Google. It's known for producing pretty good results, and it comes with an API that makes it easy to integrate with your application. In this tutorial, I'm going to show you basic usage examples of the Google Text-to-Speech API in Node.js, from the preparation to the code.
Preparation
1. Create or select a Google Cloud project
A Google Cloud project is required to use this service. Open the Google Cloud console, then create a new project or select an existing one.
2. Enable billing for the project
Like other cloud platforms, Google requires you to enable billing for your project. If you haven't set up billing, open the billing page.
3. Enable Google Text-to-Speech API
To use an API, you must enable it first. Open this page to enable the Text-to-Speech API.
4. Set up service account for authentication
For authentication, you need a service account. Create a new one on the service account management page and download its credentials, or use a service account you have already created.
In your .env file, add a new variable:
GOOGLE_APPLICATION_CREDENTIALS=/path/to/the/credentials
The .env file has to be loaded when the application starts, so use a module that reads .env files, such as dotenv.
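A missing or mistyped credentials path is easy to overlook, so it can help to check the setup before calling the API. The following is a minimal sketch, not part of the Google library: checkCredentials is a hypothetical helper name, and it only verifies that the variable is set and that the file exists, not that the credentials are valid.

```javascript
const fs = require('fs');

// Hypothetical helper: returns a list of problems found in the credentials setup.
// An empty array means the variable is set and points to an existing file.
function checkCredentials(env = process.env) {
  const problems = [];
  const credentialsPath = env.GOOGLE_APPLICATION_CREDENTIALS;

  if (!credentialsPath) {
    problems.push('GOOGLE_APPLICATION_CREDENTIALS is not set');
  } else if (!fs.existsSync(credentialsPath)) {
    problems.push(`Credentials file not found: ${credentialsPath}`);
  }

  return problems;
}

// Possible usage, after require('dotenv').config() has loaded the .env file:
// checkCredentials().forEach((problem) => console.error('Setup problem:', problem));
```

Running the check right after loading the .env file lets the application report a setup problem clearly instead of failing later inside the API client.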
Dependencies
This tutorial uses @google-cloud/text-to-speech. Add the following dependencies to your package.json and run npm install.
"@google-cloud/text-to-speech": "~0.3.0",
"dotenv": "~4.0.0",
"lodash": "~4.17.10"
1. Synthesize Speech
The example below shows basic speech synthesis. You need to provide the text to synthesize, the voice to use (a language code, optionally narrowed by gender), and the audio encoding. If the request succeeds, the response contains audioContent, which you can then write to a file.
require('dotenv').config();

const _ = require('lodash');
const fs = require('fs');
const textToSpeech = require('@google-cloud/text-to-speech');

const client = new textToSpeech.TextToSpeechClient();

const request = {
  // The text to synthesize
  input: { text: 'This is an example' },
  // The language code and SSML voice gender
  voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
  // The audio encoding type
  audioConfig: { audioEncoding: 'MP3' },
};

const outputFileName = 'output.mp3';

client.synthesizeSpeech(request)
  .then((response) => {
    // The first element of the response array holds the synthesis result
    const audioContent = _.get(response[0], 'audioContent');

    if (audioContent) {
      // Write the binary audio data to a file
      fs.writeFileSync(outputFileName, audioContent, 'binary');
      console.log(`Audio content successfully written to file: ${outputFileName}`);
    } else {
      console.log('Failed to get audio content');
    }
  })
  .catch((err) => {
    console.error('ERROR:', err);
  });
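The request above uses plain text and lets the service pick a voice by gender. As a minimal sketch of two common variations, the helper below builds a request that either takes SSML input (through the ssml field instead of text) or selects a specific voice by name. buildRequest and its options are hypothetical names introduced for illustration, and detecting SSML by a leading <speak> tag is a simplifying assumption.

```javascript
// Hypothetical helper: builds a request object for client.synthesizeSpeech().
function buildRequest(content, options = {}) {
  const { languageCode = 'en-US', voiceName, audioEncoding = 'MP3' } = options;

  // Assumption: content that starts with a <speak> tag is SSML, anything else is plain text.
  const isSsml = content.trim().startsWith('<speak>');

  const voice = { languageCode };
  if (voiceName) {
    // Ask for one specific voice, e.g. a name returned by listVoices
    voice.name = voiceName;
  } else {
    // No specific voice requested: fall back to a neutral gender preference
    voice.ssmlGender = 'NEUTRAL';
  }

  return {
    input: isSsml ? { ssml: content } : { text: content },
    voice,
    audioConfig: { audioEncoding },
  };
}

// Possible usage with the client from the example above:
// const request = buildRequest('<speak>Hello <break time="500ms"/> world</speak>',
//   { voiceName: 'en-US-Wavenet-D' });
// client.synthesizeSpeech(request).then(/* write audioContent to a file */);
```

Keeping the request construction in a separate function makes it possible to inspect or test the request without calling the paid API.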
2. List Voices
The example below retrieves the list of voices supported by the Google Text-to-Speech service. Run it yourself to get the latest list.
require('dotenv').config();

const textToSpeech = require('@google-cloud/text-to-speech');

const client = new textToSpeech.TextToSpeechClient();

client.listVoices({})
  .then((response) => {
    // The first element of the response array holds the list of voices
    console.log(JSON.stringify(response[0]));
  })
  .catch((err) => {
    console.error('ERROR:', err);
  });
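The raw JSON output is long, so you will usually want to narrow it down. The sketch below filters a voices array by language code and gender and formats each entry as a table row. It runs on a small hard-coded sample rather than a live response; the field names (languageCodes, name, ssmlGender, naturalSampleRateHertz) are assumed to match the voices array in response[0], and filterVoices and toTableRow are hypothetical helper names.

```javascript
// Sample data shaped like the voices array (assumed field names, see above).
const sampleVoices = [
  { languageCodes: ['en-US'], name: 'en-US-Wavenet-D', ssmlGender: 'MALE', naturalSampleRateHertz: 24000 },
  { languageCodes: ['ja-JP'], name: 'ja-JP-Standard-A', ssmlGender: 'FEMALE', naturalSampleRateHertz: 22050 },
  { languageCodes: ['en-US'], name: 'en-US-Standard-C', ssmlGender: 'FEMALE', naturalSampleRateHertz: 24000 },
];

// Hypothetical helper: keeps only voices that match every criterion that was given.
function filterVoices(voices, { languageCode, ssmlGender } = {}) {
  return voices.filter((voice) =>
    (!languageCode || (voice.languageCodes || []).includes(languageCode)) &&
    (!ssmlGender || voice.ssmlGender === ssmlGender));
}

// Hypothetical helper: formats one voice as "Language Code | Name | SSML Gender | Rate |".
function toTableRow(voice) {
  return `${voice.languageCodes.join(', ')} | ${voice.name} | ${voice.ssmlGender} | ${voice.naturalSampleRateHertz} |`;
}

// Print the en-US voices from the sample
filterVoices(sampleVoices, { languageCode: 'en-US' })
  .forEach((voice) => console.log(toTableRow(voice)));
```

With a live response, you would pass the voices array from response[0] in place of sampleVoices.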
Below is the list of supported voices at the time this post was written.
Language Code | Name | SSML Gender | Natural Sample Rate (Hz) |
es-ES | es-ES-Standard-A | FEMALE | 24000 |
it-IT | it-IT-Standard-A | FEMALE | 24000 |
ja-JP | ja-JP-Standard-A | FEMALE | 22050 |
ko-KR | ko-KR-Standard-A | FEMALE | 22050 |
pt-BR | pt-BR-Standard-A | FEMALE | 24000 |
tr-TR | tr-TR-Standard-A | FEMALE | 22050 |
sv-SE | sv-SE-Standard-A | FEMALE | 22050 |
nl-NL | nl-NL-Standard-A | FEMALE | 24000 |
en-US | en-US-Wavenet-D | MALE | 24000 |
de-DE | de-DE-Wavenet-A | FEMALE | 24000 |
de-DE | de-DE-Wavenet-B | MALE | 24000 |
de-DE | de-DE-Wavenet-C | FEMALE | 24000 |
de-DE | de-DE-Wavenet-D | MALE | 24000 |
en-AU | en-AU-Wavenet-A | FEMALE | 24000 |
en-AU | en-AU-Wavenet-B | MALE | 24000 |
en-AU | en-AU-Wavenet-C | FEMALE | 24000 |
en-AU | en-AU-Wavenet-D | MALE | 24000 |
en-GB | en-GB-Wavenet-A | FEMALE | 24000 |
en-GB | en-GB-Wavenet-B | MALE | 24000 |
en-GB | en-GB-Wavenet-C | FEMALE | 24000 |
en-GB | en-GB-Wavenet-D | MALE | 24000 |
en-US | en-US-Wavenet-A | MALE | 24000 |
en-US | en-US-Wavenet-B | MALE | 24000 |
en-US | en-US-Wavenet-C | FEMALE | 24000 |
en-US | en-US-Wavenet-E | FEMALE | 24000 |
en-US | en-US-Wavenet-F | FEMALE | 24000 |
fr-FR | fr-FR-Wavenet-A | FEMALE | 24000 |
fr-FR | fr-FR-Wavenet-B | MALE | 24000 |
fr-FR | fr-FR-Wavenet-C | FEMALE | 24000 |
fr-FR | fr-FR-Wavenet-D | MALE | 24000 |
it-IT | it-IT-Wavenet-A | FEMALE | 24000 |
ja-JP | ja-JP-Wavenet-A | FEMALE | 24000 |
nl-NL | nl-NL-Wavenet-A | FEMALE | 24000 |
en-GB | en-GB-Standard-A | FEMALE | 24000 |
en-GB | en-GB-Standard-B | MALE | 24000 |
en-GB | en-GB-Standard-C | FEMALE | 24000 |
en-GB | en-GB-Standard-D | MALE | 24000 |
en-US | en-US-Standard-B | MALE | 24000 |
en-US | en-US-Standard-C | FEMALE | 24000 |
en-US | en-US-Standard-D | MALE | 24000 |
en-US | en-US-Standard-E | FEMALE | 24000 |
de-DE | de-DE-Standard-A | FEMALE | 24000 |
de-DE | de-DE-Standard-B | MALE | 24000 |
en-AU | en-AU-Standard-A | FEMALE | 24000 |
en-AU | en-AU-Standard-B | MALE | 24000 |
en-AU | en-AU-Standard-C | FEMALE | 24000 |
en-AU | en-AU-Standard-D | MALE | 24000 |
fr-CA | fr-CA-Standard-A | FEMALE | 24000 |
fr-CA | fr-CA-Standard-B | MALE | 24000 |
fr-CA | fr-CA-Standard-C | FEMALE | 24000 |
fr-CA | fr-CA-Standard-D | MALE | 24000 |
fr-FR | fr-FR-Standard-A | FEMALE | 24000 |
fr-FR | fr-FR-Standard-B | MALE | 24000 |
fr-FR | fr-FR-Standard-C | FEMALE | 24000 |
fr-FR | fr-FR-Standard-D | MALE | 24000 |
That's all about how to use the Google Text-to-Speech API in Node.js. Thank you for reading this post.