How to create an online Siri-like personal assistant

These days almost every phone has a personal assistant, and every major online company is looking into chatbots.

In this article I will explain how you can create an online Siri-like personal assistant using JavaScript.

The 3 main tasks this assistant should be able to do are:

  • Listen to user voice input
  • Understand and interpret spoken language
  • Give a spoken reply to the user

Let's start with the first point: giving the assistant spoken commands. With the SpeechRecognition API we can listen to the user and transcribe spoken commands into text.

At the time of writing, this API is only supported by Chrome (under the prefixed name webkitSpeechRecognition), but it is expected to land in Firefox soon.

Let's start by creating a new instance of SpeechRecognition. Setting continuous to true keeps the session running after the first final result, and setting interimResults to true makes it report interim transcripts while it is still working out the spoken words. Together they give us continuous feedback from the recognizer. If you only care about the final result, you can set both to false.

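// Read the constructor from window: Chrome only exposes the webkit-prefixed name
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

const speechRecognition = new SpeechRecognition();
speechRecognition.continuous = true;      // keep listening after the first final result
speechRecognition.interimResults = true;  // also report interim (not yet final) transcripts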
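In a browser without the API, neither constructor exists and creating a recognizer fails with an error. A minimal sketch of feature detection that returns null instead, so the page can tell the user to switch browsers. It takes the global object as a parameter (createRecognizer(window) in the browser) so the check can also run outside one; getSpeechRecognition and createRecognizer are hypothetical helper names.

```javascript
// Return the SpeechRecognition constructor exposed by the given global
// object (standard or webkit-prefixed name), or null if there is none.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition || null;
}

// Hypothetical helper: create and configure a recognizer, or return null
// when the browser does not support speech recognition.
function createRecognizer(globalObject) {
  const Recognition = getSpeechRecognition(globalObject);
  if (!Recognition) {
    return null;
  }
  const recognizer = new Recognition();
  recognizer.continuous = true;     // keep the session running
  recognizer.interimResults = true; // report interim transcripts too
  return recognizer;
}
```

The standard name is checked first, so browsers that ship the unprefixed API would use it automatically.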

In Chrome, the SpeechRecognition API sends the user's audio to Google's servers, which transcribe it and send the text back. This means a working internet connection is required.
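Because recognition depends on that round trip to the server, it can fail, for example when the connection drops. The recognizer then fires an error event whose error property holds a code such as network, no-speech, not-allowed or audio-capture. A minimal sketch of turning a few of those codes into messages for the user; describeRecognitionError and attachErrorHandler are hypothetical helpers, and the message wording is only illustrative.

```javascript
// Map the error code of a SpeechRecognitionErrorEvent to a user-facing
// message. Only a few common codes are covered; others get a generic text.
function describeRecognitionError(code) {
  switch (code) {
    case 'network':
      return 'Speech recognition needs a working internet connection.';
    case 'not-allowed':
      return 'Microphone access was denied.';
    case 'no-speech':
      return 'No speech was detected. Please try again.';
    case 'audio-capture':
      return 'No microphone was found.';
    default:
      return `Speech recognition error: ${code}`;
  }
}

// Hypothetical helper: report every recognition error through `report`.
function attachErrorHandler(recognizer, report) {
  recognizer.onerror = event => report(describeRecognitionError(event.error));
}
```

In the browser, report could be as simple as console.warn or a function that shows the message on the page.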

Once the user stops talking, the API tries to return a final result, together with a confidence score for the accuracy of the transcription.

Now that speech recognition is set up, we can add some event handlers. The onstart handler fires when the API starts listening to incoming audio, and the onspeechstart handler fires when the service detects sound that it recognises as speech. The onresult handler fires when the API returns a result.

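speechRecognition.onstart = event => {};       // started listening to incoming audio
speechRecognition.onspeechstart = event => {}; // sound recognised as speech was detected
speechRecognition.onresult = event => {};      // an interim or final result is available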
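The onresult handler receives a SpeechRecognitionEvent. Its results list holds one entry per recognised phrase, each with an isFinal flag and a list of alternatives carrying a transcript and a confidence; resultIndex marks the first entry that changed in this event. A minimal sketch of reading interim and final text from such an event; readResults is a hypothetical helper, and it only looks at the first (most likely) alternative of each result.

```javascript
// Split the changed results of a SpeechRecognitionEvent into interim and
// final text, keeping the confidence score of each final result.
function readResults(event) {
  let interimText = '';
  let finalText = '';
  const confidences = [];
  // Entries before resultIndex were already reported in earlier events.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0]; // first alternative = most likely transcription
    if (result.isFinal) {
      finalText += best.transcript;
      confidences.push(best.confidence);
    } else {
      interimText += best.transcript;
    }
  }
  return { interimText, finalText, confidences };
}
```

Inside the handler, interimText could be shown on screen as live feedback, while finalText is what would be sent on for interpretation.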

With SpeechRecognition set up, all we have to do is start listening.

speechRecognition.start();  
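Even with continuous set to true, Chrome may end a session on its own, for example after a stretch of silence, and fire an end event. A minimal sketch of restarting recognition whenever that happens until the app stops it explicitly; keepListening is a hypothetical helper, and it assumes onend is not needed for anything else.

```javascript
// Start the recognizer and restart it on every `end` event until the
// returned stop() function is called.
function keepListening(recognizer) {
  let stopped = false;
  recognizer.onend = () => {
    if (!stopped) {
      recognizer.start(); // the session ended by itself: listen again
    }
  };
  recognizer.start();
  return function stop() {
    stopped = true;
    recognizer.stop();
  };
}
```

Calling keepListening(speechRecognition) instead of speechRecognition.start() returns a stop function to call when the assistant should stop listening.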

Now that we've completed the first step in creating an online assistant, we can start parsing the input.

To understand natural language and extract useful structured data from it, we can use api.ai. The service understands many topics, such as weather, news, sport and points of interest. Unfortunately, these are only enabled for paid subscriptions; the free version only lets you test them through the test console. Because our demo calls the API directly, we can only use the small talk domain.
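A minimal sketch of sending a transcript to api.ai and reading back the reply. It assumes the v1 REST query endpoint api.ai offered when this article was written (a POST to /v1/query authorised with the agent's client access token, with the reply text in result.fulfillment.speech). Api.ai has since become Dialogflow and the v1 API has been retired, so treat the URL, version parameter and field names as assumptions to check against the documentation of the service you actually use. buildQueryRequest, extractReply and askApiAi are hypothetical helper names.

```javascript
// Assumed v1 endpoint and protocol version; verify before use.
const API_AI_QUERY_URL = 'https://api.api.ai/v1/query?v=20150910';

// Build the fetch() arguments for one query. `accessToken` is the agent's
// client access token; `sessionId` groups queries into one conversation.
function buildQueryRequest(text, accessToken, sessionId) {
  return {
    url: API_AI_QUERY_URL,
    options: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${accessToken}`,
        'Content-Type': 'application/json; charset=utf-8',
      },
      body: JSON.stringify({ query: text, lang: 'en', sessionId }),
    },
  };
}

// Read the spoken reply from a parsed response, assuming the shape
// { result: { fulfillment: { speech: '...' } } }; '' if it is missing.
function extractReply(response) {
  const result = response && response.result;
  const fulfillment = result && result.fulfillment;
  return (fulfillment && fulfillment.speech) || '';
}

// Send a transcript and resolve with the reply text.
async function askApiAi(text, accessToken, sessionId) {
  const { url, options } = buildQueryRequest(text, accessToken, sessionId);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`api.ai request failed with status ${response.status}`);
  }
  return extractReply(await response.json());
}
```

Keeping request building and response parsing in separate functions makes it easy to adapt the sketch if the endpoint or response format differs.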

With steps one and two taken care of, the only thing left to do is reply to the user, by making the browser talk.

With the speechSynthesis API you can make the browser talk, choose a voice and language, set the pitch, and a lot more. To start talking, we create a new SpeechSynthesisUtterance instance with the string we want the browser to speak. The utterance holds both the text the speech service should read and information about how to read it.

Once we've created the utterance, all we need to do is pass it to speechSynthesis.speak() to read it out loud.

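const speechSynth = window.speechSynthesis;
const text = 'Hello, how can I help you?'; // the string the browser should speak
const synthesisUtterance = new SpeechSynthesisUtterance(text);
speechSynth.speak(synthesisUtterance); // queue the utterance and start speaking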
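The utterance can also be told which language, voice, pitch and rate to use. A minimal sketch of that configuration; pickVoice and configureUtterance are hypothetical helpers that take plain objects, so the logic can be tried outside a browser. In the browser, the voices would come from speechSynthesis.getVoices(), which in Chrome may return an empty list until the voiceschanged event has fired.

```javascript
// Pick a voice whose lang starts with the requested language from a list
// of SpeechSynthesisVoice-like objects ({ name, lang, default }).
// Falls back to the default voice, then the first voice, then null.
function pickVoice(voices, lang) {
  const prefix = lang.toLowerCase();
  const match = voices.find(voice => voice.lang.toLowerCase().startsWith(prefix));
  return match || voices.find(voice => voice.default) || voices[0] || null;
}

// Apply language, voice, pitch and rate to an utterance before speaking.
// pitch ranges from 0 to 2 and rate from 0.1 to 10; 1 is normal for both.
function configureUtterance(utterance, voices, { lang = 'en-US', pitch = 1, rate = 1 } = {}) {
  utterance.lang = lang;
  utterance.pitch = pitch;
  utterance.rate = rate;
  const voice = pickVoice(voices, lang);
  if (voice) {
    utterance.voice = voice;
  }
  return utterance;
}
```

For example, speechSynthesis.speak(configureUtterance(new SpeechSynthesisUtterance('Hello'), speechSynthesis.getVoices(), { lang: 'en-GB' })) would read the text with a British English voice if one is installed.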

By combining two lesser-known but awesome browser APIs with an amazing online service, we can create a simple online personal assistant that can not only listen to us, but also reply.
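The three parts come together in one loop: a final transcript from the recognizer is sent off for interpretation, and the reply is spoken out loud. A minimal sketch of that flow, with the understanding and speaking steps passed in as functions so it does not depend on a particular service; createAssistant, query and speak are hypothetical names.

```javascript
// Wire the assistant together:
//   listen:     handleTranscript() is called with the recognised text
//   understand: query(text) resolves with a reply (e.g. from api.ai)
//   reply:      speak(text) reads the reply out loud (e.g. speechSynthesis)
function createAssistant({ query, speak }) {
  return async function handleTranscript(transcript) {
    const text = transcript.trim();
    if (!text) {
      return null; // nothing recognised, nothing to answer
    }
    const reply = await query(text);
    if (reply) {
      speak(reply);
    }
    return reply;
  };
}
```

In the browser, query could wrap a fetch to api.ai, speak could create a SpeechSynthesisUtterance and pass it to speechSynthesis.speak(), and handleTranscript would be called with the final text from the onresult handler.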

You can see it all in action in this CodePen. Make sure you're using Chrome!

See the Pen Personal assistant by Sam (@Sambego) on CodePen.

As always, if you have any questions or remarks, feel free to post them in the comments.

Sam Bellen

I'm a software engineer at @madewithlove. I like playing around with the web-audio-api, and all things front-end related.