Build A Real-Time Voice Assistant with Mistral AI and FastRTC

Originally published at dev.to

A robot assistant sitting at a desk

In this post, I will show you how to build a real-time voice assistant with Mistral AI and FastRTC.

Mistral AI is one of the leading LLM providers, and it has made its LLM API easily accessible to developers.

FastRTC, on the other hand, is a real-time communication library for Python that lets you quickly turn any Python function into a real-time audio or video stream over WebRTC or WebSockets.

Building a Real-Time Voice Assistant

First, let's install the required libraries by running the command below in your terminal:

pip install mistralai fastrtc

Next, set an environment variable and import the libraries. Create a .env file in your project and save your Mistral API key there:

MISTRAL_API_KEY = "<your-api-key>"

Import the libraries:

from mistralai import Mistral
from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model)
from dotenv import load_dotenv
import os

load_dotenv()

To get your Mistral API key, you will need to create an account on their website.

Above, we imported Mistral and the specific pieces we need from FastRTC, namely ReplyOnPause(), Stream(), get_stt_model(), and get_tts_model().

ReplyOnPause(): This wraps your audio handler function. It monitors the incoming audio and, when it detects a pause, takes that as the cue to generate a reply.

Stream(): This sets up the real-time audio stream (over WebRTC or WebSockets) and provides a built-in UI for talking to the assistant.

get_stt_model(): This loads the speech-to-text model used to convert audio into text.

get_tts_model(): This loads the text-to-speech model used to convert text back into audio.
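
To see how these four pieces fit together before the LLM comes in, here is a minimal sketch that only uses the functions above: it transcribes whatever you say and speaks the same text straight back. The function name parrot is just an illustrative placeholder.

from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model)

stt_model = get_stt_model()  # speech-to-text model
tts_model = get_tts_model()  # text-to-speech model


def parrot(audio):
    # Transcribe the incoming audio, then speak the same text back
    text = stt_model.stt(audio)
    for audio_chunk in tts_model.stream_tts_sync(text):
        yield audio_chunk


# ReplyOnPause waits for a pause in your speech, then calls parrot;
# Stream handles the real-time audio transport and the UI
stream = Stream(ReplyOnPause(parrot), modality="audio", mode="send-receive")
stream.ui.launch()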

Now, let's initialize the Mistral client with the API key stored in the .env file:

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-large-latest"

client = Mistral(api_key=api_key)

Here we are using the Mistral Large model; however, you can try out other Mistral models too.

In fact, you can plug any LLM into FastRTC and get real-time voice responses.
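
For example, swapping Mistral out for another provider only changes the chat-completion call inside the audio function; the FastRTC pipeline stays the same. The sketch below is just an assumption-laden example using the openai package (with an OPENAI_API_KEY environment variable set), not something covered in this post:

from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_llm(prompt):
    # Same message shape as before; only the client and model name change
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model would do
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content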

We will now build the audio function that takes the user's speech, turns it into a prompt, and returns a spoken response:

stt_model = get_stt_model()
tts_model = get_tts_model()


def echo(audio):
    # Convert the user's speech to text
    prompt = stt_model.stt(audio)

    # Send the transcribed prompt to the Mistral model
    chat_response = client.chat.complete(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )

    # Convert the model's reply back to speech and stream it
    for audio_chunk in tts_model.stream_tts_sync(chat_response.choices[0].message.content):
        yield audio_chunk

Above, we wrote a function called echo. It takes an audio input and passes it to the speech-to-text method, which converts it into a user prompt for the LLM. The LLM's response is then passed to the text-to-speech method and streamed back synchronously.
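
Note that echo only sends the latest utterance to the model, so every exchange is independent. If you want the assistant to remember earlier turns, a simple variation (a sketch, not part of the original code) is to keep a running messages list and append each prompt and reply to it:

history = []  # shared conversation history across turns


def echo_with_memory(audio):
    prompt = stt_model.stt(audio)
    history.append({"role": "user", "content": prompt})

    # Send the full conversation so far, not just the latest prompt
    chat_response = client.chat.complete(model=model, messages=history)
    reply = chat_response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})

    for audio_chunk in tts_model.stream_tts_sync(reply):
        yield audio_chunk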

Finally, we will run the application:

if __name__ == "__main__":
    stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
    stream.ui.launch()

This will launch the web UI at http://127.0.0.1:7860/.
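
The UI is served by Gradio under the hood, so the usual Gradio launch arguments should work here as well (an assumption worth verifying against the FastRTC docs), for example to expose a temporary public link or pin the port:

# e.g. create a temporary public share link, or pin the port explicitly
stream.ui.launch(share=True)
# stream.ui.launch(server_port=7860)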

Change Voice

If you do not like the default voice, you can change it by passing an instance of KokoroTTSOptions() to the text-to-speech method.

First, import KokoroTTSOptions from FastRTC by adding it to the import tuple:

from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model, KokoroTTSOptions)

Next, define the options:

tts_model = get_tts_model(model="kokoro")

options = KokoroTTSOptions(
    voice="af_bella",
    speed=1.0,
    lang="en-us"
)

Then, pass the options to the text-to-speech call in your audio function:

for audio_chunk in tts_model.stream_tts_sync(chat_response.choices[0].message.content, options=options):
    yield audio_chunk

For more voice options, you can check out the Kokoro TTS documentation.

Complete Project Code

Here is the complete code that we have used to create the real-time voice assistant:

import os
from fastrtc import (ReplyOnPause, Stream, get_stt_model, get_tts_model, KokoroTTSOptions)
from dotenv import load_dotenv
from mistralai import Mistral

load_dotenv()

api_key = os.environ["MISTRAL_API_KEY"]
model = "mistral-large-latest"

client = Mistral(api_key=api_key)

options = KokoroTTSOptions(
    voice="af_bella",
    speed=1.0,
    lang="en-us"
)

stt_model = get_stt_model()
tts_model = get_tts_model(model="kokoro")

def echo(audio):
    # Speech-to-text: transcribe the user's audio into a prompt
    prompt = stt_model.stt(audio)

    # Send the prompt to Mistral and get a text reply
    chat_response = client.chat.complete(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )

    # Text-to-speech: stream the reply back as audio with the chosen voice
    for audio_chunk in tts_model.stream_tts_sync(chat_response.choices[0].message.content, options=options):
        yield audio_chunk


if __name__ == "__main__":
    stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
    stream.ui.launch()

Now, instead of typing a prompt, you can give voice commands to an LLM and have it speak its response, just like a natural human conversation.

I hope you found this post useful. If you did, please share it with others who might benefit from it too. Thanks for reading!


Nice guide! I love how clear the steps are. Quick question: can we integrate other TTS models into this? Keep it up, awesome work!

Thanks for the kind words; I appreciate it. Although the framework uses the Kokoro TTS model by default, I believe you should be able to use other TTS models as well.
