By Bijan Bowen

Testing MetaVoice 1B - Local TTS & Voice Cloning

I was recently browsing the very interesting LocalLLaMA subreddit and came across a thread asking about offline TTS solutions. I have had some experience with a few different offline TTS solutions, such as Piper, but one of the responses to this thread caught my eye. It mentioned MetaVoice 1B, an open-source TTS model that runs locally and can clone a voice from a 30-second clip. I was very interested in trying this out for myself, so I popped over to the GitHub page for the project and got to work.

The MetaVoice-1B GitHub Repo

Once I got the repo cloned and set up, I began experimenting with the WebUI option, which lets you input text, choose one of the three preset voices (or clone your own from an audio clip), and generate the speech. The speech can be played through the WebUI and is also saved locally as a WAV file. I tried out some of the preset voices and had them say some funny things, and I was enthused by the output.
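Because the WebUI writes each result to a WAV file, it can be handy to check what was actually produced before wiring it into anything else. Below is a minimal sketch that uses Python's standard `wave` module to read the sample rate, channel count, sample width and duration from a WAV header. It makes no assumption about MetaVoice's output format; the header is the source of truth. The file name in the comment is hypothetical, and so that the sketch runs on its own, the demo builds a one-second silent WAV in memory at an arbitrary 16 kHz.

```python
import io
import wave


def wav_info(source) -> dict:
    """Read basic format details from a WAV file path or a file-like object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def silent_wav_bytes(seconds: float, sample_rate: int) -> bytes:
    """Build a mono, 16-bit silent WAV in memory (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


if __name__ == "__main__":
    demo = silent_wav_bytes(1.0, 16000)
    print(wav_info(io.BytesIO(demo)))
    # For a real WebUI output: print(wav_info("hypothetical_output.wav"))
```

If a player treats the samples as having a lower rate than the header reports, the audio comes out slower and lower-pitched. That is one common way to end up with slow-motion playback, so the header's rate is worth matching on the playback side.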

The MetaVoice-1B WebUI

Following this, I took a look at the serving.py file, which runs MetaVoice as a server instance. That is something I love to experiment with, because it lets me use my local LLM machine in tandem with my R1 Robot, which I sell for $199. To swap MetaVoice in for the Microsoft Azure TTS that my robot ships with, I had to make some changes to the Unity scripts that run the robot. After a bit of tinkering, I got the robot to send its TTS requests to my local machine and play back the returned audio as speech. It was pretty cool!
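The robot's client lives in Unity scripts, so the following is not that code. It is a minimal Python sketch of the same request/response flow using only the standard library: POST the text to the TTS server and save the audio bytes that come back. Everything about the server here is an assumption for illustration: the address and port, the `/tts` path, a JSON body with a `text` field, and WAV bytes in the response. Check serving.py in the repo for the real request format before relying on it.

```python
import json
import urllib.request

# Hypothetical address: on a LAN, this would be the TTS machine's IP rather than localhost.
SERVER_URL = "http://127.0.0.1:58003/tts"


def build_tts_request(text: str, url: str = SERVER_URL) -> urllib.request.Request:
    """Build a POST request carrying the text to synthesize as JSON (assumed format)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, url: str = SERVER_URL, timeout: float = 120.0) -> str:
    """Send the request and write the returned audio bytes (assumed WAV) to out_path."""
    req = build_tts_request(text, url)
    # Generation can take a while on a local GPU, hence the generous timeout.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Usage would look like `synthesize("Hello from the robot!", "reply.wav")` once a server is actually listening at that address. Building the request is kept separate from sending it so the payload can be inspected without a running server.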

The Unity Script to communicate with MetaVoice

Next up, after a hilarious settings mismatch that played the audio back in extreme slow motion, I decided to test the voice cloning capabilities of the TTS. To keep things simple, I searched for voice clips of both of our polarizing political candidates. To clone a voice, all I needed to do was include the URL of the sample speech in the request to the server. I was very impressed by how simple the cloning setup was and was excited to try it.
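As a rough illustration of "include the URL of the sample speech in the request", here is a minimal sketch of a request body that optionally points the server at a reference clip. The field names `text` and `speaker_ref_path` are assumptions for this sketch, not a confirmed description of MetaVoice's API, and the example URL is a placeholder. The server does the actual cloning; the client only sends a pointer to the clip.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def build_clone_payload(text: str, speaker_ref_url: Optional[str] = None) -> bytes:
    """Return a JSON request body, optionally referencing a voice sample by URL.

    The key names "text" and "speaker_ref_path" are assumed for illustration.
    """
    payload = {"text": text}
    if speaker_ref_url is not None:
        # Catch obviously malformed references locally instead of waiting on the server.
        parsed = urlparse(speaker_ref_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {speaker_ref_url!r}")
        payload["speaker_ref_path"] = speaker_ref_url
    return json.dumps(payload).encode("utf-8")
```

For example, `build_clone_payload("Testing, testing.", "https://example.com/reference_clip.wav")` produces a body with both fields, while omitting the URL falls back to a plain text-only request.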

Our first attempt produced fairly good results. While this was purely an experimental test, I was satisfied with the speech output and highly entertained by the fact that it was being played alongside one of the default female faces that ship with my R1 Robot. Next, I cloned the other voice sample and found the results subpar compared to the first cloning test. I should note that the "case" pictured on my robot is a 3D-printed first prototype of a new case design. I am going to make some big changes to it after seeing it live on the robot.

The Ominous Industries R1 Intelligent Companion Robot with a prototype case

In conclusion, the MetaVoice 1B TTS and voice cloning system was pretty impressive for something running on my local machine. I am a big proponent of all things locally run, and this was a definite improvement over any locally run or free-to-use TTS/voice cloning software I have come across to date. While many argue that ElevenLabs is still on top, MetaVoice gives hope that within a few years, generating speech locally at the level of today's paid offerings may not be far-fetched at all.

You can view the video for this article on my YouTube channel.
