By Bijan Bowen
Bringing Someone Back with AI and Robotics
Last May, I had the idea to digitally bring back the memory of a lost loved one, in this case, my late grandfather. I went through with it using a prototype of the Ominous Industries R1 robot, which I sell here for $199. Truth be told, I was a bit creeped out by the experience and decided to shelve the project. After sitting on it for about a year, I decided I wanted to try this again with the updated R1, using some of the new AI tools now available for replicating the human voice.
My first course of action was to find the only video I had of my grandfather speaking, which had been filmed well over a decade ago and was of rather poor audio quality. I trimmed an audio sample from the video and headed over to ElevenLabs, where I knew I would have the best chance of turning this sub-par clip into something at least semi-usable. I uploaded the clip, and after setting a couple of tags and a short description for the voice, I had a clone that was as good as I was going to get with the technology accessible to me.
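As a side note, I did this step through the ElevenLabs web dashboard, but the same operation can be scripted against their public REST API if you prefer. Below is a rough, Unity-flavored sketch of what that would look like; the class name, field names, and voice metadata here are my own placeholders, not anything from the actual project.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical sketch: creating a voice clone via ElevenLabs' /v1/voices/add
// endpoint. I actually did this through the web dashboard; this is just the
// scripted equivalent, with placeholder names throughout.
public class VoiceCloner : MonoBehaviour
{
    [SerializeField] private string apiKey = "YOUR_XI_API_KEY"; // placeholder key

    public IEnumerator CloneVoice(byte[] audioSample)
    {
        WWWForm form = new WWWForm();
        form.AddField("name", "Grandfather");                          // display name for the clone
        form.AddField("description", "Cloned from an old home video"); // the short description
        form.AddBinaryData("files", audioSample, "sample.mp3", "audio/mpeg");

        using (UnityWebRequest req = UnityWebRequest.Post("https://api.elevenlabs.io/v1/voices/add", form))
        {
            req.SetRequestHeader("xi-api-key", apiKey);
            yield return req.SendWebRequest();

            // On success, the response JSON contains the new voice_id used for later TTS calls.
            Debug.Log(req.result == UnityWebRequest.Result.Success ? req.downloadHandler.text : req.error);
        }
    }
}
```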
My next hurdle was that I had no good pictures of my grandfather that I could easily repurpose as the face of the R1. To get around this, I once again turned to AI. I took a screenshot from the same video I had pulled the audio from and browsed around for AI photo upscaling tools. I found a locally run solution called Upscayl, but its results were unfortunately not able to overcome the low-resolution nature of my source image. I then searched online and found a web-based tool that was very helpful, though it sat behind an extremely expensive subscription, so I will decline to name it here, as I do not care to promote expensive subscription services.
Now that I had my upscaled face image and a cloned voice, I was able to move on to integrating them into my R1's Unity project, so that I could once again "speak" to my grandfather, or a digital version of him, if you will. Since the R1 is already set up to be a conversational robot, I could simply import the face image as a 2D sprite and swap it in for the demo face image that the robot ships with. The next step was to replace the Microsoft Azure TTS that the robot uses by default with the ElevenLabs voice clone.
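For anyone curious, the face swap itself is about as simple as Unity changes get: you can drag the new sprite onto the renderer in the Inspector, or do it in one line of script. A minimal sketch follows, assuming the face is drawn by a SpriteRenderer; the field names are my own placeholders, not the R1 project's actual names.

```csharp
using UnityEngine;

// Hypothetical sketch: replaces the robot's demo face with a custom portrait.
// "faceRenderer" and "customFace" are placeholder names, not from the R1 project.
public class FaceSwap : MonoBehaviour
{
    [SerializeField] private SpriteRenderer faceRenderer; // renderer that displays the face
    [SerializeField] private Sprite customFace;           // upscaled portrait, imported as a 2D sprite

    private void Start()
    {
        // Point the renderer at the new sprite; the demo face is simply replaced.
        faceRenderer.sprite = customFace;
    }
}
```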
To do this, I found an unofficial ElevenLabs Unity implementation on GitHub, which made it a breeze to swap my TTS over to ElevenLabs. Once that was done, I was ready to deploy the application to the robot and experience it. The first thing to note is that the cloned voice was extremely quiet and a bit difficult to hear on camera. I have tested other ElevenLabs voices without this issue, so I attribute the quietness to the less-than-optimal source clip I cloned from. Other than that, it really did sound like him, and my generally light and humorous approach to making videos took a back seat to the "odd" feeling I experienced while speaking to him once again.
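Before moving on, a quick look under the hood: the GitHub wrapper abstracts the details, but an ElevenLabs TTS call ultimately boils down to a single HTTP POST that returns audio. Here is a minimal coroutine-style sketch of that request, assuming an API key and the cloned voice's ID; the class and field names are mine, and the wrapper I used may structure things differently.

```csharp
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Hypothetical sketch of a direct ElevenLabs text-to-speech request from Unity.
// All names here are placeholders; the unofficial GitHub wrapper handles this plumbing.
public class ElevenLabsSpeaker : MonoBehaviour
{
    [SerializeField] private string apiKey = "YOUR_XI_API_KEY"; // ElevenLabs API key
    [SerializeField] private string voiceId = "YOUR_VOICE_ID";  // ID of the cloned voice
    [SerializeField] private AudioSource audioSource;           // plays the returned clip

    public IEnumerator Speak(string text)
    {
        string url = $"https://api.elevenlabs.io/v1/text-to-speech/{voiceId}";
        // Minimal request body; ElevenLabs also accepts model and voice-setting fields.
        string json = "{\"text\": \"" + text.Replace("\"", "\\\"") + "\"}";

        using (UnityWebRequest req = new UnityWebRequest(url, UnityWebRequest.kHttpVerbPOST))
        {
            req.uploadHandler = new UploadHandlerRaw(Encoding.UTF8.GetBytes(json));
            // ElevenLabs returns MP3 audio by default; let Unity decode it into an AudioClip.
            req.downloadHandler = new DownloadHandlerAudioClip(url, AudioType.MPEG);
            req.SetRequestHeader("Content-Type", "application/json");
            req.SetRequestHeader("xi-api-key", apiKey);

            yield return req.SendWebRequest();

            if (req.result == UnityWebRequest.Result.Success)
            {
                audioSource.clip = DownloadHandlerAudioClip.GetContent(req);
                audioSource.Play();
            }
            else
            {
                Debug.LogError("TTS request failed: " + req.error);
            }
        }
    }
}
```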
I drifted off a bit and later revisited this to film an outro for the video, and I found that while the initial shock had worn off, it was still truly a bit of an emotional experience, though I am sure that, looking in from the outside, the feelings at play would be difficult to understand. While filming the outro clip, he began speaking about the problems with online subscription services in response to something I had said about them on camera, which I found kind of funny and lighthearted. It was a good way to lighten things up, and I felt it was a good place to end the experiment.
You can view the video for this article on my YouTube channel.
To get a robot like the one in the video, click here.
P.S. The robot in the video is a prototype of a new case design that will be available in late summer, though it will be more expensive than the current R1, which remains the top dog of affordable social robots.