
By Bijan Bowen

Testing locally run AI portrait animation

The thing I most look forward to in the AI space is being able to generate videos locally: uncensored, subscription-free, and without content filters. Things like Sora and the up-and-coming Luma Dream Machine are a cool preview of what I hope to eventually accomplish on my own hardware, free of all restrictions. Since there are no local options (that I am aware of) comparable to something like Luma, I have turned my attention to the next best thing: generating speaking avatars from static portrait images.

I recently stumbled upon a Reddit post mentioning a new release of exactly that: the Hallo portrait image animation repository. Released just a few days ago, the repo had some exciting sample videos, and I was eager to try it on my own machine to see if it matched what was demonstrated on GitHub. I have seen other similar projects, but until now I had not experimented with running any of them locally. Since this model just came out, I figured it would be a good candidate, as it likely builds on more recent advancements.

The Hallo GitHub repo

The setup was relatively standard: clone the GitHub repo and install from requirements.txt. I also needed to download some files from the group's Hugging Face repository, which turned into a lesson on why some people recommend huggingface-cli over git lfs for downloading files. After I remedied the git lfs issues (by switching to huggingface-cli), I was ready to start experimenting. To confirm everything had installed properly, I ran the example script referenced in the Hallo GitHub, which generated an animated portrait with some bilingual English and Chinese speech. I was impressed with the mouth movements of the animation, but what really caught my attention was the eye movement. The blinking looked quite natural, especially for something running locally on my own machine.
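For anyone hitting the same git lfs wall, here is a minimal sketch of the download step using the huggingface_hub Python API, which is what huggingface-cli wraps. The repo id and target directory here are assumptions, so double-check them against the Hallo README:

```python
# Minimal sketch: fetch the Hallo pretrained weights without git lfs.
# Repo id and local_dir are assumptions; verify against the Hallo README.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="fudan-generative-ai/hallo",  # assumed Hugging Face repo id
    local_dir="pretrained_models",        # directory the inference script expects
)
```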

The Hallo example provided for testing

Next up, in keeping with the spirit of my questionable sense of humor, I decided to have a little fun. I mean, what else would I want to do with something like this? I fired up OpenDalle 1.1, an image model I have used often and hold very dear, and prompted it to generate a portrait of a beautiful obese human. Once I had my beautiful obese human, I brought it into Photoshop along with a different OpenDalle image I had generated a while back and cropped both to fit Hallo's suggested parameters. For those interested, the suggestions were: a square portrait, with the face occupying 50-70% of the image and rotated no more than 30 degrees.
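If you would rather not do the cropping in Photoshop, a rough Pillow sketch like the one below handles the square crop; the 50-70% face coverage and the rotation limit you still have to eyeball (or wire up a face detector, which I did not bother with):

```python
# Rough sketch: center-crop a portrait to a square for Hallo.
# Filenames are placeholders; face coverage is still judged by eye.
from PIL import Image

img = Image.open("portrait.png")

side = min(img.size)              # the shorter edge becomes the square size
left = (img.width - side) // 2
top = (img.height - side) // 2
img.crop((left, top, left + side, top + side)).save("portrait_square.png")
```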

The beautiful obese human from OpenDalle

Now that I had my portraits, I needed a WAV file to sync them to. I headed to ElevenLabs and searched through my history for something suitable that would make me laugh. ElevenLabs was also the easiest option, since it offers WAV downloads and I would not need to deal with any audio file conversion (though see the quick sketch below if you do). Once that was done, I transferred the files to the machine running Hallo and ran the portrait of the obese man first. The generations took a while, a few minutes each at minimum, and I noticed my 3090 Ti beginning to tickle the mid-80s (Celsius). Those of you who have seen the build video of my local LLM machine will not be surprised, as I have not yet swapped the components into my larger Thermaltake case.
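As a quick aside on the audio: if your clip is not already a WAV, the conversion I was avoiding is only a couple of lines with pydub (which leans on ffmpeg under the hood for compressed formats), something along these lines:

```python
# Sketch: convert a non-WAV clip to WAV for Hallo. Filenames are
# placeholders; ffmpeg must be installed for formats like MP3.
from pydub import AudioSegment

AudioSegment.from_file("speech.mp3").export("speech.wav", format="wav")
```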

nvidia-smi showing a hot 3090 Ti

The Hallo GitHub states that the model was tested on an Nvidia H100. I have two 3090 Tis in my machine, but the model was only utilizing one of the cards. nvidia-smi showed about 14 GB of VRAM in use, with roughly 420 of the card's 450 watts being pulled, so it was working hot and hard. Once the generation finished, the output was saved to the Hallo directory as an MP4, and I was able to play it and laugh hard for about 15 seconds before moving on to the second image.
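If you would rather watch those numbers from Python than keep staring at nvidia-smi, pynvml (the nvidia-ml-py package) exposes the same counters. A small sketch, assuming the busy card is device index 0:

```python
# Sketch: poll the VRAM, power draw, and temperature nvidia-smi reports.
# Device index 0 is an assumption; adjust for whichever card Hallo uses.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000   # NVML reports milliwatts
temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)

print(f"VRAM: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
print(f"Power: {power_w:.0f} W  Temp: {temp_c} C")
pynvml.nvmlShutdown()
```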

The second portrait was of much higher quality in terms of how the face was framed. I had noticed that the first generation, with the beautiful obese human, did not retain any of the impressive eye movement the original demonstration had shown. I ran Hallo again with the new portrait, and the results were extremely impressive: the eye movement was very well done, and the overall quality was excellent. With that, I had nothing left that I could post about on YouTube, so I decided this would conclude my testing of Hallo for today.

A well-done generation from Hallo

You can view the video for this article on my YouTube channel.
