By Bijan Bowen

GPT-4o & Gemini 1.5 Pro Finish Each Other’s Sentences

I wanted to make another Bob the Sentient Washing Machine to engage in some tomfoolery by having two of them speak to each other. I have had a lot of fun making my R1 Robots speak to each other using uncensored models, but unfortunately, I cannot post those videos anywhere without risking every non-self-hosted platform I post them on getting immediately banished from existence...

Two R1s Speaking

In keeping with the theme of testing the newest and most powerful AI models, I decided to hook one Bob up to OpenAI GPT-4o, and the other to Google Gemini 1.5 Pro. To facilitate easy communication without having to do too much work, I decided to run each on a Raspberry Pi 4 with similar Python scripts.

The SSH session showing communication between the Bobs

The scripts were simple and allowed the two Bobs to "speak" to one another. Using simple UDP state management messages, the Bob that was speaking would send a UDP message to the other, informing it to listen, and vice versa. Using this simple method, I was able to get them to speak and listen in turn, utilizing the Microsoft Azure speech services for the TTS (text to speech) and STT (speech to text) functionality.

After a bit of troubleshooting (caused by neglecting to add a UDP port bind statement in one of the scripts), I had them working well enough to have some fun. To make something that I could actually post, I opted for a rather vanilla prompt for both models. The prompt for the models was, "You are having a conversation; it can lead anywhere." I capped each model at a very low 20-token max output, as it made the conversation quicker and allowed for more leeway in the level of precision that the "listening" state needed to function properly.

The GPT4o Prompt
The Gemini 1.5 Pro Prompt

The conversation began with a simple "hello" sent to the GPT-4o Bob, and its response triggered the Gemini Bob to begin listening, thus starting the conversation loop. Overall, their conversation was rather boring; however, one interesting thing happened: they began to sync up very well and finish each other's sentences for a couple of turns in the conversation. This caught me by surprise, and I found it interesting to see how they reacted to one another.

The two Bobs

The Gemini 1.5 Pro model seemed to take a more analytical approach, picking up on the behavior of the GPT-4o model. It noticed the model was imitating it and questioned the GPT-4o Bob on some of its responses. Given that the 20-token max length is almost laughably short, I hesitate to make any definitive statements about either model's performance, but I will say that the Gemini 1.5 Pro model seemed a bit "smarter" given the constraint of the 20-token limit. A fairer comparison would likely involve giving both models conversation history.

You can view the video for this article on my YouTube channel.


Leave a comment