· By Bijan Bowen
A Quick Test Of Claude 3.5 Sonnet
With today's release of the Claude 3.5 Sonnet model from Anthropic, I wanted to find a way to test it that wasn't the generic "use in web interface" style of test. To do this, I decided to use the Ominous Industries R1 Companion Robot that I sell for $199, to give a face and voice to Claude 3.5, allowing a conversational and social way to interact with this new model. Until today, I had not used the Anthropic API in any of my projects, so I had to start from scratch. The experience of signing up for API access was a breeze, and I was able to redeem a $5 free API credit by verifying a phone number, which was a nice throwback to when OpenAI used to offer a $5 free credit for their API, but I digress.
The robot's software is pretty flexible in terms of switching between different LLM APIs, and the documentation from Anthropic easily allowed me to implement the new Claude 3.5 Sonnet into the robot. Once this was done, I fired up the robot and got to "meet" the new model. To begin, I found it rather fitting to ask Claude 3.5 Sonnet (through the website) for 10 questions to ask a new powerful AI. Once I had my list, I began to run through it, but as the questions were rather tame generic questions, I quickly got bored and decided to move on to something more entertaining.
Following this, I set the personality of the R1 to be an angry robot, which Claude seemed to completely ignore. I was a bit perplexed by this as right before I began to film the video for this article, I had successfully had it role-play as a girlfriend for about one question before it subsequently refused to engage in that sort of role-play anymore. I realized I may have needed to specify that it is to role-play the desired persona, so I changed "angry robot" to "role-play as a girlfriend bot," and it did understand the persona, as it began by refusing to engage in that sort of role-play. One interesting thing I noticed, was that the robot said it was NOT Claude 3.5 Sonnet, even though that was the model being used, weird.
Finally, I decided to see if it would do ANY role-play at all, and had it pretend to be a business partner at a startup who just got the news that we were to be acquired and make a lot of money. It complied with this request and responded accordingly. However, when I responded by telling it I was shocked that they went through with it due to its (the robot's) criminal record, it broke character and adamantly declared that it had no criminal record, thus ending the short role-play session.
I have to say that while it was intelligent and gave extremely thorough answers to some of the historical questions I asked it, it seemed extremely vanilla and guarded. I prompted it to role-play as a girlfriend robot, and it refused to do so, multiple times. I have to compare this to what I have experienced from GPT-4(o), and in doing so, GPT-4(o) seems like a totally different beast. While it also has safeguards, I find that it is a lot more willing to have some fun with you. To give an example, the GPT-4(o) API hooked up to the R1 is happy to role-play as a girlfriend or most other personas you give it. It even suggested I make an OnlyFans as a place to post funny NSFW robot conversations, which I went ahead and followed through with. I don't mention Gemini here as it is also very guarded.
Though not the focus of my testing here, I also tried Claude 3.5 Sonnet through the web interface, and I was extremely pleased with the results compared to what I have experienced with Claude 3 Opus. In terms of speed, it felt comparable to the speed differential between using GPT-4 and GPT-4(o). I am a big fan of Claude 3 Opus as I find its coding abilities to surpass what I experience with GPT-4(o) (especially recently, where I have seen a LOT of code hallucinations with GPT-4(o)). One thing I noticed was that 3.5 Sonnet seemed eager to give you a full script of code, while 3 Opus has been more reserved, giving only relevant snippets and leaving you to integrate them with the rest of your script. Overall, I am excited for Anthropic to have announced these current and future releases, and I am VERY eager to get my hands on 3.5 Opus, especially for code generation tasks. While there was a lot more to the announcement, I wanted to do a quick test of it as a conversational partner.
You can view the video for this article on my YouTube channel.