By Bijan Bowen

Apple Silicon Speed Test: Local LLMs on M1 vs. M2 vs. M2 Pro vs. M3

I wanted to test the speed of some different Apple Silicon-equipped machines. As many of my interests these days revolve around using Local LLM setups, I figured the best way to test these machines would be to run a local LLM on them and see how they perform. The machines I had for this test are as follows: an M3 iMac with 16GB of RAM, an M2 Pro (10-core) Mac Mini with 16GB of RAM, an M1 Mac Mini with 16GB of RAM, and an M2 MacBook Air with 8GB of RAM.

The macs lined up for testing

To be as fair as possible, I wanted to use the same software, the same model, and the same prompt. I decided to use a simple, open-source user interface for interacting with local LLMs. For the model, I chose Llama 3 8B Q4, as it is a popular and powerful newer model. In terms of prompting, I kept all the settings at their defaults, except for the GPU layers option, which I pushed to maximum on the three desktop machines.

The M3 iMac generating a response

On the M2 MacBook Air, I received gibberish output with the GPU layers set to maximum, so after some trial and error, I left it at 32, the default value the model loaded with. Doing this allowed the laptop to respond properly using the model, albeit extremely slowly.

The MacBook Air M2 generating gibberish

The results were interesting, with the M2 Pro blowing the rest of the machines out of the water. The average generation speed for each machine, using the prompt "tell me about 3D printing," was as follows:

  • M3 iMac 16GB: About 13 tokens per second.
  • M2 Pro (10-core) Mac Mini 16GB: About 27 tokens per second.
  • M1 Mac Mini 16GB: About 10 tokens per second.
  • M2 MacBook Air 8GB: An abysmal 1 token per second.
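Tokens per second here is just generated tokens divided by wall-clock generation time, which most local-LLM frontends report for you. A minimal sketch of the arithmetic (the token count and timing below are illustrative numbers, not measurements from the article):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Generated tokens divided by wall-clock generation time in seconds."""
    return n_tokens / elapsed_s

# Illustrative example: 540 tokens generated in 20 seconds of wall-clock time
# works out to the M2 Pro's ballpark rate.
print(tokens_per_second(540, 20.0))  # 27.0
```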

While the point of this test was purely curiosity-based, I found it a bit embarrassing how the M2 Air performed. I am going to attribute this to the lackluster 8GB of RAM that still comes standard on this machine (and even on the updated M3 MacBook Air). I would like to get my hands on a 16GB variant to see what difference it makes in token speed.
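The RAM explanation is plausible from some back-of-the-envelope arithmetic. Q4-family quantization stores roughly 4–5 bits per weight (assume ~4.5 here), so the weights of an 8-billion-parameter model alone occupy most of an 8GB unified-memory budget before the KV cache, the UI, and macOS itself take their share:

```python
# Rough estimate of the resident size of a Q4-quantized 8B model.
# The 4.5 bits-per-weight figure is an assumption typical of Q4 variants,
# not a number reported in the article.
params_billions = 8.0
bits_per_weight = 4.5

# billions of params * bits / 8 bits-per-byte ≈ gigabytes of weights
weights_gb = params_billions * bits_per_weight / 8
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~4.5 GB
```

On a 16GB machine that footprint fits comfortably; on the 8GB Air it leaves very little headroom, which is consistent with the 1 token-per-second result.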

The computers with their generations open

The desktops performed well. For someone who already owns an Apple Silicon desktop and wants to play around with local LLMs without purchasing a dedicated setup, these machines offer a pleasant way to get one's hands dirty in the world of local LLMs.

You can view the video for this article on my YouTube channel.
