Comparing LLM models and pricing

using bolt.diy

Jeff P
6 min read · Feb 8, 2025

So I’m absolutely loving Bolt.new recently, but token pricing can get expensive pretty quickly as your project grows in complexity. After a couple of days’ worth of development on a new project, I’m already hitting over 200k tokens PER MESSAGE.

A “token reload” in Bolt.new costs $20 (around £16) and gets you 10 million tokens. However, when you reach the point of burning 250k tokens per message, you’ve burned through $20 of tokens in just 40 prompts! It’s not uncommon to get through 40 prompts in just 1–2 hours, so as you can imagine, it starts to become unsustainable at these rates. Bolt.new uses Anthropic’s Claude 3.5 Sonnet LLM, which is very good, but expensive!
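The maths behind that can be sketched in a few lines (figures are the ones quoted above, not official Bolt.new pricing, which may change):

```python
# Back-of-the-envelope cost per prompt, using the figures quoted above.
reload_cost_usd = 20          # one "token reload" in Bolt.new
reload_tokens = 10_000_000    # tokens you get per reload
tokens_per_prompt = 250_000   # a heavy prompt on a complex project

cost_per_token = reload_cost_usd / reload_tokens
cost_per_prompt = cost_per_token * tokens_per_prompt
prompts_per_reload = reload_tokens // tokens_per_prompt

print(f"${cost_per_prompt:.2f} per prompt")            # $0.50 per prompt
print(f"{prompts_per_reload} prompts per $20 reload")  # 40 prompts per $20 reload
```

At 40 prompts in 1–2 hours, that $20 reload really can disappear in an afternoon.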

This led me to take a closer look at bolt.diy, where you can set up the LLM of your own choice, potentially giving you a cheaper way of running your project in Bolt.

So this is what I decided to try…

I would set up different LLMs and then serve the exact same prompt to each, noting both the token cost and the outcome of the prompt.

Here’s the prompt:

I want a modern, responsive website using react/VITE/typescript which compares pricing of LLM models. I want a modern header and footer with a logo, a hero image and some nice modern effects.

Let’s see what happened….

Claude Sonnet 3.5

Ok, this is the model Bolt.new already uses, so I was expecting very good results, but here’s what happened.

Here’s my API balance before the prompt…

Ok so let’s run it and see what it comes up with….

The Claude Sonnet 3.5 LLM gets to work straight away…no messing about…

and here’s the finished result….

As I expected, it did a great job on a first run. Claude Sonnet 3.5 is genuinely super impressive.

The cost….

So that just cost me $0.10 (around 8p). Now don’t get me wrong, this is extremely reasonable considering what it just generated. I daresay that even some of the world’s best web developers would’ve struggled to make anything better in 15 minutes, let alone the 15 seconds it took Claude Sonnet 3.5 to generate it. If that were translated to a human “hourly rate”, it’s the equivalent of paying someone $0.40 an hour, only the results are 60 times quicker!
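That hourly-rate claim checks out if you assume a human would take around 15 minutes for the same job (my own rough reading of the numbers, not anything official):

```python
# Rough "hourly rate" comparison from the figures above (illustrative only).
cost_usd = 0.10       # what the Sonnet generation cost
human_minutes = 15    # rough time a skilled developer might need
ai_seconds = 15       # time Claude Sonnet 3.5 actually took

hourly_rate = cost_usd * (60 / human_minutes)  # $ per hour at the human's pace
speedup = (human_minutes * 60) / ai_seconds    # how much faster the AI was

print(f"${hourly_rate:.2f}/hour")  # $0.40/hour
print(f"{speedup:.0f}x quicker")   # 60x quicker
```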

Token usage

But here’s the thing… this is a very basic project at this stage, so the token usage for each subsequent prompt is going to be minimal… maybe 5,000 tokens or so… very manageable.

But as I say, on far more complex projects, I’ve seen individual prompts gobbling up 250k tokens each! You can hopefully see the sudden impact this makes on cost! We’ve gone from having the world’s cheapest super coder working for us to employing one of the world’s most expensive!

Anyway, we know the benchmark that Claude Sonnet 3.5 has set now….. so let’s look at some others.

OpenAI GPT-4o

A lot of people use this model daily in ChatGPT — let’s see how it fares against Claude Sonnet 3.5

Here’s the result…

Ok, so whilst it did its job and did what I asked, it’s fair to say it doesn’t look as good as the one Claude Sonnet 3.5 just generated. It would definitely need some additional prompts to start looking as nice as the other one.

To be fair, the cost to generate this was significantly less: it cost $0.02 to complete. Then again, I would possibly need far more prompts to get it to the level that Claude Sonnet 3.5 reached… could I achieve it in 4 additional prompts or fewer? Possibly… but we’ll leave it at that for now, for a genuine comparison.

Let’s move on…

Gemini 2.0 Flash

So interestingly, Gemini 2.0 Flash actually failed on the first prompt… the code it wrote was incomplete, leading to a TypeScript error…

I had no choice but to try and fix this, as I wanted to see what the website would look like, but this of course means a second prompt, resulting in the use of more tokens and ultimately more cost…

To fix the issue, I simply copied the error and fed it back into the prompt….

Somewhat frustratingly, this didn’t fix the issue either. I decided I’d give it one more chance to redeem itself: I closed everything down and ran the same prompt again… only this time it failed with a completely different TypeScript error!

So based on this, I’d say Google Gemini 2.0 Flash is currently not fit for purpose… let’s move on.

Perplexity — Sonar Huge Online

This model also failed…

Seeing as I’m in a bit of an unforgiving mood now, I’ll move on!

OpenAI o1-mini (via OpenRouter)

Claude 3.5 Haiku

Ok, so the results from Claude 3.5 Haiku were very good… in fact arguably better than Sonnet, which surprised me!

The cost…

Before…

After…

$0.03… that’s less than a third of the cost of using Sonnet.

Deepseek R1 (Free version)

Unfortunately this couldn’t complete, and even when it was trying to write the code it was painfully slow… well, slow by AI standards anyway… it was typing onto the screen at about 80 words per minute, so still quicker than me! :-)

Deepseek R1 (full version)

Unfortunately the full version couldn’t complete without issues either…

Conclusion

Obviously this isn’t an exhaustive test, but from the hour or so I spent testing these LLMs: if you want to build a website and you like the Bolt framework but want to save some money, your best bet is to go with Claude 3.5 Haiku. It’s just under a third of the price of Sonnet, and seems to perform just as well for basic tasks. I’m not sure how it would fare with more complex projects, but for spinning up a website quickly and cheaply, it certainly seems like the way to go.
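The one-shot generation costs I observed can be summed up in a few lines (these are the single-run figures from this test, so treat them as indicative only):

```python
# Observed one-shot generation cost per model in this (small, single-run) test.
costs = {
    "Claude Sonnet 3.5": 0.10,
    "OpenAI GPT-4o": 0.02,
    "Claude 3.5 Haiku": 0.03,
}

baseline = costs["Claude Sonnet 3.5"]
for model, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${cost:.2f} ({cost / baseline:.0%} of Sonnet)")
```

GPT-4o was cheapest on paper, but as noted above its first attempt would need extra prompts (and therefore extra tokens) to catch up, which is why Haiku comes out as the practical winner here.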

As for now, I don’t feel like I have a truly viable alternative for saving money on complex projects in Bolt.new, so will keep researching different ways to tackle this.
