Workhorse AI models you probably ignored


While the whole YouTube influencer space goes crazy over Gemini 3, I’m not so impressed. Don’t get me wrong, Gemini models are fantastic, and the third iteration is undoubtedly cool and capable. I’ve briefly run some tests on Gemini 3 (not benchmarks, but actual real-world tasks).

Meh 😐, it’s okay. It’s better than GPT-5 for sure, but it’s an incremental step, just like most other models we’ve seen this year.

When the dust settles and we get past all the hype, what really matters is how reliable and cost-effective these models are for day-to-day use, not just coding but also business use cases.

In this article, I want to talk about some lesser-known models that all the YouTubers and the media in general miss.

Gemini Flash

My favourite model for general machine-learning work and not-so-complicated tasks like RAG chatbots is Gemini 2.0 Flash.

Flash is super fast compared to Anthropic’s Haiku and OpenAI’s mini models, yet it feels much smarter than nearly every other mini model I’ve worked with.

I’m using Flash models for:

  • Voice AI tasks.

  • RAG chatbots.

  • Categorization of products.

  • Rewriting search queries.

  • Various text-generation tasks.
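As a concrete sketch of the search-query rewriting use case, here’s a minimal example using the google-generativeai SDK. The prompt wording and the `GEMINI_API_KEY` environment variable are my own assumptions, and the API call only fires if a key is actually set:

```python
# Sketch: rewriting a messy user search query with Gemini 2.0 Flash.
# The prompt template below is illustrative, not a recommended recipe.
import os

def rewrite_prompt(query: str) -> str:
    """Build a prompt asking the model to clean up a raw search query."""
    return (
        "Rewrite the following e-commerce search query so it is concise "
        "and keyword-focused. Return only the rewritten query.\n"
        f"Query: {query}"
    )

if os.environ.get("GEMINI_API_KEY"):
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    resp = model.generate_content(
        rewrite_prompt("uhh cheap running shoes for men size 10??")
    )
    print(resp.text.strip())
```

Because Flash is so cheap and fast, you can afford to run this on every search a user types without worrying much about latency or cost.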

Back in the day, GPT-4o mini was the GOAT of small models, but alas, in the last year or so Google has woken up and come to dominate the small-model space.

Phi-3 / Phi-4 models

If you have a small 8GB-16GB consumer GPU on hand and want to run a local model, the Phi models are impressive. I use them for simple classification and text-generation tasks, and they generally perform well.

Sure, they are not as smart as the Gemini models, but they shine when you pair them with another process such as semantic search. For categorization, for example, you could run a semantic search to find the most relevant shortlist of categories, then ask the model to pick the most appropriate one.

This will save you some per-token costs and give you a decent level of accuracy.
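The two-stage flow above can be sketched like this. The token-overlap scorer is a stand-in for a real semantic search (embeddings plus a vector index), and the actual local Phi call is omitted; only the shortlist-then-prompt shape is what matters here:

```python
# Sketch of the two-stage categorization flow: shortlist via "semantic
# search" (here a naive token-overlap stand-in), then ask a small local
# model to pick one category from the shortlist.

def shortlist_categories(product: str, categories: list[str], k: int = 3) -> list[str]:
    """Rank categories by naive token overlap with the product text."""
    words = set(product.lower().split())
    scored = sorted(
        categories,
        key=lambda c: len(words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(product: str, shortlist: list[str]) -> str:
    """Ask the model to pick exactly one category from the shortlist."""
    options = "\n".join(f"- {c}" for c in shortlist)
    return (
        "Pick the single best category for this product.\n"
        f"Product: {product}\nCategories:\n{options}\n"
        "Answer with the category name only."
    )

categories = ["kitchen knife", "garden tool", "office chair"]
top = shortlist_categories("stainless steel chef knife", categories)
print(build_prompt("stainless steel chef knife", top))
```

By shrinking the candidate list before the model ever sees it, the prompt stays tiny, which is exactly where the per-token savings come from.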

Z.Ai GLM 4.6

It’s become common practice for developers to use AI coding models and tools like Claude Sonnet, Cursor, Windsurf, and so forth.

I myself love Sonnet and use it often, but Sonnet can be expensive and slow at times, which really impacts your flow state, not to mention all the daily and weekly limits if you’re using one of the subscription plans.

I still write and plan the “complex” stuff myself, but there’s just some boilerplate 🥱 brain-dead CRUD stuff that I don’t need to write myself anymore. In the past I would use Sonnet for these tasks, but of late I’ve been really impressed by how much mileage I get out of GLM 4.6.

Z.ai subscriptions start at $6 a month, or you can use the model through something like OpenRouter instead, and it’s fairly capable. I’ve generated some good Python, PHP, and even Golang code with the 4.6 model (using Cline in VSCode).

You still need to review and clean up the code to an extent, like with Sonnet, but the 4.6 model seems fairly good with clearly defined tasks, Tailwind stuff, and just CRUD-type low-thinking tasks.

Sure, Sonnet is still better at the more complicated stuff, but anyway, complex problems are fun for me, so I just use my brain for those 🙃.

Minimax models

Like GLM 4.6, the Minimax models are also very capable at agentic and coding tasks, and the price is way cheaper than Sonnet: on OpenRouter it’s around $2 per million output tokens.

Similar to GLM 4.6, it’s not as smart as Sonnet; however, remember that big ol’ brain of yours! It’s usually good enough for most code-generation tasks and can also serve as a smarter alternative to GPT-5 or any of the GPT models.

Let’s be honest, OpenAI is just not as good as it once was. Their API response times are not great, and most of these other alternative models have caught up.

OpenRouter

Much of what I am referencing here are models that I use through OpenRouter, which also happens to be one of my favourite AI platforms ever!

If you don’t know, OpenRouter is basically a “Router for AI”. Instead of creating accounts with OpenAI, Anthropic, Google, etc., you simply register with OpenRouter and buy credits. Those credits can then be used across hundreds (if not thousands) of models.

So, for example, in one project using just a single SDK implementation, you can swap between Gemini, Anthropic, Z.ai, OpenAI, Mistral, and many more providers by just changing the model name!

They also route across multiple providers. GLM 4.6, for example, is hosted by about 9-10 different companies, so if one provider is experiencing an outage, OpenRouter will automatically route your API call to one of the others. You can also optimize for cost and pick the cheapest provider!
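The single-SDK model swap and the provider routing can be sketched with the OpenAI SDK pointed at OpenRouter’s OpenAI-compatible endpoint. The model slugs and the `provider` routing field are assumptions based on my reading of OpenRouter’s docs, so verify them against the live model list before relying on this:

```python
# Sketch: one client, many models, via OpenRouter's OpenAI-compatible API.
# Model slugs and the "provider" routing field are assumptions; check
# OpenRouter's current model list and docs before using them.
import os

MODELS = {
    "flash": "google/gemini-2.0-flash-001",
    "sonnet": "anthropic/claude-sonnet-4",
    "glm": "z-ai/glm-4.6",
}

def build_request(model_key: str, prompt: str) -> dict:
    """Assemble kwargs for chat.completions.create; swapping models
    is just a matter of changing the slug."""
    return {
        "model": MODELS[model_key],
        "messages": [{"role": "user", "content": prompt}],
        # Ask OpenRouter to prefer the cheapest hosting provider.
        "extra_body": {"provider": {"sort": "price"}},
    }

if os.environ.get("OPENROUTER_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )
    resp = client.chat.completions.create(
        **build_request("glm", "Write a haiku about CRUD endpoints.")
    )
    print(resp.choices[0].message.content)
```

Swapping from GLM 4.6 to Sonnet or Flash is literally a one-string change, which is the whole appeal.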

Okay, this is starting to sound like an OpenRouter ad. It’s totally not (though I wouldn’t mind if they sponsored me 😉), it’s just really cool!

The only problem is that the API is sometimes slower than going to the mainstream provider directly, and they add a small markup to your token usage to make a profit. But generally, by mixing different providers and models, you end up saving money.