Why did AI prices drop 97% in two years?

The main driver is open source. Meta's Llama, DeepSeek and Google's Gemma 4 broke the oligopoly in which OpenAI, Google and Anthropic controlled access to the best models. On top of that, new methods such as the Mixture of Experts (MoE) architecture and Google's TurboQuant make models cheaper to run: the same accuracy with 6× less memory and 8× more speed. DeepSeek V3.2 costs $0.28 per 1M tokens today, against the $30 that GPT-4 cost two years ago - and that level of capability is now reachable on far cheaper infrastructure.
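To make the comparison concrete, here is a minimal sketch of the arithmetic on the two prices quoted above. The function name `percent_drop` is just an illustration. Taken at face value, these two prices work out to a drop of about 99%, slightly more than the ~97% headline figure; the sketch does the arithmetic only and does not try to reconcile the two.

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Return the percentage decrease from old_price to new_price."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (old_price - new_price) / old_price * 100


# Prices per 1M tokens, as quoted in the text above.
GPT4_PRICE_USD = 30.00
DEEPSEEK_V32_PRICE_USD = 0.28

drop = percent_drop(GPT4_PRICE_USD, DEEPSEEK_V32_PRICE_USD)
ratio = GPT4_PRICE_USD / DEEPSEEK_V32_PRICE_USD

print(f"Price drop: {drop:.1f}%")        # Price drop: 99.1%
print(f"Cheaper by: {ratio:.0f}x")       # Cheaper by: 107x
```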

Should my company deploy AI on its own servers instead of an API?

Yes, if at least one of three conditions applies: (1) you work with sensitive data (legal, medical, financial) that can't be sent to third-party servers; (2) you need millisecond response times for real-time applications; (3) you have specialized datasets on which you can fine-tune a model so that it beats a generic commercial solution in your context. The mid-size Gemma 4 now runs on a standard business laptop - the infrastructure barrier has dropped.
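The three conditions can be written down as a simple checklist. This is only a sketch of the rule of thumb above, not a full procurement decision: the `Workload` fields and the `self_host_reasons` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an AI use case."""
    has_sensitive_data: bool        # legal, medical or financial data
    needs_realtime_latency: bool    # millisecond response times
    has_specialized_dataset: bool   # data worth fine-tuning on


def self_host_reasons(w: Workload) -> list[str]:
    """Return the reasons (if any) that point towards running the model yourself."""
    reasons = []
    if w.has_sensitive_data:
        reasons.append("data cannot leave your own servers")
    if w.needs_realtime_latency:
        reasons.append("real-time latency requirement")
    if w.has_specialized_dataset:
        reasons.append("fine-tuning on your own data")
    return reasons


# One condition is enough to make self-hosting worth evaluating.
law_firm = Workload(has_sensitive_data=True,
                    needs_realtime_latency=False,
                    has_specialized_dataset=False)
reasons = self_host_reasons(law_firm)
print("Consider self-hosting" if reasons else "An API is probably fine", reasons)
```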

Which AI model is best today?

There's no single "best" model - there's the right model for the right purpose. Claude Opus 4.6 leads in coding (1549 Elo). GPT-5.4 and Gemini 3.1 Pro run neck and neck on general intelligence, and Gemini 3.1 Pro has no real competitor in image and video analysis. For high-volume routine work, DeepSeek V3.2 or Gemini 2.5 Flash offer an 8× better price-to-quality ratio. A hybrid stack - a cheap model for routine work, a premium model for precision - typically cuts AI spend by 60-80% with no loss of quality.
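A rough sketch of where a saving in that range could come from. The cheap price is the DeepSeek V3.2 figure quoted above; the premium price of $10 per 1M tokens and the 75% routine share are assumptions picked purely for illustration, so substitute your own numbers.

```python
def blended_price(routine_share: float, cheap_price: float, premium_price: float) -> float:
    """Average price per 1M tokens when routine_share of traffic goes to the cheap model."""
    if not 0.0 <= routine_share <= 1.0:
        raise ValueError("routine_share must be between 0 and 1")
    return routine_share * cheap_price + (1.0 - routine_share) * premium_price


CHEAP_PRICE_USD = 0.28     # DeepSeek V3.2, per 1M tokens (from the text)
PREMIUM_PRICE_USD = 10.00  # assumed premium-model price, illustrative only
ROUTINE_SHARE = 0.75       # assumed share of traffic that is routine work

hybrid = blended_price(ROUTINE_SHARE, CHEAP_PRICE_USD, PREMIUM_PRICE_USD)
saving_pct = (PREMIUM_PRICE_USD - hybrid) / PREMIUM_PRICE_USD * 100

print(f"Hybrid price per 1M tokens: ${hybrid:.2f}")   # $2.71
print(f"Saving vs premium-only: {saving_pct:.1f}%")   # 72.9%
```

Under these assumptions the saving is driven almost entirely by the routine share: the cheap model is so much cheaper that the percentage saved ends up close to the fraction of traffic you can route to it.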

AI is getting cheaper. Faster than you think. And here's why it changes everything

★ Key takeaways

AI prices fell ~97% in two years - GPT-4-class capability used to cost $30 per 1M tokens; DeepSeek V3.2 costs $0.28 today.
Open source (Llama, DeepSeek, Gemma 4), combined with new methods like MoE and Google's TurboQuant (6× less memory, 8× more speed), is breaking the chip monopoly - Nvidia stock fell 17% in one day.
The real shift isn't a cheaper API - it's that models now run on a business laptop or a mid-tier server. That unlocks legal, medical and financial verticals that couldn't ship data to the cloud.
Geopolitically, DeepSeek is training V4 on Huawei chips, routing around US export controls. Competitive edge moves from hardware to smarter model design.
Practical move - a hybrid stack (cheap model for routine, premium for accuracy) cuts AI spend 60-80%; if you have sensitive data, a local model trained on your data becomes an investment, not an expense.

#AI pricing  #Open-source AI  #DeepSeek  #Gemma 4  #On-premise AI  #AI strategy

