Stop Paying the AI Tax: How to Build Infrastructure You Own

Somewhere between the GPU orders and the board-level mandates, the math stopped working. Enterprises worldwide are spending more on AI than they are recovering in documented savings — and the gap is widening every quarter. This is not a fringe take from AI skeptics. It is the conclusion quietly emerging from Goldman Sachs Research, Sequoia Capital, and MIT economics. The question is no longer whether AI can create value. It is whether the infrastructure bill arrives long before the revenue does.

The $600 billion question Sequoia raised

In mid-2024, Sequoia Capital partner David Cahn published an analysis that reframed the entire AI spending debate. He took NVIDIA's data center revenue run-rate — roughly $150 billion annually — and applied standard cloud economics to estimate the end-user revenue required to justify that level of infrastructure investment. The answer: approximately $600 billion per year in AI-generated end-user revenue. The actual figure at the time was a fraction of that number.

Cahn was not predicting a collapse. He was identifying a structural gap. Someone in the chain — hyperscalers, enterprises, or VC-backed startups — is currently carrying costs that have not yet been matched by returns. As he put it: the only way to make the numbers work is to assume AI will be transformative at a scale never seen before. That may be true eventually. It is not the reality in most enterprise deployments today.

Goldman Sachs: too much spend, too little benefit

Goldman Sachs Research published a report with a title that did the work for them: "Gen AI: Too Much Spend, Too Little Benefit?" Jim Covello, Goldman's Head of Global Equity Research, made the central challenge explicit: AI is an extremely expensive technology being deployed to solve problems that did not require that level of cost to fix in the first place.

Covello's argument centres on task economics. The most valuable problems businesses face — strategic judgment, regulatory navigation, high-stakes relationship management — are the ones AI cannot yet reliably handle. The tasks AI does handle well — document summarisation, customer service deflection, code autocomplete — were already addressable with cheaper software. The cost-to-value ratio is inverted at both ends of the spectrum.

MIT economist Daron Acemoglu, interviewed in the same Goldman report, added structural context. Historical technology cycles — electrification, the internet — delivered broad productivity gains because they reduced the cost of doing things that were already valuable. Most current AI deployments do not meet that bar. They automate tasks that were already fast, or they attempt tasks they perform unreliably.

What the actual enterprise numbers show

The gap between AI spend and AI returns is not theoretical. It is showing up in operational data:

63% of enterprises now classify AI as an active FinOps concern, up from 31% in 2024, according to CloudZero. Spend is being tracked because it is painful.
AI and ML workloads now represent 22% of total cloud costs at SaaS and IT companies — a category that barely existed three years ago and has no proportional revenue line to show for it.
NVIDIA's data center revenue went from $26 billion annually in 2022 to over $100 billion by 2024. Nearly 50% of all cloud capital expenditure globally now flows into NVIDIA silicon. The investment is real. The measurable return at the enterprise layer is not yet at the same scale.
Cursor, the AI coding tool valued in the billions, reportedly forwards close to 100% of its revenue directly to Anthropic for API access — while Anthropic simultaneously built Claude Code, a direct competitor. This is not an edge case. It is the structural dynamic facing every AI-first company buying capacity from the same labs they compete with.

The arms race nobody can opt out of

Here is what makes this situation genuinely hard to navigate: the companies doing the spending know the ROI is uncertain, and they are spending anyway. This is rational, not irrational.

As Goldman Sachs Asset Management portfolio managers noted after meeting with 20 leading technology executives: the hyperscalers doing these calculations are not reckless. They see incremental returns. But they are also in an arms race where being the fourth-best frontier model — or the enterprise that skipped infrastructure investment for two years — may be competitively fatal. The spend is partly a bet on the future and partly a defensive posture.

Goldman's Brook Dane described it plainly: "You can't fall off the front end of the wave. There's a bit of an arms race here, and there's a little bit of a leap of faith embedded in that." That leap of faith is being taken with capital budgets that are, in many cases, 10x what they were in 2021.

Why own infrastructure becomes the only rational answer

The cost spiral has a predictable exit point: the companies that will survive it are the ones that stop renting compute and start owning it.

Every dollar spent on API tokens is a dollar that funds the infrastructure of your vendor — infrastructure that will be used to compete with you, or to raise prices once switching costs are high enough. The token price you pay today is subsidised. NVIDIA, Anthropic, and OpenAI are not running charitable operations; they are building the dependency first and monetising it second.

On-premise or private-cloud LLM infrastructure breaks this dynamic. The capital expenditure is front-loaded and visible. The marginal cost of each inference drops toward zero. There is no vendor repricing risk, no data leaving your perimeter, and no structural dependency on a supplier that is simultaneously your competitor.

This is not an option available to every company today — the upfront investment is real, and the operational expertise required is non-trivial. But the trajectory is clear. As open-weight models continue to close the gap with frontier closed models, and as inference hardware becomes more accessible, building your own stack will shift from an enterprise luxury to a competitive necessity.

The timeline that matters

Goldman Sachs Asset Management's Sung Cho framed the ROI debate correctly: over one to two years, the returns may not justify the investment. Over twenty years, they almost certainly will. The problem is that most enterprise budgets operate on a one-to-two-year horizon, and most of the current AI spending is being evaluated against that shorter window.

The companies that will look prescient in 2030 are not the ones that spent the most on API credits in 2025. They are the ones that used 2025 and 2026 to build infrastructure they will own — training pipelines, fine-tuned models, private inference clusters — while their competitors accumulated recurring vendor bills with no equity in the underlying technology.

AI is not getting cheaper for the enterprises consuming it as a service. It is getting more expensive relative to the value they can extract, because the value extraction requires the kind of deep customisation and data integration that API products are structurally incapable of delivering. The companies that understand this early will not just reduce costs. They will build a moat that API-dependent competitors cannot cross.

What this means practically

Audit your AI spend today. Categorise every token purchase, every SaaS AI add-on, every pilot that became a production line item. Map it against documented, measurable output — not projected savings.
Separate infrastructure from experimentation. API spend for prototyping is rational. API spend at production scale, indefinitely, is not.
Model the crossover point. For most mid-to-large enterprises running consistent inference workloads, the break-even between API cost and owned infrastructure is reached within 12 to 24 months. Run the numbers with real usage data, not vendor estimates.
Treat open-weight models seriously. The gap between Llama, Mistral, and the frontier closed models has narrowed substantially. For the majority of enterprise use cases — classification, extraction, summarisation, generation from structured data — open models running on owned hardware deliver competitive quality at a fraction of the recurring cost.

How to start building your own AI infrastructure

The shift from API consumer to infrastructure owner does not require a hyperscaler budget. The path is incremental, and most enterprises can begin within a single quarter. Here is a practical starting sequence:

Step 1 — Map your highest-volume use cases. Identify the two or three workflows generating the most token spend today. These are your best candidates for migration to owned infrastructure first, because the savings are immediate and measurable. Common examples: internal document Q&A, customer support triage, contract review, and report summarisation.
Step 2 — Select an open-weight model matched to the task. You do not need GPT-4 class capability for most enterprise tasks. Llama 3.1 (8B or 70B), Mistral, or Qwen models handle the majority of structured text tasks reliably. Match model size to your quality bar — smaller models are faster, cheaper to run, and easier to fine-tune.
Step 3 — Choose your deployment layer. For teams starting out, a single GPU server (NVIDIA A100 or H100) or a managed private-cloud instance (AWS Bedrock custom models, Azure AI Studio with private endpoints, or bare-metal providers like CoreWeave and Lambda Labs) is sufficient for moderate workloads. You do not need a data centre on day one.
Step 4 — Fine-tune on your own data. This is where owned infrastructure pays its largest dividend. A fine-tuned 7B model on your company's documents, tone, and domain vocabulary will consistently outperform a generic GPT-4 call for your specific task — at 1–2% of the per-token cost. Tools like Axolotl, Unsloth, and LlamaFactory make fine-tuning accessible without a dedicated ML team.
Step 5 — Set up a private inference endpoint. Deploy using vLLM, Ollama, or TGI (Text Generation Inference by Hugging Face). These serve your model via a standard OpenAI-compatible API — meaning your existing integrations require zero code changes. Your applications keep calling the same endpoint format; the traffic simply routes to your server instead of OpenAI's.
Step 6 — Instrument and iterate. Add logging, latency tracking, and output quality scoring from day one. The advantage of owned infrastructure is that you see everything — token counts, failure modes, latency percentiles. Use this data to continuously improve the model and right-size the hardware.

A realistic timeline: a mid-size enterprise with one internal technical hire (or an infrastructure partner) can have a private inference cluster running a fine-tuned model in production within six to eight weeks. The payback period on hardware versus equivalent API spend is typically under eighteen months for any team consuming more than 50 million tokens per month.

Where to start if you do not have an internal AI team

The operational expertise gap is the most cited reason enterprises stay on APIs longer than their economics justify. Setting up GPU servers, managing model weights, handling inference optimisation, and maintaining uptime is genuinely non-trivial if you are doing it for the first time.

This is precisely the gap that specialist AI infrastructure partners exist to close. Rather than hiring a full ML engineering team before you have validated the use case, the right starting point is a scoped engagement: an infrastructure partner deploys your first private model, documents the stack, trains your team, and hands over a system you own and operate independently.

The economics shift permanently once that first deployment is live. Every subsequent inference is free from vendor lock-in, priced at marginal electricity and hardware amortisation, and fully under your control.

The AI cost crisis is not a reason to stop investing in AI. It is a reason to invest differently — in infrastructure you own rather than services you rent, in use cases with hard ROI rather than pilots that look impressive in board decks, and in a roadmap that treats today's API spend as a bridge, not a destination.

Ready to stop renting compute?

Adelphos designs and deploys private AI infrastructure for enterprises — from your first fine-tuned model to a full sovereign stack. We handle the architecture, deployment, and handover so your team owns the outcome.

Talk to us about your stack →

Stay ahead of the curve

Get our next deep-dive in your inbox