Key Takeaways
- Personal tokenmaxxing is not corporate tokenmaxxing. — The news-cycle debate is about Meta/OpenAI/Shopify leaderboards tracking employee token burn. That is gameable theater. Stacking subs at a desk of one is pure cost optimization — a completely different activity.
- Two Claude Max 20x accounts beat one Claude Max plus API overflow. — Once you burn through one Max subscription daily, a second one is cheaper than paying API rates for the overflow. Same model, predictable cost, no surprises at month-end.
- z.ai and Kimi fill the cheap-batch gap. — Long-context summarization, batch classification, and low-stakes generation do not need Claude Opus. At $12-15/month each, they add enough capacity that the Claude bill stops being the bottleneck for the easy stuff.
- OpenRouter is the escape hatch for everything else. — Don't buy a seventh subscription. Keep OpenRouter on standby with $50 loaded; route anything unusual through it and only commit when a pattern emerges over 30+ days.
- Notion as the tracking layer — but only the minimum viable version. — A four-column table (tool, monthly cost, primary use case, weekly usage checkbox) plus a weekly five-minute cost review. Do not build a dashboard for this. The bookkeeping should not eat the savings.
Two Tokenmaxxings, One Word
Silicon Valley has spent the last week arguing about tokenmaxxing. TechCrunch ran a piece with Reid Hoffman weighing in. WSJ covered Meta shutting down an internal leaderboard that ranked employees by their AI token consumption. Axios reported Salesforce pitching outcome-based metrics as the alternative. Forbes asked whether the whole thing is a cult. Inc called it the controversial new AI productivity metric. The debate is real and, at the corporate level, mostly deserved — ranking people on tokens consumed is a classic category error. Engineers at Meta reportedly built bots to burn tokens in loops. That kind of metric gaming is the oldest story in software management.
But there is another tokenmaxxing that the news cycle is flattening. Before the term became Silicon Valley shorthand for gameable corporate theater, it had a quieter meaning in developer communities: actively optimizing how you extract value from multiple fixed-price AI subscriptions. That's what I actually do. I tokenmax at a desk of one. No leaderboard is ranking me. Nothing I do inflates anyone's KPI. And it saves me roughly a thousand dollars a month compared to routing the same work through pay-per-call APIs.
This post is the stack. What I pay for, what each subscription does, how I route work between them, and what I would drop if I had to cut half the budget. For the broader definition of tokenmaxxing and the corporate debate, see the tokenmaxxing explainer I wrote on ctaio.dev. This page is the personal version.
The Stack: What I Actually Pay For
As of April 2026, my monthly AI coding spend averages $547: six fixed-price subscriptions plus roughly $60 of pay-as-you-go OpenRouter usage. Here is the full inventory with what each does that the others don't.
| Subscription | Monthly Cost | Primary Use Case | What It Replaces |
|---|---|---|---|
| Claude Max 20x (Primary) | $200 | Main agentic coding, multi-file refactors, architecture work | ~$450/mo in Sonnet API usage |
| Claude Max 20x (Secondary) | $200 | Parallel agents, batch tasks, overflow when primary hits rate limits | API overflow that would cost $250-400/mo |
| GitHub Copilot Pro | $10 | Inline completion in VS Code, pair programming | Flow-state interruptions; no direct API analog |
| z.ai (GLM subscription) | $15 | Long-context summarization, batch classification, cheap generation | ~$80-120/mo if routed through OpenAI API |
| Kimi (Moonshot) | $12 | 128K+ context document work, Chinese-market content | Claude Opus calls that would cost 3-5x more |
| OpenRouter (pay-as-you-go, avg) | $60 | Escape hatch: experimental models, one-off jobs, overflow routing | Having to buy yet another subscription |
| fal.ai (image + media credits) | $50 | OG images, hero illustrations, occasional video | Stock photos I'd pay for anyway |
| Total | $547 | | ~$1,400-$2,000 equivalent API spend |
The $50 on fal.ai is technically media, not LLM, but it lives in the same spreadsheet and serves the same purpose: a fixed-price ceiling on a variable-cost activity. The total excludes ChatGPT Plus, which I dropped when Claude Max 20x started replacing it for every task I had been paying OpenAI for. That cancellation is its own small case study in why you should review the stack monthly.
The Math: Why Stacking Beats Single-Subscription
The simplest argument against stacking is that six subscriptions cost more than one. On paper, yes — $547 is more than $200. But that comparison ignores the thing the whole exercise is built around: rate limits and per-task pricing efficiency.
A single Claude Max 20x account covers roughly 6-8 hours of heavy agentic coding before you start hitting meaningful rate limits. I work 8-10 hours daily, 5-6 days a week. Without the second Max account, every overflow hour would route to the Sonnet 4.6 API at $3 input / $15 output per million tokens. A typical hour of agentic coding bills on the order of 1M-1.7M tokens once you count the context the agent resends on every turn. At 2 hours of daily overflow, that is 2M-3.4M tokens × 22 working days = 44M-75M tokens a month. At an 80/20 input/output split the blended rate is $5.40 per million tokens, which puts the overflow at roughly $240-$400/month — more than the $200 I pay for the second subscription.
z.ai and Kimi work on the same logic from the other direction. Batch classification, document summarization, and cheap generation do not need Claude. They need a model with a long context window and a low per-token price. A $15/month z.ai subscription handles enough of my daily batch work that equivalent usage through the OpenAI API would cost $80-120/month. The same math applies to Kimi for long-context document tasks.
The remaining subscriptions are qualitatively different. Copilot at $10/month is the cheapest flow-state tool in computing — inline completion at <50ms latency, which no API-based setup can replicate without infrastructure work. OpenRouter at ~$60/month variable is the optionality layer: when a month produces one weird task that wants a specific model, I route it there instead of buying a seventh subscription. fal.ai at $50/month is bounded media spend.
Notion as the Management Layer — Kept Small on Purpose
Every article about AI subscription stacking eventually pitches an elaborate tracking system. Dashboards, cost-per-token instrumentation, per-project attribution. I tried that. It takes longer to maintain than the subscriptions save, and it produces the wrong kind of data — precise numbers about last month when what I needed was a rough yes/no about whether each tool is still earning its keep.
What I actually use is a Notion table with four columns: tool name, monthly cost, primary use case, and a rolling weekly checkbox. Every Friday I spend five minutes marking which subscriptions got used that week. If a subscription goes two consecutive weeks unchecked, it gets flagged. Three weeks, I cancel. The friction is deliberate — if I made it easier to track, I would spend more time tracking and less time deciding. The whole point is that subscriptions compound; the five-minute review keeps compounding from turning into subscription bloat.
The secondary purpose of the Notion table is that it forces me to name the primary use case per tool. If I cannot articulate why Kimi exists in the stack in one sentence, Kimi probably shouldn't be in the stack. That framing alone has killed three subscriptions in the last year.
Honest Regrets: What I'd Drop, What I Got Wrong
The honest section. Kimi is on thin ice. It overlaps with z.ai for most of what I use it for, and the Chinese-market content use case that originally justified it is smaller than I projected. If I had to cut one subscription today, it would be Kimi, and z.ai would absorb its workload without a noticeable drop in quality.
The second Claude Max is the subscription I most often second-guess. The math works at my current usage, but the math would break if I dropped below six hours of daily Claude Code work — which could happen if I shift to more writing or more meetings in a given month. The right rule would be to cancel the secondary account the month I anticipate the shift, not to pay for two Max plans out of habit. I have not always gotten that timing right.
The biggest mistake in the history of this stack was keeping ChatGPT Plus for four months after Claude Max 20x made it redundant. I paid $80 for nothing because canceling felt like a bigger decision than it was. The lesson, which I now apply more aggressively, is that subscription cancellations should be easier than subscription additions — so if I am ambivalent about a tool, the default should be to drop it and re-add it if I miss it.
What the News Cycle Misses
The corporate tokenmaxxing debate is legitimate on its own terms — ranking employees on tokens consumed is a bad metric, and the critics (Jon Chu, the Salesforce team, most of the engineers quoted anonymously in the coverage) are right about the specific claim. But the debate keeps framing all token maximization as theater, which misses the narrower practice it borrowed its name from.
The personal version works precisely because there is no leaderboard. The incentive is pure cost optimization: stack subscriptions to the point where the marginal subscription costs less than the equivalent API usage it replaces, then stop. There is nothing to game because no one is watching. The fact that the word "tokenmaxxing" got pulled into a corporate management controversy is a branding accident, not a reflection of the practice.
The practical lesson is almost the opposite of the one the corporate debate is drawing. At the corporate level, stop tracking tokens and start tracking outcomes. At the personal level, keep tracking your subscriptions — but track them against your own costs, not against anyone else's metric. That is the tokenmaxxing that works.
FAQ
What is LLM subscription stacking?
LLM subscription stacking is paying for multiple AI model subscriptions simultaneously and routing each task to whichever tool handles it best. The rationale is economic: each subscription has a fixed cost and a rate limit, and for specific task categories (agentic coding, inline completion, long-context summarization, media generation) a dedicated subscription is cheaper than routing everything through a single tier or through pay-per-call API access. My current stack is six subscriptions plus pay-as-you-go OpenRouter credit, roughly $547/month total, which replaces what would be $1,400-$2,000/month in equivalent API usage.
Is paying for two Claude Max subscriptions worth it?
Only if you reliably burn through one Max 20x plan every day. For me, yes — a second Max 20x account at $200/month is cheaper than paying API rates for the overflow work that one account cannot absorb. I rotate between the two accounts during the day (primary for main work, secondary for parallel agents and batch tasks). The economics flip below roughly six hours of daily Claude Code usage, at which point a single Max plan plus occasional API calls is more cost-effective.
Why use z.ai or Kimi on top of Claude?
Claude Max is expensive per token and rate-limited during US peak hours. z.ai (GLM models) and Kimi (Moonshot) are cheap, have long context windows (128K+), and handle batch classification, long-document summarization, and low-stakes generation perfectly well without touching the Claude quota. I pay $12-15/month each. The pattern, sketched below, is: Claude for reasoning-heavy work, z.ai or Kimi for volume and cheap batches, OpenRouter for anything that does not fit either.
When should I use OpenRouter instead of GitHub Copilot?
OpenRouter and Copilot solve different problems. Copilot is inline IDE completion — it's always on, low-latency, and priced at $10/month for unlimited use. OpenRouter is pay-per-call routing to any model you want, useful for experiments, overflow from your main subscriptions, or accessing models not covered by your other tools. Use both: Copilot for everyday typing, OpenRouter when you need a specific model for a specific job without adding another fixed subscription.
How do you track what each subscription actually delivers?
A simple Notion table: tool, monthly cost, primary use case, and a weekly checkbox for "used this week." Every Friday I spend five minutes reviewing which subscriptions got used and which did not. A subscription that goes two consecutive weeks unused gets a kill-candidate tag; three weeks and I cancel. Five minutes a week is cheaper than letting an annual renewal make the decision for me.
What is the difference between personal tokenmaxxing and the corporate tokenmaxxing debate?
They are different activities that share a word. Corporate tokenmaxxing, the subject of recent TechCrunch and WSJ coverage, is about companies (Meta, OpenAI, Shopify) ranking employees on internal leaderboards by token consumption — a gameable productivity metric that critics call theater. Personal tokenmaxxing is a developer stacking multiple cheap AI subscriptions to maximize value for a fixed monthly budget. The corporate version measures input and creates bad incentives; the personal version optimizes cost against actual work done. The definition article on ctaio.dev walks through both meanings.
What would you drop if you had to cut costs?
In order: Kimi first (redundant with z.ai for my workload), then the second Claude Max if I went below six hours of daily Claude usage, then OpenRouter last (it costs almost nothing when unused). I would never drop the primary Claude Max, GitHub Copilot, or fal.ai — each is doing something no other tool in my stack does. If I had to survive on two subscriptions, it would be Claude Max 20x plus Copilot Pro, routing anything exotic through OpenRouter on demand.
Related reading: For the broader definition and the corporate tokenmaxxing debate (Reid Hoffman, Meta, Salesforce), see the tokenmaxxing explainer on ctaio.dev. For the underlying economics of agentic coding, the Claude Code 90-day cost breakdown is the long version of the math in this post. For the broader tool landscape, Claude Code vs Cursor vs Windsurf.