Every six months a one-paragraph prompt trick goes viral and eats a week of developer Twitter. This month’s is Caveman — a Claude Code skill that makes the agent drop articles, filler, and pleasantries and answer in grunts. The headline claim is a 75% reduction in output tokens with no loss of technical accuracy. The viral version of the pitch is funnier than that: why use many token when few do trick.
The reason this one is worth a primer, rather than ignoring until it dies down, is that the “joke skill” is a real tool built on a non-obvious finding about how modern coding agents behave when you constrain their output length. That finding is the part worth keeping after the meme fades.
This primer covers what the skill actually does, the surprising research angle underneath it, and how I’d decide whether to install it on a given project.
What Caveman is
Caveman is a Claude Code skill (with ports to Codex, Cursor, Gemini CLI, Windsurf, Copilot, and most agents supported by npx skills) authored by Julius Brussee. Install it once and your agent starts answering like this:
Normal Claude: “The reason your React component is re-rendering is likely because you’re creating a new object reference on each render cycle. When you pass an inline object as a prop, React’s shallow comparison sees it as a different object every time, which triggers a re-render. I’d recommend using useMemo to memoize the object.”
Caveman Claude: “New object ref each render. Inline object prop = new ref = re-render. Wrap in useMemo.”
Same answer. Roughly a quarter of the tokens. No code quality loss — and this is the claim to hold onto — because the skill explicitly preserves code blocks, commit messages, PR bodies, and any output where structure carries meaning. Only natural-language prose gets squeezed.
The skill ships with four intensity levels:
- Lite — grammar intact, filler gone. Professional, just no fluff.
- Full — the default. Drop articles, fragments OK, full grunt.
- Ultra — telegraphic. Abbreviate aggressively.
- 文言文 (Wenyan) — Classical Chinese literary compression. This started as a joke in the repo and kept getting measured at even higher compression ratios than English ultra-grunt because Classical Chinese really is, objectively, one of the most token-efficient written languages ever invented. The bit that’s a joke is shipping it in a CLI tool.
You switch modes mid-session with /caveman lite, /caveman ultra, /caveman wenyan, etc. The chosen level sticks until the session ends or you toggle back with "normal mode".
Why this primer is necessary right now
The skill itself is, on its own, a prompt engineering trick and a SessionStart hook. If that were all there was to it, you’d read the GitHub README in two minutes and be done.
What makes it worth a primer is the collision of three things happening in April 2026:
- The skill itself is viral enough that teammates are going to ask about it. The repo went from zero to over 8k stars in its first two weeks. If you manage a team that uses Claude Code, this is coming up in standups this week.
- The underlying research is interesting in a way the meme obscures. Julius cites a March 2026 paper, “Brevity Constraints Reverse Performance Hierarchies in Language Models,” which found that constraining frontier models to brief responses improved accuracy by 26 percentage points on certain benchmarks and, more strikingly, completely reversed which models were best at which tasks. That is not a small finding. If it replicates, it changes how we should think about prompting for accuracy, not just cost.
- This is the first time I’ve seen “Claude Code skill” used as a distribution mechanism for a prompt pattern at meaningful scale. Skills landed as an architectural concept last fall; this is the moment one of them crossed over into general developer consciousness. The format matters as much as the content.
Any one of those three would justify taking the skill seriously. Together they make it mandatory reading for anyone using Claude Code daily.
The non-obvious part: brevity can improve accuracy
The intuitive mental model of an LLM says: more tokens the model generates = more reasoning it can do = better answer. Extended thinking, scratchpad prompting, and chain-of-thought prompting all sit on that assumption.
Caveman’s claim is that for a large class of coding tasks, that model is wrong — or at least incomplete. The reasoning happens in the thinking phase (Claude’s extended-thinking budget, which Caveman explicitly leaves alone). What Caveman compresses is the answer phase — the part where the model explains its reasoning back to you. The paper’s claim, reframed: making the answer shorter doesn’t shrink the reasoning; it forces the model to commit to a position instead of hedging across multiple possibilities.
Think about how verbose LLM output typically sounds:
“The issue you’re experiencing is most likely caused by…” (hedge) “One possible approach would be to…” (hedge) “I’d recommend…” (hedge) “However, you could also consider…” (hedge-and-diffuse)
Each of those hedges is the model giving itself an escape hatch. Caveman-mode output can’t hedge because hedges are filler and filler is banned. The model has to pick an answer. If the model is right about the answer, you get the same information faster. If the model is wrong about the answer, you notice faster — because the single committed answer is visible, whereas the hedged version could be read as right by a charitable reader.
That second case is the subtler value. Caveman makes model confidence legible. You can’t mistake a hedge for a confident answer when hedges aren’t allowed.
Whether the paper’s 26-point accuracy improvement replicates on your workload is an empirical question — run your own evals on your actual tasks before believing any number. But the mechanism is plausible enough that the prior shifts. “More tokens ≠ more correct” is not a universal claim, but it’s a useful one to sit with.
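If you do run your own evals, the comparison itself is simple bookkeeping. Here is a minimal sketch, assuming you have already graded the same tasks once in verbose mode and once in caveman mode. The function names and the grade lists are hypothetical placeholders, not anything from the paper or the repo.

```python
from typing import Sequence


def accuracy(results: Sequence[bool]) -> float:
    """Share of tasks graded correct, as a percentage."""
    if not results:
        raise ValueError("need at least one graded task")
    return 100.0 * sum(results) / len(results)


def point_delta(verbose: Sequence[bool], brief: Sequence[bool]) -> float:
    """Percentage-point change when moving from verbose to brief mode."""
    if len(verbose) != len(brief):
        raise ValueError("grade the same tasks in both modes")
    return accuracy(brief) - accuracy(verbose)


# Placeholder grades for ten hypothetical tasks; swap in your own results.
verbose_grades = [True, False, True, True, False, True, False, True, True, False]
brief_grades = [True, True, True, True, False, True, False, True, True, True]

print(f"verbose: {accuracy(verbose_grades):.0f}%")
print(f"brief:   {accuracy(brief_grades):.0f}%")
print(f"delta:   {point_delta(verbose_grades, brief_grades):+.0f} points")
```

Ten tasks is far too few to conclude anything; the point is only that the number to watch is a percentage-point delta on your tasks, not a headline figure from someone else’s benchmark.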
Install in two minutes
The canonical path for Claude Code:
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
Restart Claude Code. You’ll see a [CAVEMAN] badge in the statusline (or the plugin will offer to add one if you don’t have a custom statusline already). Caveman mode is now active on every session start.
For the other editors:
| Agent | Install |
|---|---|
| Codex | Clone repo → /plugins → search “Caveman” → Install. Use $caveman to trigger. |
| Gemini CLI | gemini extensions install https://github.com/JuliusBrussee/caveman |
| Cursor | npx skills add JuliusBrussee/caveman -a cursor |
| Windsurf | npx skills add JuliusBrussee/caveman -a windsurf |
| Copilot | npx skills add JuliusBrussee/caveman -a github-copilot |
| Anything else | npx skills add JuliusBrussee/caveman |
One caveat worth reading the README for: npx skills add installs the skill file, but not the always-on rule/instruction file for the agent. That means on Cursor/Windsurf/Cline/Copilot you have to trigger caveman mode each session (/caveman or “talk like caveman”), whereas Claude Code and Gemini CLI auto-activate via hooks. If you want always-on behavior on the others, the README has a one-paragraph rule snippet to paste into your agent’s system prompt.
Turning it off is "stop caveman" or "normal mode".
The skills bundle — this is where it gets interesting
The core skill is the crowd-pleaser, but the bundle ships four auxiliary skills that are, in my opinion, the more durable part of the package.
/caveman-commit
Generates terse, Conventional Commits-style commit messages with a ≤50-character subject and no throat-clearing in the body. The output looks like:
fix(auth): use <= for token expiry check
Off-by-one let expired tokens through. Covers the equality edge.
Not revolutionary. But it’s the best implementation I’ve seen of the why-over-what commit style, because caveman-mode doesn’t let the model drift into explaining what the diff already shows.
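If you want to enforce the same shape in a hook or CI step, a check along these lines is enough. This is a sketch under my own assumptions: the regex is a loose approximation of the Conventional Commits subject format, and `check_subject` is a hypothetical helper, not part of the skill.

```python
import re

# Loose approximation of a Conventional Commits subject:
# type, optional (scope), optional "!", then ": description".
SUBJECT_RE = re.compile(r"^[a-z]+(\([^)]+\))?!?: .+$")
MAX_SUBJECT = 50  # the subject limit quoted above


def check_subject(subject: str) -> list[str]:
    """Return a list of problems; an empty list means the subject passes."""
    problems = []
    if len(subject) > MAX_SUBJECT:
        problems.append(f"subject is {len(subject)} chars, limit is {MAX_SUBJECT}")
    if SUBJECT_RE.match(subject) is None:
        problems.append("subject does not look like type(scope): description")
    return problems


print(check_subject("fix(auth): use <= for token expiry check"))
print(check_subject("Updated the authentication module to fix several issues with tokens"))
```

The first subject is the example from above and passes; the second fails both checks.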
/caveman-review
One-line PR comments in the shape “L42: 🔴 bug: user null. Add guard.” Line, severity, diagnosis, fix. Four fields, one line. No throat-clearing, no “consider whether…”, no “you might want to…”.
This is genuinely better than vanilla Claude PR reviews because the format forces a diagnosis. Vanilla PR-review output often reads as “here are five things that caught my eye” without committing to which ones are real bugs versus style preferences. Caveman-review requires a severity classification per comment, which is the actually-useful part.
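The four-field shape is easy to model, which is part of why it works. A minimal sketch, assuming the format is exactly the one in the example above; `ReviewComment` and its field names are my own illustration, not the skill’s internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewComment:
    """One review comment: line, severity, diagnosis, fix."""

    line: int
    icon: str       # severity marker, e.g. the red circle in the example
    severity: str   # e.g. "bug"
    diagnosis: str
    fix: str

    def __post_init__(self) -> None:
        # Every field is required; a comment without a diagnosis or a fix
        # is exactly the vague output this format is meant to prevent.
        if self.line < 1:
            raise ValueError("line numbers start at 1")
        for name in ("icon", "severity", "diagnosis", "fix"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def render(self) -> str:
        return f"L{self.line}: {self.icon} {self.severity}: {self.diagnosis}. {self.fix}."


comment = ReviewComment(line=42, icon="🔴", severity="bug", diagnosis="user null", fix="Add guard")
print(comment.render())  # L42: 🔴 bug: user null. Add guard.
```

Making every field mandatory is the design choice that matters: there is no way to emit a comment that skips the severity call.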
/caveman-compress
This is the one most people miss, and it’s the most technically interesting.
Caveman-the-skill makes Claude speak fewer tokens. caveman-compress makes Claude read fewer tokens. It rewrites your CLAUDE.md (and any other memory file you point it at) into caveman-speak, saves a .original.md backup, and the next time Claude loads your project it reads the compressed version:
/caveman:compress CLAUDE.md
CLAUDE.md ← compressed (Claude reads this every session)
CLAUDE.original.md ← human-readable backup (you edit this)
Reported compression ratios across five real memory files: 37–60%, averaging 46%. The tool passes through code blocks, URLs, file paths, commands, headings, dates, and version numbers untouched — only prose compresses.
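To make the “only prose compresses” idea concrete, here is a toy sketch of the pass-through logic. It is not how caveman-compress works internally (the real tool has the model do the rewriting); the filler-word regex is a crude stand-in, and this version only protects fenced code blocks and headings, not URLs, paths, dates, or version numbers.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Crude stand-in for compression: drop a few English articles and filler words.
FILLER = re.compile(r"\b(the|a|an|just|really|basically|simply)\s+", re.IGNORECASE)


def squeeze_prose(line: str) -> str:
    return FILLER.sub("", line)


def compress_markdown(text: str) -> str:
    """Compress prose lines; pass fenced code and headings through untouched."""
    out = []
    in_code = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_code = not in_code   # entering or leaving a fenced block
            out.append(line)
        elif in_code or line.startswith("#"):
            out.append(line)        # structure carries meaning: leave it alone
        else:
            out.append(squeeze_prose(line))
    return "\n".join(out)


sample = "\n".join([
    "# The build",
    "Just run the tests before a commit.",
    FENCE + "sh",
    "make the-tests",
    FENCE,
])
print(compress_markdown(sample))
```

The heading and the command come out unchanged; only the sentence in the middle shrinks.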
Why this matters: CLAUDE.md loads on every session start and counts against your input tokens every time. A 2,000-token CLAUDE.md at 10 sessions/day is 600k tokens/month you’re paying input-price on. Cut it 46% and you’ve made a real dent, and unlike output-compression, input-compression compounds with prompt caching — smaller cached prefix = smaller cold-cache miss.
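The arithmetic behind that figure, as a quick sketch. The 30-day month is my assumption, the 46% is the reported average from above, and `monthly_tokens` is just a throwaway helper.

```python
def monthly_tokens(tokens_per_load: int, sessions_per_day: int, days: int = 30) -> int:
    """Input tokens spent per month just loading the memory file."""
    return tokens_per_load * sessions_per_day * days


original_size = 2_000
compressed_size = round(original_size * (1 - 0.46))  # 46% average reduction

before = monthly_tokens(original_size, sessions_per_day=10)
after = monthly_tokens(compressed_size, sessions_per_day=10)

print(f"before: {before:,} tokens/month")
print(f"after:  {after:,} tokens/month")
print(f"saved:  {before - after:,} tokens/month")
```

Plug in your own file size and session count; the saving scales linearly with both.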
I’d argue this is actually the most useful skill in the bundle. Output compression is a readability win with a cost bonus. Input compression is a direct cost win that pays every session forever.
/caveman-help
Quick-reference card of all modes and commands. Exists because the bundle has enough surface area that you’ll forget what’s available. Useful.
The Caveman ecosystem
Julius ships Caveman as one of three related tools:
- caveman — output compression (this primer).
- cavemem — cross-agent persistent memory via MCP, local-first SQLite, compressed. Positions itself as “the memory layer that works across Claude Code, Cursor, and Codex.”
- cavekit — a spec-driven autonomous build loop that orchestrates natural-language specs → parallel build → verification.
These compose: cavekit orchestrates the build, caveman compresses what the agent says, cavemem compresses what the agent remembers. They also all stand alone. I’d start with caveman + caveman-compress and only pull in cavemem if you’re seeing real pain with cross-agent context (which most single-editor users won’t).
The naming convention is self-aware in a way that’s hard not to respect.
Where Caveman doesn’t help, and where it actively hurts
The skill is targeted. A few places to leave it off:
Teaching and pair-programming with juniors. Caveman outputs assume the reader can decode "New object ref each render" into “the component re-renders because each render creates a new object reference, which fails React’s shallow prop comparison.” A senior engineer already holds that expansion in their head. A junior doesn’t. For pair-programming or code walkthroughs aimed at someone ramping up, verbose Claude is pedagogically better.
Audit trails and compliance contexts. If your agent’s output is going to be read by a compliance reviewer, an incident post-mortem author, or anyone else who needs the reasoning captured in prose, don’t compress the reasoning away. Caveman makes decisions legible to experts and illegible to bureaucratic processes — sometimes you need the second thing.
Anything where the prose is the artifact. Writing docs, writing user-facing release notes, writing any kind of long-form English? Turn caveman off first. The skill explicitly tells itself to leave code alone but it will happily compress your blog post into fragments.
Debugging a subtle issue where you need the model to think out loud. One real trade-off: caveman-style output forces commitment, which is great when the model is right and bad when it’s wrong and you need to see the reasoning chain to spot where it’s wrong. For hard debugging sessions I often want verbose Claude explicitly so I can follow its diagnostic chain and catch its mistake.
The rule of thumb I’ve settled on: caveman-on for execution, caveman-off for exploration. When I know what I’m trying to do and I want Claude to do it fast, caveman. When I’m investigating something I don’t fully understand yet, verbose.
Pitfalls once you’ve installed it
A few things to watch for, gathered from the first couple weeks of general use:
Statusline badge invisible in some terminals. The [CAVEMAN] badge relies on Claude Code’s statusline, which not every terminal renders with color. If you’re not sure caveman is active, ask Claude a simple question and look at the response length. If you get a four-paragraph answer, the skill isn’t loaded.
Reverting silently after many turns. This was a reported issue in the first release and is mostly fixed by the current always-on prompt snippet, but worth watching. If you notice the model drifting back to verbose output after a long session, reassert with /caveman or "talk like caveman". The current snippet includes "ACTIVE EVERY RESPONSE. No revert after many turns." which helps.
Thinking tokens unchanged. Caveman compresses output. It does not compress Claude’s extended thinking (the internal reasoning). You still pay full thinking-token price. That’s actually the design — you want the model thinking at full depth, just not lecturing about it afterward. But it means cost savings are smaller than “75% fewer tokens” sounds, because thinking is often where the token spend is.
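A back-of-the-envelope sketch of how much the headline number shrinks once thinking is counted. The token counts are made-up illustrations, not measurements, and `overall_reduction` is a hypothetical helper.

```python
def overall_reduction(thinking_tokens: int, answer_tokens: int, answer_cut: float) -> float:
    """Fraction of total generated tokens saved when only the answer is compressed."""
    total = thinking_tokens + answer_tokens
    if total <= 0:
        raise ValueError("need a positive token count")
    saved = answer_tokens * answer_cut  # thinking tokens are left untouched
    return saved / total


# Illustrative turn: 3,000 thinking tokens, 1,000 answer tokens, answer cut by 75%.
with_thinking = overall_reduction(3_000, 1_000, 0.75)   # 750 / 4000 = 0.1875
without_thinking = overall_reduction(0, 1_000, 0.75)    # 750 / 1000 = 0.75

print(f"with thinking:    {with_thinking:.4f}")
print(f"without thinking: {without_thinking:.4f}")
```

Under those assumed numbers, a 75% cut to the answer is under a 19% cut to the turn. The heavier your thinking budget, the smaller the overall saving.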
Code-review tooling mismatch. If your team lints PR comments automatically or enforces a comment format, one-line caveman-review comments may not match the expected shape (for example, a rule like “expected 2 sentences minimum”). Check your tooling before rolling out team-wide.
Agents without hooks. On Cursor, Windsurf, Cline, Copilot — the skill doesn’t auto-activate, so your experience will feel worse than a Claude Code user’s unless you paste the always-on rule into your agent config. Read the install table carefully.
How to decide
My recommendation for someone reading this deciding whether to install:
- If you use Claude Code daily: install it today. The output-compression meme is fun; caveman-compress on your CLAUDE.md is the actually-useful thing and takes 30 seconds. Toggle caveman mode off with "normal mode" when you hit a task where you want verbose thinking.
- If you use Cursor/Windsurf/Codex daily: install it, but add the always-on snippet to your agent’s rules file. Otherwise you’ll keep forgetting to trigger it.
- If you manage a team using Claude Code: try it yourself for a week before rolling out. The right default at team level is not obvious: you may or may not want every engineer’s session to be caveman-mode by default. At minimum, standardize on either everyone has it or nobody has it, because mixed-mode output in shared session transcripts is confusing.
- If you’re curious about the underlying research claim: install it, run your own evals on your own task distribution, and report back. The paper’s 26-point accuracy claim is the kind of thing that should either replicate broadly or quietly disappear within six months. Either outcome is informative; neither is yet settled.
The five-minute path
If you’ve read this far and you’re convinced:
claude plugin marketplace add JuliusBrussee/caveman
claude plugin install caveman@caveman
Restart Claude Code. Then:
/caveman:compress CLAUDE.md
Look at your new CLAUDE.md versus CLAUDE.original.md — the diff is instructive on its own. That’s the install.
From there, use Claude normally. Ask it a question. Notice the answer is shorter. Notice that if the answer is wrong, it’s wrong visibly, and if it’s right, you get to the fix faster. That’s the value proposition in two sessions’ worth of usage.
Where to go next
- The caveman repo has the full benchmark harness (evals/) that produced the 22–87% range numbers. Reproducing them on your own workload is the honest way to evaluate the skill.
- The prompt caching primer pairs directly with caveman-compress: input-token compression and prompt caching are multiplicative, not additive.
- If you’re building your own Claude Code skill, Caveman is one of the cleanest small-scope examples in the wild to study. It’s essentially a one-page prompt plus a session-start hook, and the whole repo is readable in twenty minutes.
The short version: the meme is fun, the 75% number is mostly real, and the non-obvious value is in caveman-compress and the committed-answer effect, not in making Claude talk like a caveman. The talking-like-a-caveman part is the distribution strategy. It worked.