Inside the harness wars: how Claude Code, Codex, Hermes, and OpenClaw are redrawing the agent stack

Takeaways

The frame: Four projects get grouped under “agent harness,” but they’re solving two different problems. Three of them want to ship code faster. The fourth doesn’t really care about shipping code.
The convergence: All four agreed in the past six months that markdown files on disk are the right substrate. The differences are above and below that layer.
The bet: Whoever owns the layer the agent calls two years from now owns the user relationship. The model is increasingly a commodity input.

The four philosophies

Claude Code has bet on Skills and tight Anthropic-model coupling. The pitch is that the harness, the model, and the deployment pipeline are co-designed at Anthropic, so the seams between them are smaller. The cost is vendor lock-in: a Claude Code skill is portable in theory but rarely in practice.

Codex CLI is OpenAI’s bet that orchestration is the next frontier. The recent parallel-task release is the clearest example: Codex is increasingly the harness for users who want to point an agent at a big workload and watch it split the work into parallel subtasks. The cost is that the agent’s reasoning becomes more opaque the more it parallelizes.

Hermes Agent is Nous Research’s bet that the agent should be portable across surfaces. Gateways to Telegram, Discord, and CLI are first-class. The model is swappable; the agent’s memory and skills survive surface changes. The cost is that the harness is doing more work, which means more places it can break.

OpenClaw is the bet that almost nobody in the SWE-bench-watching crowd is making: that the interesting agent problem is persistence, not coding. Memory, scope, identity, ambient presence. The community around it is indie operators and local-first AI people, not developer-relations teams at the model labs. It’s not competing for the same users as the other three, and that’s the point.

What they’re actually competing on

Six months ago the answer would have been “capability.” Today it splits cleanly.

For Claude Code, Codex, and Hermes (in their developer-tools form), the competition is:

Onboarding latency. How long from brew install to first useful agent action.
Skill portability. Can your team’s accumulated agent knowledge survive a harness change?
Concurrency safety. Two agents in the same repo — what breaks?
Benchmark legibility. Can you point at a SWE-bench number and tell a procurement team a story?

OpenClaw isn’t competing on any of those, because they don’t matter to its users. It’s competing on:

Memory granularity. Can the agent remember a commitment from three weeks ago without the user re-prompting?
Privacy posture. Does the agent’s working memory leave the user’s machine?
Continuity. Can a session that started at 9am Monday resume at 2pm Friday with full context?
Ecosystem coherence. Do plugins compose, or do they fight each other for the user’s attention?

Claude Code wins on onboarding. Codex wins on concurrency. Hermes wins on surface flexibility. OpenClaw is winning on a different exam.

The model-as-commodity question

Every engineer we spoke to, from all four projects, said some version of the same thing: the model is becoming an input. Five years ago, “which LLM” was the question. Today, “which harness” is the question. The model layer commoditizes faster than the harness layer because models are interchangeable in a way that workflows aren’t.

This is a problem for the model labs that own a harness. Anthropic and OpenAI each have a harness coupled to their own model; both have an incentive to maintain that coupling, and both are quietly losing ground on workloads where users want to mix providers. Hermes and OpenClaw, model-agnostic from day one, benefit from this drift.
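One way to picture the claim that the model is an input while the workflow is not: a minimal sketch, assuming a harness that owns the skills and memory and treats the model as a replaceable function. Every name here (`Harness`, `swap_model`, `provider_a`, `provider_b`) is hypothetical and chosen for illustration; none of it reflects the actual API of any of the four projects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: a "model" is any function from prompt text to reply text.
ModelFn = Callable[[str], str]


@dataclass
class Harness:
    """Illustrative harness: it owns skills and memory and borrows a model."""
    model: ModelFn
    skills: Dict[str, str] = field(default_factory=dict)  # skill name -> markdown body
    memory: List[str] = field(default_factory=list)       # accumulated task notes

    def run(self, task: str) -> str:
        # The prompt is assembled from harness-owned state, not model-owned state.
        prompt = "\n\n".join([*self.skills.values(), *self.memory, task])
        reply = self.model(prompt)
        self.memory.append(f"task: {task} -> {reply}")
        return reply

    def swap_model(self, model: ModelFn) -> None:
        # Only the model changes; skills and memory are untouched.
        self.model = model


# Stand-in providers (assumptions for the sketch, not real SDK calls).
def provider_a(prompt: str) -> str:
    return f"A saw {len(prompt)} chars"


def provider_b(prompt: str) -> str:
    return f"B saw {len(prompt)} chars"
```

In this sketch, swapping `provider_a` for `provider_b` is one assignment, while the skills and memory carry over unchanged. That asymmetry is the sense in which the accumulated workflow, rather than the model, is the part that is hard to replace.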

What to watch

Three things in the next quarter will tell us where this lands:

Whether Anthropic’s skill registry launches with credible trust controls. If it does, Claude Code’s ecosystem moat hardens. If it ships without them, the lock-in cost becomes harder to justify.
Whether OpenAI ships first-class multi-provider support in Codex. They’ve hinted at it. They have not done it.
Whether OpenClaw’s persistent-memory primitives get picked up as a dependency by a third-party indie agent project. That’s the moment the personal-agent stack starts to look like infrastructure instead of a niche.

The likely outcome isn’t a winner-take-all market. It’s two adjacent markets: a coding-harness market that consolidates around two players (probably one closed, one open), and a personal-agent market that OpenClaw is currently building largely uncontested.

The bottom line

If you’re picking a tool for shipping code: pick on harness ergonomics. Single-vendor risk you can live with? Claude Code. Mixed-provider future? Hermes. Big migrations and renames? Codex. Pick the one whose tradeoffs match your team.

If you’re building a long-lived personal agent that needs to remember things: there is effectively one mature open option, and it’s OpenClaw. The fact that it shows up in this article alongside three coding harnesses says more about the market’s vocabulary than about the products themselves.