
Trinity-Large-Thinking: Scaling an Open Source Frontier Agent

Trinity-Large-Thinking is live: a frontier open reasoning model for complex, long-horizon agents and multi-turn tool calling, released under Apache 2.0.

Trinity Large Thinking Benchmarks

Today we are releasing Trinity-Large-Thinking on our API and the weights on Hugging Face under the Apache 2.0 license.

Nine months ago, we made a decision that changed the shape of this company. We decided that if we cared about serious American open models, about models developers and enterprises could actually own, then we needed to build them ourselves.

That decision gave us Trinity.

First came the smaller models: 4.5B, Nano, and Mini. Then came Preview at the end of January, our first public look at Trinity Large.

Today comes the official release: Trinity-Large-Thinking, our reasoning model built to close the gaps Preview left open.

On many axes, it is the strongest open model ever released outside of China.

It is the result of the last two months spent improving and scaling our SFT and RL pipeline to meet the size and capability of the Trinity-Large base model. Preview was an instruct model, as the name suggests; the new checkpoint uses “thinking” before responding, as Trinity-Mini does. This enables stronger multi-turn tool calling, better context coherence, cleaner instruction following, and more stable behavior across long-running agent loops.
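Reasoning models typically emit their deliberation before the final answer, and callers often want to separate the two. Here is a minimal sketch of that split, assuming a hypothetical `<think>…</think>` delimiter — a common convention for thinking models, not a documented Trinity output format:

```python
import re

# Assumption: the model wraps its reasoning in <think>...</think> before
# the final answer. This tag format is a common convention for thinking
# models, not Trinity's documented output schema.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_thinking(raw: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string."""
    m = THINK_RE.search(raw)
    if not m:
        return "", raw.strip()
    reasoning = m.group(1).strip()
    answer = THINK_RE.sub("", raw, count=1).strip()
    return reasoning, answer

reasoning, answer = split_thinking(
    "<think>The user wants 2+2; that is 4.</think>The answer is 4."
)
```

In an agent loop, the reasoning half would normally be logged or discarded while only the answer half is fed back to the user or the next tool call.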

The market has been asking for this. Trinity-Large-Preview took off faster than we expected.

We launched it at the end of January as a light instruct post-train, with the expectation that people would test it, break it, and help show us where it wanted to go next. That is exactly what happened. The model crossed 3.37 trillion tokens served on OpenRouter in its first two months, and OpenRouter’s OpenClaw collection has Trinity Large Preview as the #1 most used open model in the U.S. and #4 globally.

And those users hit our GPUs like a freight train; we couldn’t have asked for a better way to stress-test our serving platform.

And because we know our Preview model had a special place in many users’ hearts, we’re keeping it free on OpenRouter, albeit on less hardware. We’ll have more updates on the long-term plans for that model in the coming weeks.

When approaching Large-Thinking, given the capital we had and the timeline in front of us, we did not think we could become the best open coding model in the world overnight. We also did not think that was the only problem worth solving. What we thought we could do, and what Preview started to prove, was build a model that was unusually good for the kinds of agents developers were starting to run every day, 24/7.

That meant focusing on the things that make agents feel real in practice: staying coherent across turns, using tools without getting sloppy, following instructions under constraint, and keeping quality high without making the economics absurd.

Trinity-Large-Thinking is the official release of that bet.

It is available through our API. The weights are available on Hugging Face under Apache 2.0. We are releasing it this way because we continue to believe that permissive American open weights are necessary if the US is to credibly claim it is leading in AI.

Developers and enterprises need models they can inspect, post-train, host, distill, and own.

Trinity-Large-Thinking clears an important bar on both capability and usability.

It scores #2 on PinchBench, a benchmark from Kilo that measures model capability on tasks relevant to agents like OpenClaw, just behind Opus-4.6, while landing at $0.90 per million output tokens on our API, roughly 96% cheaper.

It is far better than Preview at multi-turn tool use, context coherence, and instruction following across long-horizon agent runs. Our goal has been reliable, cheap, high-quality agents for developers, startups, and enterprises. We feel we’ve hit each of these.

The last nine months have been a real sprint to reach the frontier.

We had to prove we could stand up the full pipeline, make sparsity behave, stabilize training, push post-training harder, and do it all without the luxury of endless capital reserves.

That constraint shaped the company in meaningful ways: it forced us to be precise about what mattered, pick our battles, and build a training culture that takes efficiency seriously from pretraining through RL.

We have reached a level of competitiveness, and a price per token, that we feel genuinely good about. That means the question changes. It is no longer only whether we can get to the frontier. It is whether we can take what worked here, understand it deeply, and use it to train the best open-weight models in the world.

It is also the harder question, and one we can’t wait to answer.

Much of the work that went into Large will now flow back down the stack.

We will bring the pretraining and post-training lessons from Trinity-Large into our Mini and Nano models. One of the bitter lessons in training great small models is that you usually need to train a really good big one first. Then you distill. We can’t wait to get to work on Trinity-2-Nano and Mini and put that lesson to good use.

Large will keep getting better too. We’re not done with this generation of Trinity-Large, but we are at the point where the project becomes even more ambitious.

A personal note

I am proud of this team.

Getting here took difficult technical work, hard calls, and more than a few moments where the easy thing would have been to lower the ambition. Nobody did that. They kept pushing.

I am also grateful to the developers, partners, and early users who put Trinity-Large-Preview into real systems while it was still young enough to embarrass itself. That feedback mattered. It helped shape the model we are releasing today.

Trinity-Large-Thinking is live now. Sign up via our platform or OpenRouter.
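For the hosted route, OpenRouter exposes an OpenAI-compatible chat completions API. Here is a minimal request sketch; the model slug `trinity/trinity-large-thinking` is a hypothetical placeholder, so check the actual listing before using it:

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODEL = "trinity/trinity-large-thinking"  # hypothetical slug -- verify on OpenRouter

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize this changelog.", api_key="sk-...")
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) returns the usual chat-completion JSON; the same payload shape works against any OpenAI-compatible endpoint.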

We are building these models so you can own them.