Trinity Models
Reliable intelligence you can run anywhere.
Open weights and a production API for multi-turn conversations, tool use, and structured outputs. Pick your size based on where you deploy. Capabilities stay consistent.


Models
All Trinity variants share the same skill profile. Choose the footprint that fits your infrastructure.
Trinity Nano (6B total, 1B active)
Edge, embedded, and privacy-critical
Runs fully local on consumer GPUs, edge servers, and mobile-class devices. Tuned for offline operation and latency-sensitive voice or UI loops.
Deploy on: on-device, edge servers, mobile, kiosks
Active parameters: 1B per token
Context window: 128K tokens
Variants: Standard and Reasoning
Trinity Mini (26B total, 3B active)
Cloud and on-prem production
Serve customer-facing apps, agent backends, and high-throughput services in your cloud or VPC.
Deploy on: AWS, GCP, Azure, on-prem (vLLM, SGLang, llama.cpp)
Active parameters: 3B per token
Context window: 128K tokens
Variants: Standard and Reasoning
Standard: Faster and lower cost. Best for routing, retrieval, summarization, classification, extraction, and straightforward tool calls. Same APIs, schemas, and prompts as Reasoning; switch per request.
Reasoning: Enhanced stepwise planning and long-horizon tool use. Best for multi-step workflows, complex coding, and deep analysis. Slightly higher latency and cost per token.
Capabilities
- Agent reliability: Accurate function selection, valid parameters, schema-true JSON, and graceful recovery when tools fail.
- Coherent multi-turn conversation: Holds goals and constraints over long sessions; follows up naturally without re-explaining context.
- Structured outputs: JSON schema adherence; native function calling and tool orchestration.
- Same skills across sizes: Move workloads between edge and cloud without rebuilding prompts or playbooks.
- Efficient attention: Reduced serving cost at long contexts compared to conventional attention designs.
- Strong context utilization: Makes full use of large input documents for more relevant, grounded responses.
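To make "schema-true" concrete, here is a minimal sketch of what checking a tool call against its declared schema involves: a tool defined with a JSON-schema parameter block, and a small validator for required keys, unknown keys, and enum values. The `get_weather` tool and its fields are hypothetical, not part of any shipped toolset.

```python
import json

# Hypothetical tool declaration in the common JSON-schema parameter style.
get_weather = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def validate_call(tool: dict, arguments_json: str) -> bool:
    """Check a model-produced tool call against the declared schema:
    required keys present, no unknown keys, enum values respected."""
    args = json.loads(arguments_json)
    schema = tool["parameters"]
    props = schema["properties"]
    if any(key not in args for key in schema.get("required", [])):
        return False
    for key, value in args.items():
        if key not in props:
            return False
        allowed = props[key].get("enum")
        if allowed is not None and value not in allowed:
            return False
    return True

validate_call(get_weather, '{"city": "Oslo", "unit": "celsius"}')  # True
validate_call(get_weather, '{"unit": "kelvin"}')                   # False: missing city, bad enum
```

A schema-true model passes checks like these on the first attempt, which is what keeps agent loops from stalling on malformed calls.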
Technical Overview
Sparse mixture of experts with highly efficient attention for lower latency and lower cost.
- Trinity Mini: 26B total / 3B active per token
- Trinity Nano: 6B total / 1B active per token
- Curated, high‑quality data across diverse domains with strict filtering and classification pipelines.
- Heavy synthetic augmentation to cover edge cases, tool calling, schema adherence, error recovery, preference following, and voice‑friendly styles.
- Evaluation focuses on tool reliability, long‑turn coherence, and structured output accuracy.
- 128K token context window
- Structured outputs with JSON schema adherence
- Native function calling and tool orchestration
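As a sketch of how structured outputs are requested, here is an example request body using the widely adopted `response_format` JSON-schema convention. The exact field names and model identifier are assumptions; consult the structured outputs guide for the confirmed shape.

```python
import json

# Hypothetical extraction schema for illustration.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total"],
}

# "response_format" follows a common API convention; treat the field
# names here as an assumption, not documented Trinity API surface.
request_body = {
    "model": "trinity-mini",  # illustrative model name
    "messages": [{"role": "user", "content": "Extract the invoice fields."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "invoice", "schema": invoice_schema},
    },
}

print(json.dumps(request_body, indent=2))
```

With a schema attached, the response is constrained to parseable JSON matching `invoice_schema`, so downstream code can load it without defensive parsing.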
Deploy Your Way
Quickstart
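A minimal quickstart sketch against an OpenAI-compatible chat endpoint using only the Python standard library. The base URL, model name, and `TRINITY_API_KEY` environment variable are placeholders; substitute the values from your provider console.

```python
import json
import os
import urllib.request

# Placeholder endpoint and model name: none of these are confirmed defaults.
BASE_URL = "https://api.example.com/v1"
MODEL = "trinity-mini"

def chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completion request for an OpenAI-compatible endpoint."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('TRINITY_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

if __name__ == "__main__":
    # Sends the request and prints the first message; requires a valid key.
    with urllib.request.urlopen(chat_request("Say hello in one word.")) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works for self-hosted deployments (vLLM, SGLang, llama.cpp) that expose an OpenAI-compatible server; only `BASE_URL` changes.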
FAQs
Do the Standard and Reasoning variants share the same API?
Yes. Both variants accept the same prompts and tools.
How do I get structured JSON output?
Provide a schema and the model will produce schema-true outputs. See the structured outputs guide.
How do I control response length?
Tell the model exactly how brief you want it to be. See the examples above.
