Trinity Models

Reliable intelligence you can run anywhere.

Open weights and a production API for multi-turn conversations, tool use, and structured outputs. Pick your size based on where you deploy. Capabilities stay consistent.

Decorative geometric pattern with overlapping rectangular shapes in light gray on white background
Decorative geometric pattern with overlapping rectangular shapes in light gray on white background

Models

All Trinity variants share the same skill profile. Choose the footprint that fits your infrastructure.

Trinity Nano〔6B, 1B active〕

Edge, embedded, and privacy-critical

Runs fully local on consumer GPUs, edge servers, and mobile-class devices. Tuned for offline operation and latency-sensitive voice or UI loops.

Deploy on: on-device, edge servers, mobile, kiosks

Active parameters: 1B per token

Context window: 128K tokens

Trinity Mini〔26B, 3B active〕

Cloud and on-prem production

Serve customer-facing apps, agent backends, and high-throughput services in your cloud or VPC.

Deploy on: AWS, GCP, Azure, on-prem (vLLM, SGLang, llama.cpp)

Active parameters: 3B per token

Context window: 128K tokens

Capabilities

  • Agent reliability: Accurate function selection, valid parameters, schema-true JSON, graceful recovery when tools fail.
  • Coherent multi-turn conversation: holds goals and constraints over long sessions; follows up naturally without re-explaining context.
  • Structured outputs: JSON schema adherence; native function calling and tool orchestration.
  • Same skills across sizes: Move workloads between edge and cloud without rebuilding prompts or playbooks.
  • Super efficient attention: Reduced cost of running at long contexts compared to other models.
  • Strong context utilization: Makes full use of large input docs for more relevant, grounded responses.

Technical Overview

Sparse mixture of experts with highly efficient attention for lower latency and lower cost.

  • Trinity Mini: 26B total / 3B active per token
  • Trinity Nano: 6B total / 1B active per token
  • Curated, high‑quality data across diverse domains with strict filtering and classification pipelines.
  • Heavy synthetic augmentation to cover edge cases, tool calling, schema adherence, error recovery, preference following, and voice‑friendly styles.
  • Evaluation focuses on tool reliability, long‑turn coherence, and structured output accuracy.
  • 128K token context window
  • Structured outputs with JSON schema adherence
  • Native function calling and tool orchestration

Quickstart

1
Get an API
Get an API key or download weights
2
Pick model
Pick Mini for cloud or Nano for edge
3
Choose workload
Choose Reasoning or Standard per workload
4
Plug in your tools
5
Ship to production

FAQs

Trinity is a family of open-weight language models built for multi-turn conversations, tool use, and structured outputs. You can run it fully locally or use a hosted API, and the capabilities stay consistent across sizes.

Choosing the right Trinity model depends on your performance needs, workload complexity, and where you plan to run it. All models share the same capabilities, APIs, and skill profile, so you can move between them without changing prompts.

  • Trinity Nano (6B) is optimized for offline operation and low-latency loops on embedded or edge devices.
  • Trinity Mini (26B) is tuned for multi-turn agents, tool orchestration, and structured outputs in cloud or on-prem backends.
  • Trinity (Coming Soon) will support even larger contexts, complex reasoning and coding.

Trinity is specifically trained to support robust, multi-turn agent workflows. It can:

  • Select the right tool or function for each task
  • Produce valid parameters and schema-compliant JSON
  • Recover gracefully if a tool fails
  • Maintain long-term conversational coherence over 10–20 turns
  • Behave consistently across model sizes, allowing you to test locally and deploy in the cloud without changing prompts

These capabilities make Trinity well-suited for complex agent tasks that require accuracy, reliability, and structured outputs.

Trinity combines a high-performance architecture with carefully curated training to deliver reliable, multi-turn agent capabilities:

  • Architecture: Sparse Mixture of Experts (MoE), only a subset of experts activate per token, reducing latency and providing predictable compute costs.
  • Training:
    • High-quality, domain-diverse data
    • Synthetic augmentation for tool calling, schema adherence, voice style, and error recovery
    • Strict filtering and classification pipelines
  • Focus areas: Tool reliability, structured outputs, and long-term conversational coherence

Trinity Nano and Mini support a 128K-token context window, allowing the model to handle long conversations, multi-step workflows, and structured outputs without losing coherence.

The models are optimized for performance over extended contexts, using selective expert activation and training on long-form interactions. This ensures consistent reasoning, reliable tool use, and schema compliance even across dozens of turns, so complex tasks can be managed without manually trimming or chunking context.

Yes. Arcee provides an OpenAI-compatible API endpoint, making it easy to integrate into existing systems. For full integration details, see our docs.

Yes. Trinity natively supports structured outputs, including JSON and other schema-based formats. You can define your schemas, and the model will generate outputs that adhere to them, ensuring reliable data formatting for downstream applications. Learn more in our docs.

There are three ways to get started with Trinity:

  • Try it on the platform – start chatting with the models immediately to explore their capabilities.
  • Generate an API key – integrate Trinity into your own applications and workflows.
  • Download the open weights from Hugging Face – run the models locally or on your preferred infrastructure.

These options let you experiment, develop, and deploy Trinity quickly and easily.

Build with Trinity today

Free API and open weights are available now.

Contact Sales
Abstract background with soft blue geometric elements and radiating lines creating a modern minimal design