Built for performance, compliance, and affordability.
Today marks a pivotal moment for Arcee AI and our customers: the launch of AFM-4.5B, the first Arcee Foundation Model. AFM-4.5B is the result of a deliberate, ambitious effort to deliver enterprise-grade AI that meets the real needs of today's organizations: performance, compliance, and affordability—at a scale and quality not previously available.
For a quick taste, you can test AFM-4.5B in our playground and on Together.ai.
Our journey began in response to a pattern we saw across countless customer deployments. Over the years, we've helped organizations push AI further, driving better performance and lower costs through precision tuning and targeted post-training. But as AI adoption grew, we saw a set of inescapable pain points emerge.
Edge-optimized models simply weren't reliable enough for demanding tasks. Customers needed a model that could run on modest hardware, yet still deliver top-tier accuracy and robustness.
The most advanced models from major Chinese AI labs (DeepSeek, Qwen, GLM, MiniCPM) offered impressive results, but rarely satisfied Western compliance standards, disqualifying them for regulated industries.
Models from Meta (Llama) and Mistral, while solid, were quickly losing relevance. The 3–10B parameter space was served mainly by models a year old or older, outpaced by newer research, data pipelines, and post-training strategies.
Customers faced a hard choice: compromise on performance, compliance, or future flexibility. We knew there had to be a better way. The answer wasn’t a patchwork of tweaks or incremental improvements. We committed to a bold course: design and train a new model, from the ground up, for the world our customers actually operate in.
Training a foundation model of this scale is never simple. We took on the challenge not just to build a better model, but to prove that focus, rigorous data practices, and deep expertise could deliver a step-change in real-world utility.
We knew that in order to build the strongest models possible, we needed the best possible training data. To achieve this, we partnered with DatologyAI, the leading experts in data curation, to assemble 6.58 trillion tokens of the most relevant, highest-quality data possible.
Data curation for foundation models is hard. It's a frontier research problem: the field is comparatively new, experiments are costly to run at scale, and small-scale results often aren't predictive of large-scale outcomes. It's also a frontier engineering problem: there's no established playbook for implementing a curation pipeline that can scale to the trillions of tokens needed to train competitive foundation models. We knew it wouldn't make sense to tackle this ourselves, which is why we chose to partner with DatologyAI.
DatologyAI's curation pipeline integrates a suite of proprietary algorithms (model-based quality filtering, embedding-based curation, target distribution matching, source mixing, and synthetic data) and customizes them to produce a strong general-purpose dataset that also targets the capabilities we wanted our model to have. The results were visible early: by 2 trillion tokens, AFM-4.5B was already outperforming competing models trained on dramatically larger but noisier datasets.
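DatologyAI's production pipeline is proprietary, but the core idea behind one of its ingredients, model-based quality filtering, is easy to illustrate. Below is a minimal sketch under simple assumptions: a hypothetical document-quality classifier scores each document, and low-scoring documents are dropped. The model id, label name, and threshold are all placeholders, not part of the actual pipeline.

```python
from transformers import pipeline

# Hypothetical document-quality classifier; the model id, label name,
# and threshold below are placeholders, not DatologyAI's actual setup.
scorer = pipeline("text-classification", model="my-org/doc-quality-classifier")

QUALITY_THRESHOLD = 0.85  # in practice this cutoff is tuned on held-out data

def filter_corpus(documents):
    """Keep only documents the classifier scores as high quality."""
    kept = []
    for doc in documents:
        result = scorer(doc[:2048], truncation=True)[0]  # score a truncated view
        if result["label"] == "high_quality" and result["score"] >= QUALITY_THRESHOLD:
            kept.append(doc)
    return kept
```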
We used Amazon SageMaker HyperPod to orchestrate training across 512 NVIDIA H200 GPUs. This cloud infrastructure let us iterate rapidly on architectural variants, hyperparameter sweeps, and targeted interventions.
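HyperPod handles cluster provisioning and fault recovery; the training loop underneath is standard distributed PyTorch. As a rough orientation only, here is generic DDP boilerplate, not our actual training code; `build_model` and `get_dataloader` are hypothetical helpers.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train():
    # torchrun (or the cluster's launcher) sets RANK, WORLD_SIZE, LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # build_model is a hypothetical model constructor.
    model = DDP(build_model().to(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # get_dataloader is a hypothetical rank-sharded data loader.
    for batch in get_dataloader(rank=dist.get_rank()):
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```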
AFM-4.5B’s clean foundation made it a prime candidate for our multi-stage post-training pipeline—built to adapt the model to enterprise demands through advanced fine-tuning, distillation, merging, and alignment techniques.
AFM-4.5B is purpose-built for organizations that won't settle for compromises.
Optimized for high throughput on CPUs, AFM-4.5B delivers GPU-tier results with efficient resource usage—ideal for both cloud and on-premise scenarios.
From enterprise servers to mobile devices and IoT modules, AFM-4.5B scales effortlessly, supporting AI where you need it. On an Amazon EC2 c8g.8xlarge instance (Graviton4, 32 vCPUs), an 8-bit version of AFM-4.5B running on llama.cpp delivers well over 100 tokens per second at batch size 4, and a 4-bit version delivers over 200 tokens per second. This combination of high-quality generation and high CPU throughput opens up cloud and edge use cases that were previously impractical.
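To make the setup concrete, here is a minimal CPU inference sketch using llama.cpp's Python bindings (llama-cpp-python). The GGUF file name is hypothetical; use whichever quantization you export, and set `n_threads` to match your vCPU count.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="afm-4.5b-q4_0.gguf",  # 4-bit quantized build (hypothetical file name)
    n_ctx=4096,                        # context window
    n_threads=32,                      # match the vCPU count (e.g. c8g.8xlarge)
)

out = llm.create_completion(
    "Summarize our Q3 compliance report in three bullet points:",
    max_tokens=256,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```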
We designed AFM-4.5B with real-world use cases in mind and have refined it to handle a broad spectrum of workloads with precision and reliability.
We built AFM-4.5B to be more than just a foundation model; it's a launchpad for fast, reliable deployment in the real world.
Drop-in kits enable AFM-4.5B to power tool use, retrieval-augmented generation, and agentic reasoning—securely and on your infrastructure.
Fine-tune the model to your vertical in hours, not months. Our pipelines and documentation help you take full control without unnecessary overhead.
At the heart of AFM-4.5B’s real-world performance is its post-training stack—a layered strategy that surfaces and sharpens capabilities without sacrificing generality or introducing brittleness.
It begins with midtraining, where we infused the model with high-leverage datasets (math, code, complex reasoning) and carefully selected samples from DatologyAI's corpus. This step gave the model strong early instincts for precision and clarity. From there, we performed checkpoint merging, consolidating and enhancing intermediate models into a cohesive base. We extended context length using YaRN, a rotary scaling method that retains performance at longer contexts, and refined this long-context foundation through advanced merging using MergeKit, our open-source tool. MergeKit gave us precise control over the model's composition: layer-wise weighting, residual scaling, and targeted integrations, all of which contributed to consistency across varied tasks.
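MergeKit merges are driven by small declarative configs. As an illustration only (the checkpoint paths, method, and weights below are placeholders, not the AFM-4.5B recipe), a linear merge of two intermediate checkpoints looks like this:

```python
import pathlib
import subprocess

# Illustrative MergeKit config: a linear merge of two hypothetical
# intermediate checkpoints. Not the actual AFM-4.5B merge recipe.
config = """\
merge_method: linear
dtype: bfloat16
models:
  - model: ./checkpoints/intermediate-a
    parameters:
      weight: 0.6
  - model: ./checkpoints/intermediate-b
    parameters:
      weight: 0.4
"""
pathlib.Path("merge.yml").write_text(config)

# mergekit-yaml is the CLI entry point installed with the mergekit package.
subprocess.run(["mergekit-yaml", "merge.yml", "./merged-model"], check=True)
```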
Next, we conducted supervised fine-tuning, focusing on instruction clarity, diversity, and alignment. Here, the model learned to adapt to a wide range of prompts—from legal analysis to creative writing—while avoiding the overfitting that weakens many instruction-tuned models.
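As a rough illustration of what this stage looks like in practice (not our internal pipeline; the dataset and checkpoint ids are placeholders), supervised fine-tuning on an instruction dataset can be run with Hugging Face TRL's SFTTrainer:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical instruction-tuning mix; any prompt/response dataset works here.
dataset = load_dataset("my-org/instruction-mix", split="train")

trainer = SFTTrainer(
    model="arcee-ai/AFM-4.5B-Base",  # placeholder checkpoint id
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="afm-sft",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
)
trainer.train()
```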
Finally, we applied reinforcement learning using verifiable reward signals, helping the model prefer factual, high-utility responses. Post-RL merges smoothed out inconsistencies, and we followed with KTO, an alignment method that learns from simple binary feedback about which model outputs are desirable.
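KTO is also implemented in TRL. A minimal sketch follows; the dataset is a placeholder with `prompt`/`completion` columns and a boolean `label` marking whether each completion is desirable, and the checkpoint id assumes the SFT output from the previous stage.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# Hypothetical post-SFT checkpoint and binary-feedback dataset.
model_id = "afm-sft"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("my-org/kto-feedback", split="train")

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(output_dir="afm-kto", per_device_train_batch_size=2),
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```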
This comprehensive stack ensures that AFM-4.5B not only performs well out of the box but remains reliable, steerable, and robust, ready for the highest-stakes enterprise environments.
Here are benchmarks for AFM-4.5B-Base (pre-distillation) and AFM-4.5B-Preview.
AFM-4.5B is our most advanced model yet—but just the beginning. We're developing smaller variants for ultra-constrained edge devices, and scaling up for reasoning-intensive workloads. Whatever your requirements, we’re committed to delivering models that are powerful, transparent, and enterprise-ready.
As we move toward full release, here’s what you should know:
We believe in transparency, openness, and community-driven development, and these releases reflect our commitment to making high-quality AI more accessible.
Here’s how to get involved:
With AFM-4.5B, we're not just launching a model—we're launching a new era. Every layer of this system, from training data to post-training alignment, was built with one goal: to meet your needs with no compromises. Whether your priorities are cost, compliance, or performance, we’ve built AFM-4.5B to exceed expectations—on your terms.
Ready to see what AFM-4.5B can do for your enterprise? Let’s build this chapter together. Reach out for a demo or contact sales@arcee.ai—we’d love to collaborate with you.
Building AFM was a company-wide effort, and we’d like to thank the extended Arcee AI team for their contribution: Fernando Fernandes, Varun Singh, Charles Goddard, Lucas Atkins, Mark McQuade, Maziyar Panahi, Conner Stewart, Colin Kealty, Raghav Ravishankar, Lucas Krauss, Anneketh Vij, Pranav Veldurthi, Abhishek Thakur, Julien Simon, Scott Zembsch, Benjamin Langer, Aleksiej Cecocho and Maitri Patel.