

Open-Source SLMs

Jul 18, 2025 - 5 min read

Arcee AI Models Excel Across Yupp.ai Leaderboards

Small and mighty!

Julien Simon



At Arcee AI, we’ve pioneered a family of Small Language Models (SLMs) – compact yet powerful models that run efficiently on a single GPU and are easily tailored for specific tasks. 

Arcee SLMs cover a broad range of needs – from general-purpose assistants (such as the Arcee Virtuoso model family) to specialized reasoning (Arcee Maestro), coding (Arcee Coder), and other functions. Despite their smaller size, we engineer our models using advanced techniques (model merging, distillation, guided RL, etc.) so that they compete with, and often outperform, Large Language Models (LLMs).
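To make the merging idea concrete, here is a minimal, hypothetical sketch of the simplest flavor of model merging (linear weight averaging) in Python. Our production pipeline relies on MergeKit, which implements far more sophisticated methods; the checkpoint paths and function name below are illustrative placeholders, not our actual recipe.

import torch
from transformers import AutoModelForCausalLM

def merge_linear(path_a: str, path_b: str, alpha: float = 0.5):
    """Blend two same-architecture checkpoints, parameter by parameter."""
    model_a = AutoModelForCausalLM.from_pretrained(path_a, torch_dtype=torch.float32)
    model_b = AutoModelForCausalLM.from_pretrained(path_b, torch_dtype=torch.float32)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    # Linear interpolation: alpha weights the first parent, (1 - alpha) the second.
    merged = {name: alpha * state_a[name] + (1 - alpha) * state_b[name]
              for name in state_a}
    model_a.load_state_dict(merged)
    return model_a  # the returned model carries the blended weights

In practice, methods such as SLERP and TIES (both available in MergeKit) handle sign conflicts and interference between parent models far better than plain averaging.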

Academic benchmarks are useful for evaluating and comparing model performance, but nothing is more valuable than real-world user testing. To that end, Yupp.ai has created a new evaluation platform that ranks AI models by aggregating real users’ preferences into a “VIBE Score”.

We’re thrilled to report that several Arcee SLMs have already risen to the top of Yupp’s public leaderboards. Maestro and Coder Large rank among the highest-performing models on reasoning and coding prompts, and our new AFM-4.5B-Preview model sits near the very top for short-form Q&A tasks. This user-driven evidence confirms that Arcee’s models excel on real-world prompts, not just artificial tests.

Arcee Maestro (32B): Our fine-tuned reasoning model (built on Qwen-2.5) excels particularly in math and logic tasks. This class-leading performance shows up in user tests. On Yupp’s high-reasoning benchmark, Maestro currently ranks #5, tied with top models like Anthropic’s Claude Sonnet 4, and #8 overall when considering all prompt types. Such results highlight that Maestro’s “detailed reasoning” approach enables it to solve problems as well as, or better than, far larger models.

Arcee Maestro (32B): #5 on high-reasoning prompts
Arcee Maestro (32B): #8 on all prompt categories

Arcee Coder (32B): Our code-specialized model (also built on Qwen-2.5) delivers superior coding accuracy and efficiency. Importantly, Coder Large is cost-effective: priced well below proprietary alternatives, it lets engineering teams scale up coding tasks without runaway cloud costs. On the Yupp leaderboard, Coder Large ranks #6 (tied with Claude Sonnet 3.7) in the long-response multi-turn category, reflecting its strength on real developer prompts.

Arcee Coder (32B): #6 on long multi-turn conversations

AFM-4.5B-Preview (4.5B): Our brand-new open-weight foundation model is designed from the ground up for enterprise needs. Despite its small size (4.5 billion parameters), AFM-4.5B delivers performance on business tasks comparable to much larger models, at vastly lower hosting costs. It includes built-in function calling and agentic reasoning, as well as multilingual support, and we trained it on ~7 trillion tokens of carefully filtered data for accuracy and compliance. On short-turn QA and instruction tasks, AFM-4.5B-Preview takes an incredible #2 spot on the Yupp leaderboard. These rankings align with our lab results: AFM-4.5B-Preview consistently achieves accuracy on par with significantly larger models, while allowing deployment on low-end GPUs (and even CPUs).

AFM-4.5B-Preview: #2 on short-turn conversations
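As a quick illustration of the function-calling support, here is a hedged sketch that assumes you serve AFM-4.5B-Preview behind an OpenAI-compatible endpoint (for example, with vLLM). The base URL, model identifier, and get_weather tool are placeholders for illustration, not a documented Arcee API.

from openai import OpenAI

# Placeholder endpoint: any OpenAI-compatible server hosting the model will do.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# A hypothetical tool the model can decide to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="arcee-ai/AFM-4.5B-Preview",  # placeholder identifier
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
    tools=tools,
)

# If the model chose to call the tool, the structured call shows up here.
print(response.choices[0].message.tool_calls)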

These results showcase the Arcee value proposition: small, task-optimized models that punch above their weight. By fully owning the training pipeline, we embed efficiency and compliance by design. As the developers of open tools like MergeKit, Spectrum, and DistillKit, we apply the latest research to make SLMs that are both accurate and lean. AFM-4.5B, for example, was built as a “no trade-offs” model with cost-efficiency, customizability, and enterprise-grade compliance baked in from the start.

The practical payoff is huge: firms can serve users and orchestrate agents with these SLMs, cutting compute bills without sacrificing quality. In real-world use, this means faster responses, a lower total cost of ownership (TCO), and the ability to run advanced AI workflows on modest hardware. Customers can deploy Arcee SLMs anywhere – on-premises or in the cloud – knowing their privacy and compliance are fully preserved.

Long story short: Arcee SLMs translate into clear business value, which is exactly what our customers need for deploying AI at scale.

You can easily try our models on Together.ai.
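For example, a first request through Together’s OpenAI-compatible endpoint looks roughly like the sketch below. The model identifier shown is a placeholder; check Together’s model catalog for the exact Arcee model names.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # Together's OpenAI-compatible endpoint
    api_key=os.environ["TOGETHER_API_KEY"],
)

completion = client.chat.completions.create(
    model="arcee-ai/coder-large",  # placeholder; see Together's catalog for exact names
    messages=[{"role": "user",
               "content": "Write a Python function that checks if a string is a palindrome."}],
)
print(completion.choices[0].message.content)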

If you’d like to learn more, please get in touch, and we’ll be happy to discuss how we can help you implement secure, efficient, and cost-effective AI solutions.

