
Partnerships

27 May 2025 · 7 min read

Arcee AI SLMs Now Available on Together.ai and OpenRouter

Arcee AI is excited to announce the availability of its small language models (SLMs) on Together.ai and OpenRouter, two leading managed inference platforms. Start building today and leverage Arcee AI’s specialized models to enhance your AI applications.

Julien Simon

Today, we are thrilled to announce that our suite of small language models (SLMs) is now available on Together.ai and OpenRouter, two leading platforms for managed, pay-per-token inference services. With this seamless integration, you can start building high-quality, fast, and cost-efficient AI-powered applications in minutes, without having to manage complex AI infrastructure. 

Whether you’re looking for models for general-purpose chat applications, code generation, function calling, reasoning, or image-to-text, we’ve got you covered! Here are the available models.

| Model Name | Description | Size | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Together.ai | OpenRouter |
|---|---|---|---|---|---|---|---|
| Arcee Blitz | Fast, cost-efficient model for everyday chat. | 24B | 33K | $0.45 | $0.75 | Link | Link |
| Virtuoso Medium V2 | Balanced model for a wide range of tasks. | 32B | 131K | $0.50 | $0.80 | Link | Link |
| Virtuoso Large | General-purpose LLM for cross-domain reasoning and creative writing. | 72B | 131K | $0.75 | $1.20 | Link | Link |
| Coder Large | Optimized for code generation and refactoring. | 32B | 33K | $0.50 | $0.80 | Link | Link |
| Caller Large | Specialized for function calling and orchestration. | 33B | 33K | $0.55 | $0.85 | Link | Link |
| Maestro Reasoning | Flagship model for step-by-step logical reasoning. | 32B | 131K | $0.90 | $3.30 | Link | Link |
| Spotlight | Lightning-fast vision-language model for image-text tasks. | 7B | 131K | $0.18 | $0.18 | Link | Link |

Why Build with Small Language Models

Use The Right Model for Each Task

In the world of AI, one size does not fit all. While frontier-scale "Swiss Army knife" large language models look impressive, their high cost and slow generation time are problematic for many specific business tasks. Arcee AI's specialized SLMs represent a shift toward purpose-built AI that excels at targeted use cases.

Consider Caller Large, which we purpose-built and trained for function-calling workflows. Unlike general models that might hallucinate API calls or struggle with parameter extraction, Caller makes deliberate decisions about when to invoke external tools. For enterprises building process automation tools, this specialization translates to fewer errors and more reliable outputs than you'd get from a jack-of-all-trades model.
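As a minimal sketch, here is what an OpenAI-style chat request with a tool definition for Caller Large might look like. The model slug `arcee-ai/caller-large` and the `get_order_status` tool are illustrative assumptions; copy the exact model identifier from the Together.ai or OpenRouter model page.

```python
import json

def build_function_call_request(user_message: str) -> dict:
    """Build an OpenAI-compatible chat payload with one tool definition."""
    return {
        "model": "arcee-ai/caller-large",  # assumed slug; verify on the platform
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_order_status",  # hypothetical tool for illustration
                    "description": "Look up the status of a customer order.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "order_id": {
                                "type": "string",
                                "description": "Order identifier.",
                            }
                        },
                        "required": ["order_id"],
                    },
                },
            }
        ],
    }

request = build_function_call_request("Where is order 12345?")
print(json.dumps(request, indent=2))
```

When the model decides a tool is needed, the response contains a `tool_calls` entry with the function name and JSON arguments instead of free-form text, which your orchestration layer can then execute.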

Similarly, Spotlight demonstrates the power of focus—a 7-billion parameter vision-language model that punches well above its weight class on visual question-answering benchmarks. By optimizing specifically for fast inference on consumer GPUs while maintaining strong image grounding, it delivers production-ready performance for UI analysis, chart interpretation, and visual content moderation at a fraction of the computational footprint required by larger multimodal models.
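For image-to-text tasks like these, OpenAI-compatible vision endpoints accept images either as plain URLs or as base64 data URLs inside the message content. The sketch below builds such a payload; the `arcee-ai/spotlight` slug is an assumption (check the model page), and the image bytes are a dummy placeholder.

```python
import base64

def build_vision_request(image_bytes: bytes, question: str, mime: str = "image/png") -> dict:
    """Build an OpenAI-compatible multimodal chat payload with an embedded image."""
    data_url = f"data:{mime};base64," + base64.b64encode(image_bytes).decode()
    return {
        "model": "arcee-ai/spotlight",  # assumed slug; verify on the platform
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": data_url}},
                ],
            }
        ],
    }

# Dummy bytes stand in for a real PNG; in practice, read the file from disk.
request = build_vision_request(b"\x89PNG...", "What does this chart show?")
```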

Increase Efficiency at a Reasonable Cost

As applications transition from proof-of-concept to production, the economics of AI deployment become increasingly important. Arcee AI's SLMs deliver near-frontier quality at a significantly lower cost, making them a compelling choice for enterprises looking to maximize their AI investment.

Arcee Blitz exemplifies this approach—an extremely versatile 24-billion parameter package that runs at approximately one-third the latency and price of comparable 70B models. For high-volume use cases like customer service automation or content generation, this translates to substantial savings that compound over time. A company processing thousands of customer inquiries daily might save tens of thousands of dollars monthly while maintaining response quality that end-users can't distinguish from more expensive alternatives.
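A back-of-envelope estimate makes the economics concrete. The prices below are Arcee Blitz's rates from the table above ($0.45/$0.75 per million input/output tokens); the traffic figures (inquiries per day, tokens per inquiry) are illustrative assumptions, not measurements.

```python
# Arcee Blitz pricing from the table above.
INPUT_PRICE_PER_M = 0.45   # $ per million input tokens
OUTPUT_PRICE_PER_M = 0.75  # $ per million output tokens

def monthly_cost(inquiries_per_day: int, input_tokens: int, output_tokens: int, days: int = 30) -> float:
    """Estimate the monthly token bill for a fixed daily workload."""
    total_in = inquiries_per_day * input_tokens * days
    total_out = inquiries_per_day * output_tokens * days
    return (total_in / 1e6) * INPUT_PRICE_PER_M + (total_out / 1e6) * OUTPUT_PRICE_PER_M

# Illustrative workload: 5,000 inquiries/day, ~800 input and ~300 output tokens each.
print(f"${monthly_cost(5000, 800, 300):,.2f}/month")
```

Rerunning the same arithmetic with a model priced roughly three times higher shows how quickly the gap compounds at high volume.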

The pricing advantage extends across the entire Arcee AI lineup. Coder Large delivers competitive performance on programming benchmarks such as the DevQualityEval Leaderboard at price points well below proprietary coding assistants, enabling development teams to deploy AI coding support to more engineers with much less concern for usage costs.

Scale From Laptop to Data Center

Technical scalability matters as much as economic scalability. Virtuoso Medium V2 exemplifies this approach with its aggressive quantization options—from full BF16 precision down to 4-bit GGUF variants that can run on consumer hardware. Organizations can start experimenting with a proof-of-concept running locally at near-zero cost, then seamlessly scale to cloud deployment as usage grows.
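The memory math behind that quantization claim is simple to sketch. This counts weights only, ignoring the KV cache and runtime overhead; the 4.5 bits/weight figure is an illustrative average for 4-bit GGUF formats, which store some quantization metadata alongside the 4-bit weights.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB for a given precision."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 32B model like Virtuoso Medium V2:
bf16 = weight_memory_gb(32, 16)   # full BF16 precision
q4 = weight_memory_gb(32, 4.5)    # ~4.5 bits/weight, typical of 4-bit GGUF
print(f"BF16: ~{bf16:.0f} GB, 4-bit GGUF: ~{q4:.0f} GB")
```

Dropping from roughly 64 GB to under 20 GB of weights is what moves a 32B model from data-center GPUs into reach of high-end consumer hardware.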

The scalability advantage extends to context handling as well. Models like Virtuoso Large maintain impressive 128K context windows despite aggressive optimization, allowing them to process entire codebases, legal documents, or scientific papers in a single pass. You can eliminate complex chunking strategies or context management that might otherwise become bottlenecks as applications scale.

Get More Business Performance From Your SLMs

Arcee AI's models deliver exceptional performance on metrics that matter for business applications. Maestro Reasoning isn't just a 32B parameter model—it doubles the pass rates on challenging MATH and GSM-8K benchmarks compared to its predecessors while maintaining full transparency in its reasoning process. For financial analysis, scientific computing, or any scenario requiring audit trails, this combination of accuracy and explainability is invaluable.

Similarly, Coder Large doesn't just understand multiple programming languages—it demonstrates 5-8 percentage point gains over CodeLlama-34B-Python on key benchmarks, particularly in producing compilable, correct code. For development teams, this translates to fewer debugging cycles and more reliable suggestions.

This performance advantage stems from Arcee AI's focused training methodology. Rather than optimizing for general benchmark leaderboards, each model undergoes specialized fine-tuning and reinforcement learning aligned with its intended use case, resulting in domain-specific capabilities that outshine models of similar or even larger size. This ensures that you're getting the highest quality and performance from our models.

Why Work with Managed Inference Platforms

Building AI applications requires more than powerful models: it demands reliable infrastructure, optimization expertise, and operational support. Instead of designing great features and user experiences, development teams end up managing clusters, wrestling with CUDA dependencies, and performing plenty of other undifferentiated technical work.

Managed inference platforms like Together.ai and OpenRouter dramatically simplify the development process. They handle the complex infrastructure requirements of model deployment, from GPU provisioning and scaling to load balancing and redundancy. In fact, Arcee is using Together.ai to host all the models available in Arcee Conductor, our model routing inference platform!  

Thanks to user-friendly playgrounds, you can start testing models immediately without writing any code.

Testing Arcee Blitz in the OpenRouter playground

Of course, production-ready OpenAI-compatible APIs are available, and they enable quick integration with existing systems. Whether you're building a new application from scratch or enhancing existing systems with AI capabilities, these standardized interfaces significantly reduce time-to-market compared to self-hosted alternatives.
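As a rough sketch, a call through one of these OpenAI-compatible endpoints needs nothing beyond the standard library. The endpoint path below is OpenRouter's documented chat completions route; the `arcee-ai/arcee-blitz` slug is an assumption, so copy the exact identifier from the model page. The actual network call is left commented out so the sketch stays self-contained.

```python
import json
import urllib.request

API_KEY = "YOUR_OPENROUTER_API_KEY"  # placeholder credential

payload = {
    "model": "arcee-ai/arcee-blitz",  # assumed slug; verify on OpenRouter
    "messages": [
        {"role": "user", "content": "Summarize what an SLM is in one sentence."}
    ],
}

req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Uncomment to send the request with a real API key:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the interface is OpenAI-compatible, the same payload shape works against Together.ai's endpoint by swapping the base URL and API key.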

Invoking Arcee Blitz on Together.ai

Get started now!

Get started today by signing up on Together.ai or OpenRouter.

Explore the models, integrate them into your applications, and unlock new levels of business value with Arcee AI. The future of AI is here, and it's more accessible and efficient than ever.

If you have any questions, feel free to contact us at support@arcee.ai 😃
