Small and mighty!
Running language models on CPUs has been discussed for some time, but delivering accurate results at production-level performance has remained unproven. So, are CPUs truly viable for serving language models in production?
SuperNova 70B, Virtuoso-Large 72B, Caller 32B, GLM-4-32B-Base-32K, and Homunculus 12B
Merging for pre-training, data privacy in healthcare, and language support
From 4k to 64k context through aggressive experimentation, model merging, distillation, and a concerning amount of soup.
Built for performance, compliance, and affordability.
The first release, AFM-4.5B, is a 4.5-billion-parameter model that delivers excellent accuracy, strict compliance, and exceptional cost-efficiency.
A training-free method to transplant tokenizers in pre-trained language models
A chatbot interface powered by open-source small language models and real-time data analytics lets store associates interact naturally through voice or text.