06 Jun 2025 - 5 min read

Building Madeline-s1, a World Class Reasoning Model

How Arcee AI helped Madeline build a reasoning model from first principles.

Andrew Walko, Raghav Ravishankar, Prince Rumi

Madeline & Co.’s Challenge

Madeline & Co. is an end-to-end AI-powered strategy, design, and innovation platform that helps anyone, from in-house teams and founders to marketers and creatives, navigate complex decisions with clarity and confidence. At the core is Madeline-s1, a powerful language model trained in design, strategy, systems thinking, UX, and storytelling, delivering real-time insights and intelligent recommendations as you build.

When first building out their product suite, Madeline & Co. tried off-the-shelf large language models (LLMs), but they repeatedly ran into high inference costs, poor performance at scale, and inconsistent accuracy in their specific domains. The accuracy problems centered on three gaps: context-specific reasoning, cross-disciplinary synthesis, and brand-safe output. Madeline & Co.'s founder, Prince Rumi, described one specific example:

When exploring a brand strategy for a sustainability startup, general models like Claude Sonnet 3.7 and GPT-4 would describe common channels or run-of-the-mill SWOTs. However, we required a model that would draw from a cross-section of startup decks, ethnographic research, brand campaigns, and founder memos to suggest an unexpected but contextually valid launch path—say, a limited-release collaboration with a fashion designer in the climate space. That leap requires nuance, not just knowledge.

This dissatisfaction with off-the-shelf LLMs led Madeline & Co. to partner with Arcee AI on a custom model that could reason with depth, align with internal frameworks, and operate with the flexibility needed for creative and strategic exploration. The partnership gave Madeline & Co. access to Arcee's research lab, which supported them through each stage of the model development cycle.

How Arcee AI Trained Madeline & Co.’s Reasoning Model

Data Curation, Continuous Pre-Training, and Model Merging

The initial phase involved building a 60-million-token dataset, curated from Madeline & Co.'s deep expertise in their field.

With this dataset, we applied Continuous Pre-Training (CPT), targeting specific domain knowledge gaps and behavioral patterns identified in the base model.

After CPT, we used Arcee's MergeKit library to combine our newly trained model with complementary models in the ecosystem. We determined the merging ratios through systematic experimentation, testing a range of interpolation weights and keeping the combination that performed best across our evaluation benchmarks.
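To make the idea of interpolation weights concrete, here is a minimal sketch of a linear merge followed by a sweep over candidate weights. It is an illustration only: it does not use the MergeKit API, and it is not the merge method or configuration used for Madeline-s1. The function names, the toy two-parameter "models", and the scoring function are all hypothetical stand-ins for real checkpoints and a real benchmark suite.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def linear_merge(model_a: Weights, model_b: Weights, alpha: float) -> Weights:
    """Return alpha * model_a + (1 - alpha) * model_b, parameter by parameter."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in model_a:
        a, b = model_a[name], model_b[name]
        if len(a) != len(b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return merged


def sweep_alphas(
    model_a: Weights,
    model_b: Weights,
    alphas: List[float],
    score: Callable[[Weights], float],
) -> Tuple[float, float]:
    """Score every candidate merge and return (best_alpha, best_score)."""
    if not alphas:
        raise ValueError("need at least one candidate weight")
    best_alpha, best_score = alphas[0], float("-inf")
    for alpha in alphas:
        candidate_score = score(linear_merge(model_a, model_b, alpha))
        if candidate_score > best_score:
            best_alpha, best_score = alpha, candidate_score
    return best_alpha, best_score


# Hypothetical toy checkpoints (real models have billions of parameters).
cpt_model: Weights = {"layer.weight": [1.0, 0.0]}
partner_model: Weights = {"layer.weight": [0.0, 1.0]}


def toy_score(weights: Weights) -> float:
    """Hypothetical benchmark: higher is better, peaks at an arbitrary target."""
    target = [0.75, 0.25]
    return -sum((w - t) ** 2 for w, t in zip(weights["layer.weight"], target))


best_alpha, best_score = sweep_alphas(
    cpt_model, partner_model, [0.0, 0.25, 0.5, 0.75, 1.0], toy_score
)
print(f"best interpolation weight: {best_alpha}")
```

In practice the scoring function would run the merged checkpoint through the evaluation benchmarks, which is far more expensive than the merge itself, so the set of candidate weights is usually kept small.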

Behavioral Analysis, Synthetic Data Generation, and Supervised Fine-Tuning

Following the merge, we conducted a comprehensive behavioral analysis of the resulting model. We deployed the model in controlled environments and collected extensive feedback on its responses across diverse query types. We analyzed patterns in reasoning quality, factual accuracy, instruction adherence, and handling of edge cases.

Based on the behavioral analysis, we curated a high-quality dataset of question-answer pairs designed to address the weaknesses we had identified. Starting from 100 golden question-answer pairs provided by Madeline & Co., we used a proprietary synthetic data generation technique to create 350k pairs for the Supervised Fine-Tuning (SFT) run. We built this SFT dataset to include examples that demonstrated the desired reasoning patterns, correct factual information, and appropriate response styles.
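For a sense of scale, the short sketch below works out the expansion this implies. The generation technique itself is proprietary and not described here; the even split across golden pairs is purely an assumption for illustration, and `allocate` is a hypothetical helper.

```python
from typing import List

GOLDEN_PAIRS = 100         # seed pairs provided by Madeline & Co.
SYNTHETIC_PAIRS = 350_000  # pairs generated for the SFT run

# Average number of synthetic pairs derived from each golden pair.
expansion_factor = SYNTHETIC_PAIRS // GOLDEN_PAIRS


def allocate(total: int, seeds: int) -> List[int]:
    """Split a generation budget across seeds as evenly as possible.

    Assumption: an even split. The source does not say how the 350k pairs
    were actually distributed across the golden examples.
    """
    if seeds <= 0:
        raise ValueError("seeds must be positive")
    base, extra = divmod(total, seeds)
    return [base + 1 if i < extra else base for i in range(seeds)]


per_seed_budget = allocate(SYNTHETIC_PAIRS, GOLDEN_PAIRS)
print(f"about {expansion_factor} synthetic pairs per golden pair")
```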

This iterative approach combines four stages: CPT for knowledge acquisition, merging for capability integration, analysis for weakness identification, and targeted SFT for behavioral refinement. The pipeline uses each training paradigm for what it does best while compensating for the limitations of the others.

Results and Evaluation

After two months of training, we presented Madeline-s1, a 32-billion-parameter reasoning model explicitly trained to reflect how Madeline & Co.'s expert strategists and designers think: mapping tradeoffs, surfacing multiple valid options, and grounding decisions in strategic insights and customer research. Rather than producing templated or generic answers, we engineered Madeline-s1 to generate meaningful interpretations and actionable insights across disciplines.

To evaluate Madeline-s1, we conducted a blind human preference test and also evaluated the model on industry-standard benchmarks. The results are as follows.

Human Preference Evaluation (Blind Test)

Matchup                                                                Win Rate
Madeline vs Claude Sonnet 4 (Business, Strategy, Design, Product)      91.5%
Madeline vs OpenAI o3 (Business, Strategy, Design, Product)            94.2%
Madeline vs Claude Sonnet 4 (Film & Storytelling)                      58.5%
Madeline vs OpenAI o3 (Film & Storytelling)                            80.0%
Madeline vs Gemini 2.5 Pro (Film & Storytelling)                       88.9%
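As a rough illustration of how a win rate in a blind preference test can be computed, the sketch below shuffles the two responses so the rater cannot tell which model wrote which, then counts how often the rater picked our model's response. This is an assumed protocol, not a description of the actual evaluation: the function names are hypothetical, the toy judgments are made up for the example, and ties are not modeled.

```python
import random
from typing import List, Tuple


def blind_pair(
    response_ours: str, response_other: str, rng: random.Random
) -> Tuple[Tuple[str, str], int]:
    """Randomize display order; return (responses as shown, slot holding ours)."""
    if rng.random() < 0.5:
        return (response_ours, response_other), 0
    return (response_other, response_ours), 1


def win_rate(picks: List[int], ours_slots: List[int]) -> float:
    """Percentage of comparisons where the rater picked the slot holding ours."""
    if not picks or len(picks) != len(ours_slots):
        raise ValueError("need one pick per comparison, and at least one")
    wins = sum(1 for pick, ours in zip(picks, ours_slots) if pick == ours)
    return 100.0 * wins / len(picks)


# Toy example with four comparisons (illustrative values only).
ours_slots = [0, 1, 1, 0]   # where our response was shown in each comparison
rater_picks = [0, 1, 0, 0]  # which slot the rater preferred
rate = win_rate(rater_picks, ours_slots)
print(f"win rate: {rate:.1f}%")
```

Keeping the slot assignment separate from the rater's pick means the rater-facing data never carries a model label, which is what makes the comparison blind.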

Standardized Evaluations

Benchmark        Madeline
Math500          90.8
AIME24           66.7
LCB Easy         89
LCB Medium       56.3
LCB Hard         17.9
MMLU             82.1
GPQA Diamond     62.6

The model's performance exceeded expectations in internal evaluations, particularly in domains where general models struggled. Madeline & Co. deployed Madeline-s1 into production and integrated the model across their core product offerings.

Conclusion

This collaboration demonstrates what's possible when AI is purpose-built from a company's proprietary data and insights, combined with Arcee AI's post-training techniques.

To get in touch with Arcee AI to discuss potential collaboration, please reach out here.

To get early access to Madeline-s1 and the Madeline & Co. platform, please sign up here. Listen to Madeline introduce herself here!
