CASE STUDY
GenAI for Law: Domain-Adaptation of a Language Model Specialized in Patents


Customer: David Buniatyan
Company: Activeloop
Address: Mountain View
Department: Data Infrastructure
Employees: 20
Arcee Small Language Models
Arcee Enterprise

With our customer Activeloop, we co-created PatentPT, the most advanced LLM and retrieval system for patent search and generation, trained on U.S. patent data.

50%

Fewer Hallucinations

2.5x

Faster Response Times

< 1 month

Model Delivered in 3 Weeks

The Problem

Activeloop helps enterprises organize complex unstructured data and retrieve knowledge with AI, and many of its customers work in heavily regulated industries.

For a subset of its customers, Activeloop needed to provide highly accurate AI search across all U.S. patents and to build a patent generation engine powered by a custom language model.

The U.S. Patent and Trademark Office (USPTO) website is a portal to an incredible amount of knowledge: the USPTO dataset consists of over 8 million patents, and its corpus of text contains some 40 billion words.

But, as anyone who has visited the USPTO website can attest, it's notoriously difficult to navigate, with a slow and rigidly structured search engine (we suspect it's running on Cobalt servers, without any neural network execution).

When Activeloop approached us to co-develop a GenAI approach to U.S. patent data, we were thrilled with the opportunity. Together, we saw it as a challenge to make the incredibly rich dataset of U.S. patents more accessible to a broader audience.

The goal was to build a retrieval engine with powerful search and generation capabilities – including:

  • Autocomplete
  • Patent search on Abstracts
  • Patent search on Claims
  • Ability to generate Abstracts
  • Ability to generate Claims
  • General chat
David Buniatyan
Founder, Activeloop

Arcee AI leads the way in domain-specific SLM development. We've collaborated on 'PatentPT', a patent search and generation engine comprising a memory agent powered by Activeloop Deep Lake, for accurate search and retrieval across millions of patents, and a combination of bespoke and fine-tuned SLMs by Arcee AI. If you're looking for a great partner with the best expertise in unlocking the value of language models for your private data at a reasonable cost, Arcee is the perfect choice.

The Results

  • FAST TIMELINE FROM DATA TO DEPLOYMENT
    We built, trained, and deployed PatentPT in less than three weeks. We used Arcee Enterprise to train a custom language model, Activeloop Deep Lake to structure and accurately retrieve unstructured text data, and the Deep Lake dataloader for model training.
  • PERFORMANCE THAT BEATS OPENAI
    Using the Deep Lake query engine and Arcee's suite of optimization tools, we achieved 50% fewer hallucinations and 2.5x faster response times than an OpenAI Ada + Pinecone setup.

Product Used

Activeloop deployed Arcee Enterprise, built on AWS, to ensure it had the most secure, resilient, and cost-effective environment for its domain-specific Generative AI models.

With Arcee Enterprise built on AWS, Activeloop's data never leaves its own VPC.

PatentPT was also powered by Activeloop Deep Lake for data storage, retrieval, and model training.

