For case law research, the team at Harvey envisioned an experience where you could copy/paste a client question into a case law model, and it would answer that question thoroughly and cite all its sources. They tried the obvious techniques first: fine-tuning foundation models via public APIs and building retrieval-augmented generation (RAG) systems. But they ran into limitations with such a uniquely complex, open-ended use case.

“If you just do retrieval, you can answer very simple questions about areas of law that you aren’t really an expert in, but that’s actually not that useful for most attorneys,” Weinberg explained. “With case law research, you’re finding ammo for your argument, and that’s much more difficult to do.”

Foundation models were strong at reasoning, but lacked the knowledge required for legal work. So, Harvey decided to partner with OpenAI to build a custom-trained model that would allow them to inject new knowledge, and ways of reasoning about that knowledge, into base models.

“None of these problems have a clear-cut solution,” Pereyra said. “A lot of it was sitting down together, having our lawyers explain how case law research works, having our researchers show what we’ve done, and learning from OpenAI about the levers we had to approach the problem.”

Harvey and OpenAI worked together to add the depth of context needed, starting with case law from Delaware and then expanding to include all of U.S. case law. They added the equivalent of 10 billion tokens' worth of data to power the custom-trained case law model.

