

AI vs. Attorneys
Insights from the Vals’ Legal AI Report
BY PAMELA LANGHAM

LAW FIRMS ARE INCREASINGLY ADOPTING generative artificial intelligence (AI) for administrative tasks and law practice. But how effective are these tools when applied to real-world legal tasks? The Vals Legal AI Report set out to answer this pressing question by rigorously evaluating four distinct legal AI platforms across seven critical tasks. By benchmarking their results against those generated by a lawyer control group, the report offers an in-depth analysis of how these tools compare to human expertise, providing valuable insights for legal professionals seeking to harness the potential of AI in their work.
Vals AI Platform
The Vals AI Platform is a tool designed to evaluate the performance of large language models (LLMs) on industry-specific tasks. It creates custom benchmarks that mimic real-world use cases, allowing for unbiased and accurate assessments of how these models perform in practical applications. The platform collects review criteria from subject-matter experts and runs evaluations at scale, making it a valuable resource for understanding the capabilities of LLMs in various domains.
The four AI tools evaluated were Harvey’s AI Assistant, Thomson Reuters’ CoCounsel, vLex’s Vincent AI, and Vecflow’s Oliver. Lexis+AI was initially included but withdrew from the study. The performance of the human lawyer control group serves as the Lawyer Baseline.
Vals’ study assigned accuracy or performance scores to each AI tool based on specific criteria set for each legal task. The scores are expressed as percentages, with a higher percentage indicating that the AI tool performed better on that task “relative to other AI tools and the Lawyer Baseline.”
Vals’ Legal AI Report
On February 27, 2025, Vals released the Legal AI Report (VLAIR), available at https://www.vals.ai/vlair. Vals collected a human legal baseline from lawyers at several large law firms to measure the performance of the AI legal tools against that of human lawyers.
Key Takeaways from VLAIR
Vals tested seven tasks traditionally performed by lawyers: Data Extraction, Document Q&A, Document Summarization, Redlining, Transcript Analysis, Chronology Generation, and EDGAR Research. EDGAR stands for the Electronic Data Gathering, Analysis, and Retrieval system, the U.S. Securities and Exchange Commission’s platform through which public companies file and retrieve documents.
In four tasks, at least one AI tool outperformed the human lawyers (Lawyer Baseline): Data Extraction, Document Q&A, Document Summarization, and Transcript Analysis. The Lawyer Baseline outperformed all of the AI tools on Redlining and EDGAR Research. The Lawyer Baseline and Harvey Assistant tied on Chronology Generation.
Harvey AI outperformed the Lawyer Baseline and all the other AI tools on Data Extraction, Document Q&A, and Transcript Analysis. It tied with the Lawyer Baseline on Chronology Generation. Harvey AI was not tested on the EDGAR Research task.
CoCounsel outperformed the Lawyer Baseline on Document Summarization and scored the highest in that category. CoCounsel was not tested on Redlining, Transcript Analysis, or EDGAR Research.
According to the VLAIR, the best scores for each category are as follows:
Data Extraction: Harvey Assistant, 75.1%
Document Q&A: Harvey Assistant, 94.8%
Document Summarization: CoCounsel, 77.2%
Redlining: Lawyer Baseline, 79.7%
Transcript Analysis: Harvey Assistant, 77.8%
Chronology Generation: Harvey Assistant, 80.2%
EDGAR Research: Lawyer Baseline, 70.1%
The table below, adapted from VLAIR, shows the test and performance results for the Lawyer Baseline, each AI tool, and the task average.

Conclusion
Evaluations like the VLAIR are important for the legal community to determine whether it is ready to adopt AI tools. The study highlights the capabilities of legal AI tools in performing various legal tasks, demonstrating their potential to enhance efficiency and accuracy in law firms. But it also highlights weaknesses in the tools’ accuracy and performance. Harvey Assistant emerged as a strong performer, excelling in multiple tasks and surpassing the Lawyer Baseline in several of them.
CoCounsel also showed strong results, particularly in Document Summarization and Document Q&A. While AI tools collectively outperformed human lawyers in several areas, they still lagged in complex tasks like EDGAR Research and Redlining. These findings underscore the value of AI tools while recognizing the need for continued technological advancements to fully meet the demands of legal practice.