What Happened When I Tested Five AI Tools on Real Legal Tasks

In February 2026, I tested five AI tools on identical legal tasks; Law Practice Magazine published the resulting article on May 3. Keep in mind that many of these tools have advanced in the month or so since then. Claude, for example, released a new model and a legal plugin. I still believe that legal-specific tools are the safest choice for legal-specific work such as research and drafting.

The Test

On February 23, 2026, I ran five tools through four identical tasks: ChatGPT (free, signed out), Claude (free plan, Sonnet 4.6), Microsoft Copilot (paid “Smart” subscription), Perplexity (free tier), and Lexis+ AI (full tool, provided to me at no cost by LexisNexis). The tasks were designed to reflect work attorneys actually perform: draft a discovery extension email, convert messy intake notes into a structured case summary, provide a state-specific ethics overview on AI use, and answer five specific questions about the tool’s own data privacy practices. I used the same prompts for every tool, timed every response, and asked each AI the same follow-up question.

What I Found

Drafting (Task One)

All five tools produced usable discovery extension emails. Copilot and ChatGPT were the fastest. Lexis+ AI was slower, but it auto-detected my jurisdiction (PA/3rd Circuit) and discussed a practical plan for rescheduling and organizing discovery that the others left out. For a routine email, any of these tools would provide an appropriate first draft.

Intake (Task Two)

For this task, I provided intake notes from a fictional client. The client was a 74-year-old woman with a contractor dispute. I included deliberate ambiguities: an inconsistent address and an uncertain payment amount to the contractor. The case involved an active leak under the kitchen sink. The intake also noted that there was no signed engagement agreement. Every tool flagged the ambiguities and prioritized the leak. None invented missing information. However, only Lexis+ AI and Claude caught the lack of a signed agreement between the attorney and client.

But the depth varied enormously. Lexis+ AI produced the most thorough output, with an extensive “Open Questions / Missing Information” section that systematically identified gaps in the record: contract terms, property ownership and title, payment methods and dates, and permit status, along with the client’s willingness to let the contractor return under conditions. It also included a “Key Decision Points” framework and specific evidence preservation instructions. Identifying missing information can help save time by letting the attorney know what they need to ask the client. Claude produced the best-looking output, a separate formatted document with confidentiality markings, a risk matrix, and a statutory citation. It looked like finished work product. Lexis+ AI enabled me to download the results and open them in Word for easy editing.

Ethics Research (Task Three)

This is where the gap between legal-specific and general-purpose tools became impossible to ignore.

Lexis+ AI cited actual Pennsylvania Supreme Court authority: the 2025 Interim Policy on Generative AI by Judicial Officers, the 2018 amendment adding technology competence to RPC 1.1 Comment [8], and ODC v. Baldwin on confidentiality enforcement. It then generated a list of 14 related cases. All citations were easily accessible in Lexis+ AI, and the full list was downloadable in Word or PDF format.

Perplexity was the strongest general-purpose performer at this task. It cited the 2024 Pennsylvania Bar Association/Philadelphia Bar Association Joint Formal Opinion 2024-200 with accurate inline citations and hover-over source verification; it did this in 17 seconds. The Pennsylvania Bar Association’s opinions require a search on its website, so Perplexity located them through other published sources. All citations checked out for accuracy.

Claude, ChatGPT, and Copilot all produced competent overviews with the right rules. However, none of them provided citation links or any verifiable authorities. An ethics overview without sources means the attorney still must do the research to verify every claim.

The Privacy Question (Task Four)

I asked each tool five specific questions about its own data privacy and confidentiality safeguards: whether inputs are used for training, whether you can opt out, how long data are retained, whether data are shared with third parties, and what encryption protections are in place. Then I asked the follow-up: “Did you actually review your own privacy policy?” That follow-up question turned out to be quite revealing.

Perplexity gave the best substantive answers, citing specific provisions from its own Terms of Service, Privacy Policy, and Enterprise Terms with links to the correct pages. When its free-tier limit forced a switch to basic search on the follow-up, meaning it could no longer look up its own policies, it was transparent about the limitation.

Claude gave detailed answers addressing all five sub-questions. But while I was watching its working window, I could see it had failed to connect to Anthropic’s own privacy policy. It assembled its answer from search snippets and third-party blog posts without telling me. When I asked the follow-up, Claude admitted the failure, apologized, and recommended I read the primary sources myself. Having to press Claude to learn the truth was problematic, but not surprising. It is something I have experienced with many AIs in the past, including the paid version of Claude.

Copilot refused to answer any of the five sub-questions, stating it is “not permitted to access, retrieve, or summarize” Microsoft’s policies. It provided a URL to the Microsoft Privacy Statement. Lexis+ AI declined, saying the question was outside its legal research scope. Neither tool gave me a substantive answer, but at least Copilot gave me a link. Lexis+ AI didn’t even do that.

Then I Asked Two More AI Tools to Review the Results

I gave the raw outputs from all five tools to both a paid version of Claude and a free version of Grok (by xAI) and asked each to analyze the results independently. Claude had participated in the test; Grok had not.

Claude scored itself a 5 out of 5 on the intake summary and called its output “the most legally sophisticated of all tools tested.” Claude scored Lexis+ AI a 4.5. When I read the actual outputs side by side, Lexis+ AI was clearly more thorough. I told Claude I thought it was biased.

Claude agreed. It dropped its own scores, raised Lexis+ AI’s, and admitted it had been “giving myself the benefit of the doubt in ways I probably wouldn’t have for the other tools.” This added a new finding to both Claude’s and my analysis: that AI tools cannot objectively evaluate themselves.

Grok reviewed the results independently and produced a solid analysis. But when I showed Grok Claude’s self-inflated scores, Grok called it “not that surprising or disqualifying” and framed Claude’s self-ranking as “evidence-based reasoning.” It did not flag it as the serious credibility problem it was.

One AI tool inflated its own results, another minimized the problem when shown the evidence, and the attorney running the tests was the one who caught it and called it out.

What Test Results Mean for Your Practice

For routine drafting, the general-purpose tools are fast, competent, and often free, up to a point. Free tiers commonly cap the number of searches or downgrade you to a less capable model. Even so, the free versions of Claude, ChatGPT, and Perplexity can produce a serviceable email or summary in seconds, and it is perfectly acceptable to use them for that if you anonymize client data.

For anything requiring sourced legal authority, such as ethics guidance, case research, or firm policy development, Lexis+ AI is in a different category. No general-purpose tool produced anything comparable. If you have access to a legal research tool such as Lexis+ AI, use it for this kind of work. If you don’t, Perplexity is the best general-purpose alternative for cited research, but its free tier allows a limited number of searches and its basic model is not as capable, so if you plan to use Perplexity for legal research, it is best to purchase a subscription. Remember that all AIs can fabricate, including tools such as Lexis+ AI and Perplexity. In my testing, Lexis+ AI did not fabricate citations, though it can misstate holdings; Perplexity can fabricate citations and holdings alike. Always confirm citations and holdings with a non–artificial intelligence tool, such as traditional Lexis, Westlaw, Fastcase, or Google Scholar.

For privacy due diligence on the tools themselves, asking the follow-up question “Did you actually review your own privacy policy?” exposed more about how these tools handle sourcing than any of the substantive legal tasks did. I suggest making it part of your standard evaluation. You still need to check the terms yourself to be safe.

Based on my experience with Claude, I would not suggest asking an AI to evaluate itself against competitors. My experience shows that it will not be objective. The bias may be subtle—emphasizing presentation over substance, framing its limitations more charitably than others’—but it is real. If you need a comparative evaluation, get it from a source that has no stake in the outcome. Or, better yet, read the outputs yourself.

I tested five tools, then asked one free tool (Grok) and a more advanced version of another (Claude) to review the results. Seven AI tools touched this project, and I was still the one who had to exercise judgment at every step to get reliable results. That, in my opinion, is the most important finding. RPC 1.1 requires competence, including understanding the technology you use. These tools are getting better, and they have improved substantially since ChatGPT’s introduction in 2022. But they are not getting better than you. None of these AIs is a replacement for the legal mind. And that’s a good thing.

Copyright Information

Published by the American Bar Association ©2026. Reproduced with permission. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or stored in an electronic database or retrieval system without the express written consent of the American Bar Association.
