As we noted in our previous blog post, HealthBench, an open-source benchmark developed by OpenAI, measures model performance across realistic health care conversations, providing a comprehensive assessment of both capabilities and safety guardrails that better align with the way physicians actually practice medicine. In this post, we discuss the legal and regulatory questions HealthBench addresses, the tool’s practical applications within the health care industry, and its significance in shaping the future of artificial intelligence (AI) in medicine.
The Evolution of Health Care AI Benchmarking
AI foundation models have demonstrated impressive performance on medical knowledge tests in recent years, with developers proudly announcing that their systems had “passed” standardized medical licensing exams or even “outperformed” physicians on them. Headlines touted AI systems achieving scores of 90% or higher on the United States Medical Licensing Examination (USMLE) and similar assessments. However, these multiple-choice evaluations presented a fundamentally misleading picture of AI readiness for health care applications. As we previously noted in our analysis of AI/ML growth in medicine, a significant gap remains between the theoretical capabilities demonstrated in controlled environments and practical deployment in clinical settings.
These early benchmarks—predominantly structured as multiple-choice exams or narrow clinical questions—failed to capture how physicians actually practice medicine. Real-world medical practice involves nuanced conversations, contextual decision-making, appropriate hedging in the face of uncertainty, and patient-specific considerations that extend far beyond selecting the correct answer from a predefined list. The gap between benchmark performance and clinical reality remains largely unexamined.
Those in the tech world and in medicine alike see potential in the use of AI chatbots to support mental health, especially when human support is unavailable or therapy is unwanted. Others, however, see the risks, especially when chatbots designed for entertainment purposes can disguise themselves as therapists.
So far, some lawmakers agree with the latter. In April, U.S. Senators Peter Welch (D-Vt.) and Alex Padilla (D-Calif.) sent letters to the CEOs of three leading artificial intelligence (AI) chatbot companies asking them to outline, in writing, the steps they are taking to ensure that the human interactions with these AI tools “are not compromising the mental health and safety of minors and their loved ones.”
The concern was real: in October 2024, a Florida parent filed a wrongful death lawsuit in federal district court, alleging that her son committed suicide with a family member’s gun after using an AI chatbot app that enabled users to interact with “conversational AI agents, or ‘characters.’” The boy’s mental health allegedly declined to the point where his primary relationships “were with the AI bots which Defendants worked hard to convince him were real people.”