Health Law Advisor

Posts tagged HealthBench.

Blogs

June 17, 2025

6 minute read

HealthBench: Exploring Its Implications and Future in Health Care

Categories: Artificial Intelligence, Health Care

Hemant Gupta, Christine Burke Worthen, Alaap B. Shah

As we noted in our previous blog post, HealthBench, an open-source benchmark developed by OpenAI, measures model performance across realistic health care conversations, providing a comprehensive assessment of both capabilities and safety guardrails that better align with the way physicians actually practice medicine. In this post, we discuss the legal and regulatory questions HealthBench addresses, the tool’s practical applications within the health care industry, and its significance in shaping the future of artificial intelligence (AI) in medicine.

Blogs

June 11, 2025

3 minute read

HealthBench: Advancing the Standard for Evaluating AI in Health Care

Categories: Artificial Intelligence, Health Care

Hemant Gupta, Christine Burke Worthen, Alaap B. Shah

The Evolution of Health Care AI Benchmarking

Artificial Intelligence (AI) foundation models have demonstrated impressive performance on medical knowledge tests in recent years, with developers proudly announcing their systems had “passed” or even “outperformed” physicians on standardized medical licensing exams. Headlines touted AI systems achieving scores of 90% or higher on the United States Medical Licensing Examination (USMLE) and similar assessments. However, these multiple-choice evaluations presented a fundamentally misleading picture of AI readiness for health care applications. As we previously noted in our analysis of AI/ML growth in medicine, a significant gap remains between theoretical capabilities demonstrated in controlled environments and practical deployment in clinical settings.

These early benchmarks, predominantly structured as multiple-choice exams or narrow clinical questions, failed to capture how physicians actually practice medicine. Real-world medical practice involves nuanced conversations, contextual decision-making, appropriate hedging in the face of uncertainty, and patient-specific considerations that extend far beyond selecting the correct answer from a predefined list. The gap between benchmark performance and clinical reality remained largely unexamined.
