
Sunday, July 6, 2025

AI: augmenting the intelligence of family physicians

In a recent editorial, Dr. Joel Selanikio discussed how 24/7 access to generative artificial intelligence (AI) tools such as ChatGPT empowers patients to retrieve health information and self-manage low-acuity conditions that would previously have prompted a visit to a clinician. Dr. Selanikio argued that by embracing the capabilities of AI to reduce administrative burdens and improve clinical outcomes, practices can demonstrate “the unique and irreplaceable value doctors bring to health care.” Another opinion envisioned the rise of “AI-augmented generalists” who integrate the knowledge base of subspecialists and use large language models (LLMs) as “active cognitive collaborators.” New competencies required for the AI era include “AI system proficiency,” “collaborative problem-solving,” and “contextual adaptation.” Recently published and ongoing research provides several real-world examples.

A 2025 Graham Center Policy One-Pager synthesized information from online peer forums and vendor websites to compare the costs, benefits, and drawbacks of commercially available AI scribes. A study funded by the Agency for Healthcare Research and Quality is interviewing primary care clinicians and patients to identify barriers and facilitators to successful adoption of ambient digital scribe technology and to develop a prototype implementation guide for diverse primary care settings.

In addition to office notes, LLMs can be used to generate hospital discharge summaries. A study from the University of California, San Francisco, evaluated the accuracy and quality of LLM-generated discharge summaries for 100 randomly selected inpatient stays of 3 to 6 days’ duration. A team of blinded reviewers that included hospitalists, primary care physicians, and skilled nursing facility (SNF) physicians rated LLM-generated and physician-authored summaries on comprehensiveness, concision, coherence, and errors (inaccuracies, omissions, and hallucinations). Overall, LLM narratives contained more errors but were rated as more concise and coherent than physician-generated narratives. Of note, primary care and SNF physicians—the end-users of discharge summaries—had more favorable views of LLM narratives than did hospitalists.

AI is also being evaluated for its potential to assist clinical decision-making. In a single-center study of virtual urgent care visits for respiratory, urinary, vaginal, eye, or dental symptoms, AI-generated recommendations agreed with physician recommendations in 57% of cases and were more often rated as optimal:

Our observations suggest that AI showed particular strength in adhering to clinical guidelines, recommending appropriate laboratory and imaging tests, and recommending necessary in-person referrals. It outperformed physicians in avoiding unjustified empirical treatments. … Conversely, physicians excelled in adapting to evolving or inconsistent patient narratives, … [and] also seemed to demonstrate better judgment in avoiding unnecessary ED referrals.

However, the AI in this study reported that it had insufficient confidence to provide a recommendation in 21% of cases.

Finally, a randomized trial examined the diagnostic accuracy of 50 US-licensed physicians who responded to clinical questions about a standardized chest pain video vignette featuring either a White male or a Black female patient before and after receiving input from ChatGPT-4. This study showed that physicians were willing to modify their initial decisions based on suggestions from ChatGPT and that these changes improved accuracy without introducing or exacerbating demographic biases (eg, being less likely to diagnose acute coronary syndrome in the Black female patient).


This post first appeared on the AFP Community Blog.