Deploying Safe & Scalable AI in Healthcare: A Pear VC Event
The panelists brought incredible insights to an increasingly hot topic
The collision of policy, practice, and AI: healthcare’s future sits at the crossroads of clinical judgment, incentives, and digital infrastructure.
Earlier this week, I made the trip from Tobacco Road in Durham, North Carolina all the way to Techy San Francisco (a city saved by yet another gold rush!). I had the chance to attend a Pear VC panel on “Deploying Safe & Scalable AI in Healthcare.” The conversation was refreshingly candid about what’s working and what isn’t as we move from flashy demos to durable impact. As a surgeon and technologist, I found the insights aligned with both my clinical intuition and my tech optimism. Today, I break down my key takeaways, ranging from the breakneck adoption of AI outpacing our literacy to the hard reality of incentives and policy shaping what’s possible. The night underscored that truly transforming care with AI will require just as much innovation in training and workflows as in algorithm development itself.
Adoption Is Outpacing Literacy
AI is already inside our health systems, whether we’re ready or not. Physician uptake of AI has nearly doubled in a year: an AMA survey found 66% of doctors used some form of health AI in 2024 (up from 38% in 2023)[1]. Major centers like Stanford, UW, and UC San Diego have even piloted generative AI inside the EHR to help draft patient messages and documentation[2]. Yet most frontline clinicians haven’t been trained on when (and when not) to trust these tools. One panelist quipped that they had “never seen tech integrate into medicine so quickly”; even unregulated large language models (LLMs) were rolled out via hospital pilots before many physicians knew about basic pitfalls like hallucinated citations or fabricated answers. This gap between adoption and understanding is concerning. Dr. Daneshjou noted that a colleague of hers was shocked to learn that ChatGPT will confidently invent medical references out of thin air. And an even scarier anecdote: one patient, following advice from an AI assistant, ingested a bromide salt as a “low-sodium” alternative and ended up with bromide toxicity (a reminder that these models sound authoritative even when they’re dangerously wrong).
AI fluency needs to become a core clinical competency—no different from reading an EKG or interpreting an MRI. Clinicians should understand how these models work, their failure modes, and how to double-check critical outputs. There are encouraging signs that the medical community recognizes this. Surveys show physicians’ enthusiasm for AI rising, but also persistent concerns about accuracy, privacy and liability[3]. Medical organizations are calling for more oversight and training so that doctors can use AI safely. In the near future, being a great doctor will likely require being AI-literate: able to harness tools for translation, drafting, or data lookup, but also knowing their limits. Just as we wouldn’t hand out scalpels without surgical training, we shouldn’t deploy AI without preparing clinicians. Otherwise, we risk the worst of both worlds: rapid adoption without understanding, and mistakes that erode trust. The message was clear: The tech is here; the training must catch up.
Benchmarks Are Not the Bedside
Another theme echoed by Stanford’s Dr. Roxana Daneshjou was that benchmarks aren’t bedside. An AI passing a medical exam or dataset benchmark tells us very little about real-world performance – safety, workflow impact, or downstream utilization. Medicine is rife with examples of treatments that looked great on paper but flopped in practice. AI will be no different. Dr. Daneshjou pointed out that many AI models touted as “expert-level” are evaluated on static test sets or synthetic tasks. A model acing multiple-choice questions or generating neat progress notes might still falter in the messy context of clinical care, even increasing appointment lengths, generating extraneous follow-up visits, or overwhelming providers with alerts and messages. The true standard should be trials in live clinical workflows, with humans in the loop and measurement of unintended effects.
Panelists touched on one pioneering example of such a real-world evaluation: a collaboration between OpenAI and Penda Health in Kenya. In that study – which Dr. Graham Walker broke down in granular detail – an LLM-based “clinical co-pilot” was tested in nearly 40,000 live patient visits[4]. Half the clinicians had access to an AI assistant that reviewed their notes and flagged potential issues (using a traffic-light system for level of concern), while half practiced as usual. The results were intriguing, and I wrote about them in another article. The AI significantly reduced medical errors: a 32% drop in history-taking mistakes, 16% drop in diagnostic errors, and 14% drop in treatment errors[5]. Importantly, no safety harms were identified, and 100% of surveyed clinicians who used it found it helpful[6]. This suggests a meaningful improvement in care quality in a real clinic.
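To make the traffic-light idea concrete, here is a minimal sketch of how a visit-review copilot might roll individual flags up into a single level of concern. This is not the study’s actual implementation; the issue categories, severity assignments, and roll-up rule are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum

class Light(IntEnum):
    GREEN = 0   # no concerns; nothing for the clinician to review
    YELLOW = 1  # advisory: worth a look before closing the visit
    RED = 2     # potential safety issue: address before the patient leaves

@dataclass
class Flag:
    category: str   # e.g. "history", "diagnosis", "treatment"
    message: str
    level: Light

def overall_concern(flags: list[Flag]) -> Light:
    """The visit inherits the most severe flag raised by the copilot."""
    return max((f.level for f in flags), default=Light.GREEN)

# Hypothetical output from one note-review pass
flags = [
    Flag("history", "Allergy status not documented", Light.YELLOW),
    Flag("treatment", "Prescribed dose exceeds weight-based maximum", Light.RED),
]

print(overall_concern(flags).name)  # -> RED: surface to the clinician now
```

The point is the shape of the interaction: the copilot surfaces its most severe concern, and the clinician decides what to do with it.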
However, the study also looked at operational impacts. One finding was that using the AI assistant slightly lengthened visits (perhaps as doctors took time to read AI suggestions) and in some cases led to additional visits. This underscores Dr. Daneshjou’s point: we must evaluate AI in context, measuring not just immediate accuracy but knock-on effects like visit duration, follow-up rates, or message volume. Panelists also touched on benchmarking hype – OpenAI famously boasted GPT-4 could pass the USMLE medical licensing exam, but an exam score doesn’t guarantee sound clinical reasoning or patient trust[7].
Going forward, I believe any AI tool touching patient care should voluntarily undergo trials (observational or otherwise) that look a lot more like a new drug or device study: test it with real patients and clinicians, compare outcomes (including unintended ones), iterate with human feedback, and publish the findings. Benchmarks will remain useful for model development, but the bedside is the only place that counts. It was encouraging to hear leading researchers call for this rigor. As a clinician-founder, I’m taking that to heart: the only AI that matters is AI that actually improves care in practice.
Augmented intelligence, or just augmented headaches?
Incentives > Algorithms: The Payment Paradox
One of the most provocative insights was that in healthcare, incentives matter more than algorithms. We often assume a better tool will naturally win, but if the economics are misaligned, even the best AI can wither on the vine. Nowhere is this clearer than with AI that improves efficiency. In today’s predominant fee-for-service (FFS) payment model, a tool that makes care more efficient – say an AI that handles routine patient check-ins or titrates a medication without a visit – can paradoxically reduce revenue for providers. As Dr. Sharif Vakili (UpDoc’s CEO) noted, many billing codes for outpatient care are time-based or visit-based. If an AI cuts a 30-minute task to 5 minutes, a clinic might actually earn less under current billing. Similarly, asynchronous “agentic” tools (ones that operate between visits, like automated follow-ups or monitoring) often aren’t reimbursed at all, or only reimbursed for the time a physician spends. In FFS, time is money – literally. So a technology that saves time might save costs for the system or improve patient throughput, but from the provider’s perspective it can shrink the bottom line. No surprise then that hospitals and practices can be lukewarm on tools that, in another industry, would be no-brainers for efficiency.
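A toy calculation makes the paradox concrete. All the numbers below are invented for illustration (they are not real billing rates or codes); the point is simply that when reimbursement is tied to clinician time, a tool that removes that time also removes the revenue, even though the system-level efficiency gain is real.

```python
# Hypothetical numbers only: a 30-minute, visit-based task reimbursed under a
# time-based code, versus the same task handled in 5 minutes by an AI check-in
# that is not separately billable.
RATE_PER_30_MIN_VISIT = 120.0   # made-up fee-for-service payment per visit
RATE_PER_5_MIN_AI_CHECKIN = 0.0  # asynchronous AI work: often no billing code

visits_per_day = 16

ffs_revenue_without_ai = visits_per_day * RATE_PER_30_MIN_VISIT
ffs_revenue_with_ai = visits_per_day * RATE_PER_5_MIN_AI_CHECKIN  # revenue disappears

# Under capitation the clinic keeps a fixed payment either way, so the freed
# clinician time (and any avoided downstream utilization) becomes the win.
clinician_hours_freed = visits_per_day * (30 - 5) / 60

print(f"FFS revenue per day: ${ffs_revenue_without_ai:.0f} -> ${ffs_revenue_with_ai:.0f}")
print(f"Clinician hours freed per day: {clinician_hours_freed:.1f}")
```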
How do we resolve this paradox? Change the incentives. Several panelists argued that if we want autonomous titration, outreach, and care navigation at scale, we must pay for outcomes or outputs, not hours. In value-based care models (like capitation or bundled payments), AI that prevents a hospitalization or streamlines a chronic disease check-in is a financial win, not a loss. It was no coincidence that one panelist, ER physician Dr. Graham Walker from Kaiser Permanente, noted: “This is why I work at Kaiser – in our system, if AI makes patients healthier and keeps them out of the ER, we actually win too.” In a value-based context, productivity gains and avoidance of unnecessary visits align with revenue and quality goals. The good news is that policy is trending in this direction. Medicare and other payers are doubling down on value-based models, pulling specialists and hospitals into the fold. For example, CMS is launching the mandatory Transforming Episode Accountability Model (TEAM) in 2026, which will make certain hospitals responsible for the total cost and quality of surgical episodes (from the operation through 30 days post-discharge)[8][9]. If an AI can reduce complications or readmissions in that 30-day window, the hospital keeps the savings – a direct incentive for tech that improves outcomes. Likewise, CMS just proposed an Ambulatory Specialty Model (ASM) to bring specialists into value-based care for chronic conditions like heart failure and back pain[10][11]. ASM will be mandatory in selected regions and CMS continues to reiterate that they aim to have 100% of traditional Medicare in some accountable care model by 2030[12]. This is a clear signal: the future is paying for results, not volume.
For startups and innovators, this means our AI solutions should target the coming world of value-based care. Design your ROI case around outcomes (reduced ER visits, better blood pressure control, fewer no-shows) rather than just time saved. Already, we see more interest from payers and risk-bearing providers in AI that drives measurable clinical improvements. It’s telling that UpDoc – which built an AI agent to manage medications remotely – chose to pursue an FDA clearance and partner with value-based systems, essentially creating a “revenue model” for their AI via shared savings or quality bonuses. Smarter reimbursement (from Medicare, insurers, and employers) is needed to unlock the full potential of AI in reducing inefficiencies. The panelists’ consensus was that we have plenty of clever tech; what we need now is equally clever payment reform to let it shine.
Physicians are already using AI, but outcomes depend on training, real-world testing, and payment models like TEAM and ASM that reward efficiency and results.
EHR Gravity and Choosing Your Lane
Of course, no health tech discussion is complete without addressing the elephant in the room: the EHR monoliths, namely, Epic. The gravity of a dominant electronic health record can pull even the best-intentioned innovation into its orbit – or crush it outright. The panel was frank about this. Epic’s footprint is so massive (nearly 33% of U.S. hospitals and growing[13], and famously they’ve never lost a client in almost 50 years) that any startup operating in the same space must strategize carefully. We discussed how Epic has already bundled a basic ambient note scribe into its offerings, possibly leveraging learnings from early partnerships with vendors to develop its own version. This raises a tough question for startups: bundle with the empire, or rebel at the edges? Health tech companies essentially have to choose their lane. One lane is working with the big EHRs – integrating deeply, selling to enterprises as part of a larger package, and hoping to stay a step ahead of the platform’s native features. The other lane is going outside the EHR’s core, either by focusing on best-in-class workflow solutions that clinicians (not CIOs) directly adopt, or by targeting segments the big players underserve (for instance, small independent practices, specialty workflows, or patient-facing apps).
Erez Druk, CEO of the ambient scribe company Freed, made a compelling case for the second lane – a “best-of-breed” approach where you treat the clinician as the true customer. His philosophy is that since “the clinician is the heart of healthcare,” tools that give time back to physicians and improve their day will win, even if the enterprise backdrop is consolidation. Freed’s strategy has been to go bottom-up: individual doctors and clinics choose their product because it meaningfully reduces burnout (fewer late nights charting, more attention to patients). And indeed, the whole wave of AI scribes (including Freed) caught on first in smaller practices and departments before hospital CIOs took note. This bottom-up approach can still succeed – if the target users have decision-making power. Here’s the rub: trends in the industry do worry me on this front. Independent physician practices are dwindling, as more doctors become employees of large systems. Between 2012 and 2024, the share of U.S. physicians in private practice dropped from about 60% to 42%[14]. That means fewer doctors can independently purchase or adopt a new solution; instead, their large organization’s procurement (often influenced by the EHR vendor’s ecosystem) calls the shots. So while there’s “massive room” for excellent point solutions, as we all agreed, that room might be slowly shrinking in certain markets.
Where does this leave an innovator? I think it means you need a dual strategy. First, zero in on a specific workflow problem where you can be 10x better than the status quo – something even a skeptical clinician will pay for or beg their admin to buy. (For Freed it was dictating a note while maintaining eye contact with the patient; for others it might be AI-assisted coding, or automating prior auth, etc.) Second, design for interoperability and integration from day one, because if you do win, the users will eventually demand it play nice with the EHR. There was optimism on the panel that regulatory pressure (like US interoperability rules) might prevent total walled gardens. And we noted that even Epic can’t build everything at top quality – there will always be room for best-in-class add-ons, especially in niches. Finally, one astute comment was to avoid becoming just a “feature” that the platform can replicate. If your value is truly novel data or network effects or proprietary insights, you’re harder to copy. In summary, the EHR giants exert strong gravity, but agile startups can still slingshot around them by picking the right angle of attack. Enterprise bundling will continue, but I share the view that there’s room yet for independent solutions that deliver clear workflow wins – especially those that actually give clinicians time back rather than just adding another click.
Health tech startups must decide whether to orbit within Epic’s gravity or carve out their own path at the periphery.
The Clinician Remains Captain (Augmented > Artificial)
Amid all the AI exuberance, the panelists agreed on one reassuring point: the doctor (or nurse) remains the captain of the ship – and in the near term, “augmented” will trump “artificial.” The vision for the next few years is not robot doctors running around on their own; it’s “agentic auxiliaries” (as Dr. Vakili nicely phrased it) assisting clinicians under close oversight. Think AI copilots that draft notes, suggest orders, or double-check plans, all under a licensed provider’s supervision. This model acknowledges both the current limitations of AI and the ethical/legal framework of medicine. If an AI makes a mistake, the human professional is there as a safety net, with auditability and the ability to revert or correct. Several panelists emphasized that this setup isn’t just a concession to AI’s flaws – it’s actually the optimal way to improve AI. By pairing AI with human judgment, the AI can learn from real corrections, and clinicians can progressively trust the AI with more tasks once it has proven itself. It’s an incremental approach to autonomy, akin to how we train resident physicians: increase responsibility as competence is demonstrated, not all at once.
Concrete examples of this augmented approach are already emerging. Dr. Vakili’s startup UpDoc recently unveiled an AI assistant for managing chronic medications like insulin. It doesn’t replace the doctor – rather, it acts like a “physician extender” that can call patients, adjust doses according to protocols, order labs, and so on, but only within guardrails set by the physician[15][16]. Early results are striking. In a controlled trial at Stanford, UpDoc’s AI-managed diabetes care (with oversight) more than tripled the rate of patients achieving blood sugar control (81% vs 25% in usual care), while requiring far fewer clinic visits[17]. The AI agent made five times more medication adjustments than typically possible in standard care, yet because it was continuously monitored and safe limits were in place, outcomes improved without compromising safety. This kind of success hints at what responsibly designed autonomy can do: tackle the gaps traditional care leaves (like the titration of meds between appointments) and deliver better results, as an extension of the care team. Notably, every intervention by the AI in that trial was prescribed by a physician and could be overridden at any time[18]. The clinician remained firmly in charge, and arguably that was key to both the ethical acceptability and the positive outcomes.
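UpDoc has not published its internal logic, so the sketch below is purely illustrative of the guardrail pattern the panel described: the AI proposes adjustments only within physician-set limits and hands anything unusual back to a human. The titration rule, thresholds, and dose limits here are hypothetical, not a clinical protocol.

```python
from dataclasses import dataclass

@dataclass
class Guardrails:
    """Limits prescribed by the supervising physician, never chosen by the AI."""
    max_daily_dose: float             # units of basal insulin
    max_step: float                   # largest single adjustment allowed
    escalate_if_glucose_below: float  # mg/dL; possible hypoglycemia -> human review

def propose_adjustment(current_dose: float, fasting_glucose: float,
                       rails: Guardrails) -> tuple[float, bool]:
    """Return (new_dose, needs_physician_review) under a toy titration rule."""
    if fasting_glucose < rails.escalate_if_glucose_below:
        # Possible hypoglycemia: make no autonomous change, escalate instead.
        return current_dose, True
    if fasting_glucose > 130:  # hypothetical target ceiling
        step = min(2.0, rails.max_step)
        new_dose = min(current_dose + step, rails.max_daily_dose)
        return new_dose, new_dose == rails.max_daily_dose  # flag if at the ceiling
    return current_dose, False  # in range: no change, no escalation

rails = Guardrails(max_daily_dose=40, max_step=2, escalate_if_glucose_below=70)
print(propose_adjustment(current_dose=20, fasting_glucose=160, rails=rails))  # (22.0, False)
```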
Across the industry, we see this philosophy gaining ground. The American College of Physicians has outright stated that AI should augment, not replace, physician decision-making, and called for regulations to ensure clinicians are involved in AI-driven care. Even the big tech companies have tempered their messaging: Microsoft, Google, OpenAI – all are now talking about “copilots” rather than autonomous doctors. My view is that for the foreseeable future, “AI in healthcare” will mean human-centered AI. The best systems will feel less like a black box replacement and more like an efficient junior colleague: always available, extremely well-read, great at routine tasks, but checking in with the boss (the clinician) for tough calls. This augmented model not only keeps patients safer, it also aligns with clinicians’ desire to maintain the human touch. As one AMA leader recently put it, “There will continue to be a human component to medicine which cannot be replaced. AI is best optimized when it’s designed to leverage physicians, not sideline them”[19][20]. I left the panel feeling excited that we can thread this needle: we can have significant automation and keep clinicians as the ultimate authority. In doing so, we protect the patient-doctor relationship and build trust in these tools the right way. After all, “artificial” intelligence will come and go, but augmented intelligence – the pairing of human empathy and AI efficiency – is a formula as old as medicine itself (just ask any physician who relies on decision support or a trusty reference book).
The near-term future is one where physicians remain the captains, supported by agentic auxiliaries—AI tools with guardrails, auditability, and human-in-the-loop design.
Mapping AI to Population Health (My Playbook)
As a founder in the musculoskeletal (MSK) and population health space, I’m constantly thinking about how these high-level insights translate to day-to-day care for specific patient populations. Hearing the panel’s perspectives helped validate and refine my own approach at RevelAI Health. Our mission is to provide high-touch, AI-enabled asynchronous care coordination that closes care gaps without adding burden (no extra clicks!) for clinicians. Concretely, this means using AI agents to do things like follow up with patients after hospitalizations, monitor symptoms or therapy progress remotely, and nudge patients about preventive care – all while looping in human providers when needed. The aim is to extend care between visits in a scalable way, thereby improving outcomes in areas like post-surgery recovery and chronic disease management. Crucially, we design these AI-driven workflows to slot into existing team routines, not create new ones. For example, if an AI assistant contacts a patient and learns their knee pain is worsening, it can alert the care team through the existing EHR inbox with a concise summary, rather than making the clinician learn a new app.
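The knee-pain example maps to a simple escalation rule. Here is a minimal sketch under that framing; the thresholds, data model, and `send_to_ehr_inbox` hook are hypothetical stand-ins for an EHR integration, not our production system.

```python
from dataclasses import dataclass

@dataclass
class CheckIn:
    patient_id: str
    pain_score: int        # 0-10, patient-reported
    prior_pain_score: int
    free_text: str

def send_to_ehr_inbox(patient_id: str, summary: str) -> None:
    # Stand-in for an integration hook (e.g. an inbox message sent via the EHR).
    print(f"[EHR inbox] {patient_id}: {summary}")

def triage(check_in: CheckIn, worsening_threshold: int = 2) -> None:
    """Escalate only when the trend crosses a care-team-defined threshold."""
    delta = check_in.pain_score - check_in.prior_pain_score
    if delta >= worsening_threshold or check_in.pain_score >= 8:
        summary = (f"Pain {check_in.prior_pain_score} -> {check_in.pain_score} "
                   f"since last check-in. Patient reports: {check_in.free_text!r}")
        send_to_ehr_inbox(check_in.patient_id, summary)
    # Otherwise the agent simply logs the check-in and schedules the next one.

triage(CheckIn("pt-042", pain_score=7, prior_pain_score=4,
               free_text="knee swelling after stairs"))
```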
Our ROI is outcome-oriented. Rather than bragging about fewer minutes charting (which, as discussed, doesn’t excite a fee-for-service CFO), we focus on metrics that payers and value-based providers care about: reduced readmissions and ED revisits, faster primary care follow-ups after discharges, completion of key quality measures, and closed-loop referrals for social needs. These translate to better patient health and financial savings under the right payment models. For instance, it’s well documented that timely primary care follow-up can prevent readmissions[21]. So our AI might ensure every high-risk patient leaving the hospital has a follow-up booked within 7 days, and even call the patient with reminders or arrange transport if needed. If that reduces 30-day readmissions by even a fraction, the health system avoids huge penalties or costs – a clear win-win. Similarly, by automating the gathering of quality data (e.g. ensuring a diabetic patient gets their eye exam or a depression screening is done), we can help clinics hit their value-based care targets without drowning staff in paperwork. One of our pilots, for example, uses an AI outreach to screen patients for social determinants of health (like food or housing insecurity) and then connects high-risk patients to resources. Closing these social care loops can improve long-term outcomes, and in capitated models it’s worth the investment upfront.
Finally, consistent with everything said above, our model relies on clinician-in-the-loop agents with clear guardrails and escalation paths. Our care navigation AI is not a free-roaming bot – it operates within protocols set by physicians and escalates to a human whenever it encounters something outside the norm (e.g. symptom severity beyond a threshold, or patient confusion). We also evaluate these interventions rigorously in the real world. Did our joint replacement patients who got AI-guided rehab calls have fewer complications? Did our chronic back pain patients report better pain control or satisfaction? We measure and iterate, holding ourselves to the standard of evidence that the panel championed. In short, we’re building what the night’s discussion affirmed: high-tech solutions that align with high-value care. By keeping incentives, safety, and human factors in alignment, we believe AI can truly move the needle in population health – giving patients more support between visits, giving clinicians superpowers to manage larger panels, and giving health systems ROI in the metrics that matter.
Many thanks to the organizers at Pear VC and the fantastic panelists for sparking these ideas. It’s an exciting time to be working at the intersection of clinical medicine and AI. The takeaway for me was clear: Success will require balancing innovation with education, aligning incentives with outcomes, and keeping the human clinician at the center of the design. If we can do that, the technology and scalability will follow. I’m more inspired than ever to keep building – with safety, incentives, and impact aligned.
Thank you for reading. 😊
Sources: Pear VC Event Panel (Aug 26, 2025); AMA 2024 AI Survey[1][3]; FierceHealthcare (Epic/Microsoft integration)[2]; OpenAI+Penda Health study[4][5]; CMS Innovation Center (TEAM & ASM models)[22][12]; AMA/Healthcaredive (Physician practice trends)[14]; UpDoc RCT results (JAMA Netw Open)[17]; JAMA study on follow-up & readmissions[21]; ACP/AMA on augmented intelligence[19][20].
[1] [3] 2 in 3 physicians are using health AI—up 78% from 2023 | American Medical Association
[2] [13] HIMSS23: Epic, Microsoft integrate generative AI into EHRs
[4] [5] [6] OpenAI+Penda study | NU PCCM Fellowship Blog
https://sites.northwestern.edu/pccmfellowship/2025/08/11/openaipenda-study/
[7] Why generative AI like ChatGPT cannot replace physicians | American Medical Association
[8] [9] Transforming Episode Accountability Model (TEAM) | CMS
https://www.cms.gov/priorities/innovation/innovation-models/team-model
[10] [11] [12] [22] CMS Proposes Mandatory Ambulatory Specialty Model to Advance Value-Based Care for Chronic Conditions | Epstein Becker Green
[14] Physician Practice Benchmark Survey | American Medical Association
https://www.ama-assn.org/about/ama-research/physician-practice-benchmark-survey
[15] [16] [17] [18] UpDoc Debuts the World's First AI Assistant That Manages Medication Prescriptions and Chronic Conditions
[19] AI's role in health care: Supporting, not replacing, physicians
https://permanente.org/ais-role-in-health-care-supporting-not-replacing-physicians/
[20] Why health care AI can't replace medicine's human component
[21] Primary Care Physician Follow-Up and 30-Day Readmission After ...
https://jamanetwork.com/journals/jamasurgery/fullarticle/2809996