Is AI solving the wrong problems in medicine?
You've probably heard a million times that AI is going to transform medicine. Maybe it’s going to discover the next blockbuster drug. Or replace your doctor. Or be the ambient scribe that's going to end physician burnout once and for all.
There’s an old military saying that amateurs talk strategy but professionals talk logistics.
Too much of the coverage I see about AI in medicine is about strategy. So let’s talk today about the logistics: how AI is being used in the real world. The road to an AI revolution in medicine is full of challenges in how we actually implement these tools.
Because that’s what AI is: a tool. And a tool is only as good as how we use it.
Pancreatic cancer detection is exciting, but incredibly preliminary
I really enjoy Derek Thompson’s coverage of many topics, because he tries to think through the question of “how does this work in the actual world?” rather than taking the pie-in-the-sky view of new technology that too many commentators take.
So I’d recommend everyone read his interview with Dr. Ajit Goenka, the senior author of the paper showing that an AI tool called REDMOD found pancreatic cancer on CT scans years before radiologists did.
Here’s one of the graphics from the paper - panel A is a scan with a “normal” pancreas taken 2.4 years before panel B, which shows the same patient after development of pancreatic cancer. Panel C shows how the REDMOD tool flagged the original scan for being at risk for pancreatic cancer:
It’s exciting but preliminary. When you look at the test characteristics, you see that REDMOD performs well but not perfectly.
It didn’t pick up all cancers - it found about 73% of them.1 To put this into perspective, that’s still 2-3 fold better than humans.
It flags some normal scans as abnormal - specificity was 81%, which means that 19% of normal scans were incorrectly flagged as abnormal.
If you deploy a tool like this at scale, that’s a lot of false positives.
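To see why false positives pile up at scale, here is a back-of-the-envelope sketch in Python. It uses the paper’s sensitivity (73%) and specificity (81%), but the size of the screened population and the 0.1% cancer prevalence are hypothetical numbers chosen only for illustration, not figures from the paper. It also treats each scan independently and ignores repeat screening.

```python
def screening_yield(n_screened: int, prevalence: float,
                    sensitivity: float, specificity: float) -> dict[str, float]:
    """Expected outcomes when a screening test is applied to a population."""
    n_cancer = n_screened * prevalence
    n_normal = n_screened - n_cancer
    true_pos = sensitivity * n_cancer           # cancers correctly flagged
    false_neg = n_cancer - true_pos             # cancers the tool misses
    false_pos = (1 - specificity) * n_normal    # normal scans flagged as abnormal
    ppv = true_pos / (true_pos + false_pos)     # share of flagged scans that are cancer
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "ppv": ppv,
    }


# Sensitivity/specificity as reported for REDMOD; the population size and
# 0.1% prevalence are HYPOTHETICAL, chosen only to illustrate the arithmetic.
result = screening_yield(n_screened=100_000, prevalence=0.001,
                         sensitivity=0.73, specificity=0.81)

print(f"Cancers flagged: {result['true_pos']:,.0f}")
print(f"Cancers missed: {result['false_neg']:,.0f}")
print(f"Normal scans flagged: {result['false_pos']:,.0f}")
print(f"Flagged scans that are cancer: {result['ppv']:.2%}")
print(f"False positives per cancer found: "
      f"{result['false_pos'] / result['true_pos']:.0f}")
```

Under these assumptions, roughly 260 normal scans get flagged for every cancer found, and fewer than 1 in 200 flagged patients actually has cancer. The lower the prevalence in the screened group, the worse that ratio gets, which is one reason the question of who should be screened matters so much.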
And the most important question is about the clinical applicability. What do you do when a patient screens positive? Should they get more imaging? A biopsy? A big surgery to remove the concerning area?
Detection without a plan is an incomplete solution.
And I have even more questions beyond that:
How do you figure out who should be screened?
How frequently should someone at high risk with a negative screen be followed?
Does deploying these tools save lives?
Does it lead to a bunch of unnecessary surgeries?2
How much stress does it cause the people who get flagged with an abnormal scan but end up never developing cancer?
But beyond all of those questions about how to operationalize management, what kind of evidence are we going to require before we deploy a tool like this?
It’s one thing to have a tool that can detect cancer early, and quite another to show that detecting cancer early saves lives, since detection doesn’t matter unless we have a plan for treatment. Whether flagging patients early actually improves outcomes is a question that a paper like this can’t answer.
And understanding what to do with that information is the next urgent question.
We need to be testing tools like this in clinical trials to answer those important questions: how to follow these patients, who goes for a biopsy or surgery, and who doesn’t need to worry.
As I’ve written before, the AI revolution in medicine often has an evidence problem that is really underappreciated in much of the discussion I hear about these tools.
Underwhelming AI scribes solve fake problems, not real or important ones
One of the most rapidly adopted AI tools in medicine has been the AI scribe, which helps with charting.
I’ve had a chance to use these in clinic, and I’ve come away unimpressed.
They don’t help with the most annoying parts of charting: clicking diagnoses, updating allergies, doing a medication reconciliation, removing old problems, updating the family history, putting in orders, and enduring the aggravating lag when you move between tabs in Epic.
And they add AI slop to the note. One example: when a patient asks me how I’m doing, the scribe puts my answer into the note, because it doesn’t always distinguish who is saying what.
And beyond that, the scribe often puts too much information in the chart or leaves out things that I think are important.
It needs to be proofread (a time-consuming process!), or your note ends up with even more slop.
The end result is that if you really want to use an AI-generated note and make sure it’s accurate and complete, you spend the same amount of time (or more) as when you write the note yourself.
And if you are using dictation already (which I think is the single most productivity-enhancing trick I’ve ever been taught in charting), AI scribes may actually slow you down.
Totally underwhelming, not time-saving, and most importantly, because it doesn’t help with the most aggravating parts of working in the electronic medical record, certainly not burnout-reducing.3
My overall experience: an unimpressive solution to the wrong problem.
The Prior Authorization Arms Race: When the AIs Start Talking to Each Other
Prior authorization is one of the weirdest terms in medicine.
It means “permission from the insurance company to do this medical thing.”
Insurance companies are deploying AI to process prior authorization requests faster and at greater scale.
That adds another layer of imperfection to an already imperfect system.
The natural response for hospitals and physician groups is going to be to use AI to fight back.
I anticipate that this arms race means we’ll end up with a bizarre back-and-forth: two large language models essentially corresponding with each other, generating enormous volumes of clinical-sounding documentation that no human wrote, few humans will read, and no single human is accountable for.
But the end result of that process is going to impact whether actual humans get medical tests, receive prescriptions, and have procedures in the real world.
The economist Tyler Cowen has written about AI agents talking to each other and building their own culture - systems talking to systems, generating outputs in a space humans increasingly don’t inhabit.
Will prior authorization be medicine’s version of that?
Maybe, but in this case the stakes won’t be about crappy cultural consumption.
Everyone may get more prolific at generating paperwork. Total administrative burden for all the humans involved will almost certainly go up, not down.
And patients, who didn’t ask for any of this, will get caught in the middle.
The future here is bizarre - we’ll have to see what happens when AI gets deployed in an administrative arms race.
Drug Development: More Targets, Same Bottleneck
I can’t tell you how many stories I’ve read about AI revolutionizing drug development because it’s helping to identify molecular targets better.
That’s probably true, but I don’t think that’s where the real problem in drug development lies.
Too often drug development fails at the back end, not the front. The rate-limiting step is the clinical trial - figuring out whether something that works in a lab or a mouse actually works in a human is hard!
That requires trials, which require time, patients, and navigating a complicated regulatory process. AI doesn’t meaningfully speed up any of that. What it does is widen the front of the pipeline while the back stays just as narrow.
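To make the bottleneck concrete, here is a toy model in Python. Every number in it (candidates, trial slots, success rate) is hypothetical, and it assumes the number of trials that can be run is fixed and that more candidates don’t change the odds any one of them succeeds. It illustrates the shape of the argument, not how real drug development works.

```python
def drugs_approved_per_year(candidates: int, trial_slots: int,
                            success_rate: float) -> float:
    """Toy pipeline: a candidate only counts if it gets tested in a trial."""
    tested = min(candidates, trial_slots)  # the back end caps throughput
    return tested * success_rate


# HYPOTHETICAL numbers, chosen only to illustrate the bottleneck.
TRIAL_SLOTS = 50
SUCCESS_RATE = 0.1

print("Widening the front end (more candidates):")
for candidates in (100, 200, 400):
    approved = drugs_approved_per_year(candidates, TRIAL_SLOTS, SUCCESS_RATE)
    print(f"  {candidates} candidates -> {approved:.1f} approvals")

print("Compressing the back end (more trial capacity):")
for slots in (50, 100, 200):
    approved = drugs_approved_per_year(400, slots, SUCCESS_RATE)
    print(f"  {slots} trial slots -> {approved:.1f} approvals")
```

In this toy setup, quadrupling the candidates leaves approvals flat, while expanding trial capacity raises them. Real pipelines are messier, but that is the shape of the argument.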
You need both mechanisms and outcomes to have drugs that actually help people.
The history of medicine is littered with treatments that should have worked based on a sophisticated or elegant mechanism but didn’t when they were rigorously tested in a trial.
Mechanisms are hypotheses about how things work. Outcomes tell us whether the mechanisms are accurate and complete enough to help people.
Is AI just adding more candidates to test but not speeding up the testing process?
If I’m going to get excited about AI in drug development, I want to see it compress the back end. Enable easier data sharing. Cut through the bureaucracy so that trials are faster and cheaper to run.
I’d be particularly interested in using AI to help with strategies for common problems - how do we figure out which blood pressure medication will work for which patient, or which statin will give the biggest LDL benefit with the fewest side effects, or which blood thinner works best for this patient?
Improving outcomes is the hard part. Identifying more targets to study isn’t.
AI chatbots aren’t ready for prime time
The idea of an AI that helps patients figure out what’s wrong with them sounds useful.
In practice it runs into a fundamental problem: good diagnosis depends on knowing which questions to ask, not just processing the answers patients give you.
Patients are often ineffective narrators about things that matter for diagnostic purposes. I mean this with zero judgment - it’s not a patient’s job to know what details about chest pain matter.4
Read the piece from Bobby Dubois, MD, PhD, on his experience using AI to diagnose a headache. He invented symptoms and asked ChatGPT what to do. He got a list, but he wasn’t asked any follow-up questions. When he didn’t volunteer the right information, he got a totally incorrect recommendation to lie down in a dark room.
My experience has been similar - when patients bring information from ChatGPT, the signal-to-noise ratio isn’t great. Knowledge is there, but clinical reasoning with uncertainty isn’t.
Far too often, large language models miss the point in a clinical interaction.
Sometimes this is because they tell patients what they want to hear, sometimes this is because they don’t ask the right questions (or any questions at all), and sometimes it’s because the chatbot doesn’t have all of the right information to contextualize the claim.
The messy business of AI implementation is really important
The questions about these tools outnumber the answers by a lot. I have genuine excitement about what AI might eventually do in medicine. I also have genuine concern that we're moving fast and demanding very little evidence along the way.
The military saying goes: amateurs talk strategy, professionals talk logistics. The strategy for AI in medicine sounds great. The logistics - who screens positive on REDMOD and then what, which parts of the chart actually get better, who’s accountable when two AIs negotiate a prior auth denial - are still mostly unsolved.
That’s what’s worth paying attention to as you see more in the news about AI in medicine.
1. It found 13 of 19 cancers more than 2 years before diagnosis, and 15 of 20 cancers between 3 and 12 months before diagnosis. Overall sensitivity (the ability to detect a cancer that is present) was 73%.
2. Surgery for pancreatic cancer is a major operation with real risks.
3. And that doesn’t even include the critiques raised by people like James H. Stein, MD, about how AI scribes may worsen burnout and paradoxically lead to more demands on physicians.
4. I’ve written before, in detail, about why your doctor keeps interrupting you: