MedGemma 1.5 and MedASR for Doctors: What Healthcare Professionals Need to Know
- Anirudh Singh Chauhan
- May 8
- 8 min read

Healthcare AI is no longer a future concept — it is actively reshaping how clinicians work with imaging studies, patient records, and clinical documentation. Google's latest release, MedGemma 1.5, is one of the most significant open-source medical AI updates in recent months, and it comes with a companion model, MedASR, purpose-built for medical speech recognition.
If you read our earlier blog on running the original MedGemma locally via LM Studio, this post is the natural follow-up. Here we break down exactly what is new, what the benchmarks mean for clinical workflows, and how doctors and healthcare developers can get started with the updated models.
Why Specialized Medical AI Matters More Than General-Purpose Models
General-purpose AI can handle a broad range of questions. But medicine does not work with general-purpose data. Clinicians work with DICOM images, multi-slice CT volumes, whole-slide pathology scans, serial chest X-rays taken months apart, and dictated notes full of specialized terminology that a generic speech model will inevitably mishandle.
This is precisely the gap that MedGemma 1.5 and MedASR are designed to fill. Rather than asking a general AI to approximate medical reasoning, these models are trained on de-identified medical data including chest X-rays, dermatology images, histopathology slides, CT scans, MRI volumes, and electronic health records. The healthcare industry is already adopting AI at twice the rate of the broader economy — and the demand for tools that work natively with clinical data is accelerating.
What Is New in MedGemma 1.5? A Full Capability Breakdown
MedGemma 1.5 is not a minor incremental update. It introduces fundamentally new imaging modalities, significant accuracy improvements across the board, and a new structured document extraction capability. Here is what changed:
1. 3D Volumetric Imaging: CT Scans and MRI
This is the headline upgrade. The original MedGemma worked with single 2D images. MedGemma 1.5 can now accept multiple slices — the way CT and MRI data actually exists in clinical practice — and reason across the volume as a whole.
In internal benchmarks, absolute accuracy on CT classification improved from 58% to 61% (+3%), while MRI classification saw a much larger jump from 51% to 65% (+14%). For a 4-billion-parameter model running locally, these figures are remarkable — especially when compared to proprietary cloud-based models with over a trillion parameters.
Developers can explore CT and histopathology workflows using official tutorial notebooks available on the MedGemma GitHub repository.

2. Whole-Slide Histopathology — From Patches to Full Slides
MedGemma 1 could analyze small pathology patches. Version 1.5 now supports full whole-slide images, the kind of complete pathology scans that pathologists actually review.
The ROUGE-L score for histopathology — a measure of how accurately the model describes slide findings — jumped from 0.02 to 0.49, practically matching the 0.498 score of PolyPath, a model built specifically for this task. That improvement from near-zero to competitive performance in a single version update is substantial.
3. Anatomical Localization in Chest X-Rays
MedGemma 1.5 can now identify and locate specific anatomical structures within a chest X-ray — pinpointing where findings are, not just what they might be. This is directly relevant to radiology reporting and AI-assisted triaging workflows.
Benchmark performance on the Chest ImaGenome dataset improved from 3% to 38% intersection over union — a 35 percentage point jump that brings this capability from essentially non-functional to genuinely useful.

4. Longitudinal Chest X-Ray Comparison
One of the most clinically valuable capabilities in the updated model is time-series imaging review — comparing a patient's current chest X-ray against one taken months or years earlier to assess disease progression, treatment response, or new findings.
Accuracy on the MS-CXR-T longitudinal benchmark improved from 61% to 66%. In a live demonstration, the model successfully identified a pulmonary nodule that had grown between two X-rays taken six months apart, recommending further investigation with CT and potential biopsy.

5. Structured Lab Report Data Extraction
Beyond imaging, MedGemma 1.5 also improves its ability to extract structured information from clinical documents such as lab reports — pulling out test names, values, and units from both typed and handwritten reports.
The retrieval macro F1 score on lab report extraction improved from 60% to 78% (+18%). For healthcare data teams, administrative staff, and hospital IT departments, this is one of the most operationally relevant improvements in the update.

6. Improved Medical Text and EHR Reasoning
Even on pure text tasks, MedGemma 1.5 outperforms its predecessor. MedQA accuracy improved from 64% to 69% — comparable to what you would expect from a medical licensing exam preparation benchmark. More significantly, performance on EHR question-answering (EHRQA) jumped from 68% to 90% (+22%), which has direct implications for clinical documentation retrieval and secondary use of health records.
MedGemma 1 vs MedGemma 1.5 — Performance at a Glance
Feature / Benchmark | MedGemma 1 (4B) | MedGemma 1.5 (4B) |
CT Image Accuracy | 58% | 61% (+3%) |
MRI Image Accuracy | 51% | 65% (+14%) |
Histopathology (ROUGE-L) | 0.02 | 0.49 (+0.47) |
Anatomical Localization (IoU) | 3% | 38% (+35%) |
Longitudinal CXR Accuracy | 61% | 66% (+5%) |
Lab Report Extraction (F1) | 60% | 78% (+18%) |
MedQA (Text Reasoning) | 64% | 69% (+5%) |
EHR Q&A (EHRQA) | 68% | 90% (+22%) |
3D Volumetric (CT/MRI) Support | ❌ Not available | ✅ Supported |
Whole-Slide Histopathology | Patch-level only | ✅ Full slides |
Medical Speech (MedASR) | ❌ Not available | ✅ New model |
DICOM Support (Cloud) | Limited | ✅ Full support |
Introducing MedASR: Medical Speech-to-Text Built for Clinical Vocabulary
Running alongside MedGemma 1.5, Google has also released MedASR — an automated speech recognition model fine-tuned specifically for medical dictation. This is a separate model, but it is designed to pair naturally with MedGemma for end-to-end voice-driven clinical workflows.

Why General ASR Falls Short in Healthcare
Generic speech recognition tools like OpenAI's Whisper are powerful for everyday language, but medical dictation is a different domain. Medication names, anatomical terms, procedural terminology, and specialty-specific vocabulary create a high error surface for models not trained on clinical speech.
How Much Better is MedASR?
The numbers from head-to-head benchmarking are striking:
Chest X-ray dictation word error rate: 5.2% (MedASR) vs 12.5% (Whisper) — 58% fewer errors
Diverse multi-specialty dictation: 5.2% (MedASR) vs 28.2% (Whisper) — 82% fewer errors
That 82% reduction in errors on multi-specialty dictation is the figure that matters most for clinical use. Fewer errors in transcribed notes means fewer downstream corrections, less administrative burden, and more reliable data for decision-making.
Two Ways to Use MedASR
As a transcription tool: Convert medical dictation into accurate clinical notes. Especially useful for radiologists, pathologists, and any specialty with high dictation volume.
As a voice interface for MedGemma: Speak your clinical query, MedASR transcribes it, MedGemma reasons over the result. This creates a more natural hands-free workflow for busy clinicians.

How to Access and Run MedGemma 1.5
There are three primary ways to work with MedGemma 1.5, depending on your technical environment and data privacy requirements:
Option 1: Google Colab (Easiest Starting Point)
For doctors, researchers, and healthcare AI enthusiasts who want to experiment without local setup, Google Colab is the fastest route.
Go to colab.research.google.com and create a new notebook
Change runtime type to T4 GPU (Runtime > Change runtime type)
Install dependencies: pip install torch torchvision transformers
Authenticate with your Hugging Face token (free account required — MedGemma is a gated model)
Load the model using the transformers pipeline function
Pass medical images or text as prompts to begin testing
The model download requires approximately 8–10 GB and may take a few minutes on first run. Tutorial notebooks for all major use cases (CT, histopathology, anatomical localization, longitudinal CXR, lab extraction) are available on the MedGemma GitHub repository.

Option 2: Local Deployment via LM Studio (Best for Data Privacy)
For healthcare professionals who need patient data to remain entirely on their own machine — the most important consideration for clinical environments — LM Studio provides a no-code interface for running MedGemma locally.
We covered this setup in detail in our original MedGemma blog. The same process applies for version 1.5 once GGUF-format versions become available in the LM Studio community on Hugging Face. Key hardware requirements are summarized in the table below.
Model Variant | CPU Cores | RAM | GPU VRAM | Storage |
MedGemma 1.5 4B (Multimodal) | 4+ | 8–16 GB | 6 GB VRAM+ | ~10 GB |
MedGemma 1 27B (Text-Only) | 8+ | 32–48 GB | 12 GB VRAM+ | ~28 GB |
MedGemma 1 27B (Multimodal) | 8+ | 48–64 GB | 24 GB VRAM+ | ~35 GB |
Google Colab (Free) | — | 12 GB (T4) | T4 (16 GB) | Cloud-based |
Option 3: Google Vertex AI / Model Garden (For Enterprise & Teams)
For hospital IT teams, health-tech startups, and developers building scalable applications, Google Vertex AI now includes full DICOM support for MedGemma — making it significantly easier to integrate into existing radiology and clinical imaging pipelines. MedASR is also available on Vertex AI alongside Hugging Face.
MedGemma 1.5 Already in Use: Real-World Healthcare Applications
These models are not just theoretical. Developers and healthcare organizations are already building on them:
Qmed Asia (Malaysia): Adapted MedGemma to build askCPG, a conversational interface to Malaysia's 150+ clinical practice guidelines. The Ministry of Health Malaysia noted that the tool has made navigating clinical decision support more practical in day-to-day use.
Taiwan National Health Insurance Administration: Applied MedGemma to extract structured data from over 30,000 pathology reports to inform surgical policy for lung cancer resection decisions.
Academic Research: MedGemma has been cited in published research comparing it favorably to other base models for medical text understanding, multidisciplinary team decision making, and mammography reporting.
Google has also announced the MedGemma Impact Challenge — a Kaggle-hosted hackathon with $100,000 in prizes, open to all developers who want to build the next generation of healthcare AI tools on top of MedGemma and the Health AI Developer Foundations ecosystem.
What Doctors Must Know: Clinical Caution Is Non-Negotiable
Disclaimer: MedGemma 1.5 and MedASR are foundational tools for researchers and developers. They are not cleared for direct clinical use and must not replace professional clinical judgment. All outputs require independent verification, clinical correlation, and rigorous validation before any operational deployment. |
That framing is important and consistent with every serious discussion of these tools. A compelling benchmark is not the same as validated clinical performance across real-world populations, devices, imaging equipment, and care settings.
The responsible path for healthcare organizations is to treat these models as advanced research and workflow-enablement tools. They can reduce friction, surface patterns, assist documentation, and accelerate research — but they are not a substitute for professional clinical judgment, and they should not be deployed in direct patient-facing diagnostic roles without appropriate governance, local validation, and regulatory review.
Quick-Start Checklist for Healthcare Professionals
Create a free Hugging Face account at huggingface.co
Accept the MedGemma model terms (it is a gated model — one-click approval)
Open a Colab notebook, set runtime to T4 GPU, and run the official MedGemma tutorial
Test with your own de-identified sample images across different prompting tasks
Explore the GitHub notebooks for CT, histopathology, anatomical localization, and longitudinal CXR
For local deployment: follow our LM Studio guide (linked in related posts) once MedGemma 1.5 GGUF becomes available
For voice workflows: try MedASR on Hugging Face or Vertex AI
For building applications: register for the MedGemma Impact Challenge on Kaggle
Closing Thoughts
MedGemma 1.5 represents a meaningful step forward in open-source medical AI — not because it solves every problem in clinical practice, but because it extends the frontier of what is possible for developers and healthcare innovators working with real clinical data.
The addition of 3D volumetric imaging, whole-slide pathology support, anatomical localization, and longitudinal comparison capabilities, combined with the companion MedASR speech model, means that the toolset available for building the next generation of healthcare AI applications is now substantially more complete than it was six months ago.
For doctors who want to stay ahead of how AI is reshaping their field, understanding these tools — even at an exploratory, non-clinical level — is increasingly part of professional development. And for healthcare developers and innovation teams, MedGemma 1.5 is one of the strongest open-source starting points available right now.




Interesting article