top of page

MedGemma 1.5 and MedASR for Doctors: What Healthcare Professionals Need to Know

An AI-assisted radiology workflow — doctor reviewing CT scan on screen with AI overlay highlights.

Healthcare AI is no longer a future concept — it is actively reshaping how clinicians work with imaging studies, patient records, and clinical documentation. Google's latest release, MedGemma 1.5, is one of the most significant open-source medical AI updates in recent months, and it comes with a companion model, MedASR, purpose-built for medical speech recognition.

If you read our earlier blog on running the original MedGemma locally via LM Studio, this post is the natural follow-up. Here we break down exactly what is new, what the benchmarks mean for clinical workflows, and how doctors and healthcare developers can get started with the updated models.


Why Specialized Medical AI Matters More Than General-Purpose Models

General-purpose AI can handle a broad range of questions. But medicine does not work with general-purpose data. Clinicians work with DICOM images, multi-slice CT volumes, whole-slide pathology scans, serial chest X-rays taken months apart, and dictated notes full of specialized terminology that a generic speech model will inevitably mishandle.

This is precisely the gap that MedGemma 1.5 and MedASR are designed to fill. Rather than asking a general AI to approximate medical reasoning, these models are trained on de-identified medical data including chest X-rays, dermatology images, histopathology slides, CT scans, MRI volumes, and electronic health records. The healthcare industry is already adopting AI at twice the rate of the broader economy — and the demand for tools that work natively with clinical data is accelerating.

 

What Is New in MedGemma 1.5? A Full Capability Breakdown

MedGemma 1.5 is not a minor incremental update. It introduces fundamentally new imaging modalities, significant accuracy improvements across the board, and a new structured document extraction capability. Here is what changed:

 

1. 3D Volumetric Imaging: CT Scans and MRI

This is the headline upgrade. The original MedGemma worked with single 2D images. MedGemma 1.5 can now accept multiple slices — the way CT and MRI data actually exists in clinical practice — and reason across the volume as a whole.

In internal benchmarks, absolute accuracy on CT classification improved from 58% to 61% (+3%), while MRI classification saw a much larger jump from 51% to 65% (+14%). For a 4-billion-parameter model running locally, these figures are remarkable — especially when compared to proprietary cloud-based models with over a trillion parameters.

Developers can explore CT and histopathology workflows using official tutorial notebooks available on the MedGemma GitHub repository.

Screenshot of MedGemma 1.5 Github Repository

 

2. Whole-Slide Histopathology — From Patches to Full Slides

MedGemma 1 could analyze small pathology patches. Version 1.5 now supports full whole-slide images, the kind of complete pathology scans that pathologists actually review.

The ROUGE-L score for histopathology — a measure of how accurately the model describes slide findings — jumped from 0.02 to 0.49, practically matching the 0.498 score of PolyPath, a model built specifically for this task. That improvement from near-zero to competitive performance in a single version update is substantial.

 

3. Anatomical Localization in Chest X-Rays

MedGemma 1.5 can now identify and locate specific anatomical structures within a chest X-ray — pinpointing where findings are, not just what they might be. This is directly relevant to radiology reporting and AI-assisted triaging workflows.

Benchmark performance on the Chest ImaGenome dataset improved from 3% to 38% intersection over union — a 35 percentage point jump that brings this capability from essentially non-functional to genuinely useful.

Screenshot from the anatomical localization tutorial notebook (available on GitHub) showing a chest X-ray with the model's response identifying heart, lungs, and specific abnormalities with location descriptions.

 

4. Longitudinal Chest X-Ray Comparison

One of the most clinically valuable capabilities in the updated model is time-series imaging review — comparing a patient's current chest X-ray against one taken months or years earlier to assess disease progression, treatment response, or new findings.

Accuracy on the MS-CXR-T longitudinal benchmark improved from 61% to 66%. In a live demonstration, the model successfully identified a pulmonary nodule that had grown between two X-rays taken six months apart, recommending further investigation with CT and potential biopsy.

Screenshot showing two chest X-rays from different time points side-by-side in the notebook interface, with the model's response identifying interval change

 

5. Structured Lab Report Data Extraction

Beyond imaging, MedGemma 1.5 also improves its ability to extract structured information from clinical documents such as lab reports — pulling out test names, values, and units from both typed and handwritten reports.

The retrieval macro F1 score on lab report extraction improved from 60% to 78% (+18%). For healthcare data teams, administrative staff, and hospital IT departments, this is one of the most operationally relevant improvements in the update.


Screenshot showing a handwritten or semi-structured lab report as input and the model's structured output table (test name / value / unit) — taken from the hands-on demo referenced in the research transcript.

 

6. Improved Medical Text and EHR Reasoning

Even on pure text tasks, MedGemma 1.5 outperforms its predecessor. MedQA accuracy improved from 64% to 69% — comparable to what you would expect from a medical licensing exam preparation benchmark. More significantly, performance on EHR question-answering (EHRQA) jumped from 68% to 90% (+22%), which has direct implications for clinical documentation retrieval and secondary use of health records.

 

MedGemma 1 vs MedGemma 1.5 — Performance at a Glance

Feature / Benchmark

MedGemma 1 (4B)

MedGemma 1.5 (4B)

CT Image Accuracy

58%

61% (+3%)

MRI Image Accuracy

51%

65% (+14%)

Histopathology (ROUGE-L)

0.02

0.49 (+0.47)

Anatomical Localization (IoU)

3%

38% (+35%)

Longitudinal CXR Accuracy

61%

66% (+5%)

Lab Report Extraction (F1)

60%

78% (+18%)

MedQA (Text Reasoning)

64%

69% (+5%)

EHR Q&A (EHRQA)

68%

90% (+22%)

3D Volumetric (CT/MRI) Support

❌ Not available

✅ Supported

Whole-Slide Histopathology

Patch-level only

✅ Full slides

Medical Speech (MedASR)

❌ Not available

✅ New model

DICOM Support (Cloud)

Limited

✅ Full support

 

Introducing MedASR: Medical Speech-to-Text Built for Clinical Vocabulary

Running alongside MedGemma 1.5, Google has also released MedASR — an automated speech recognition model fine-tuned specifically for medical dictation. This is a separate model, but it is designed to pair naturally with MedGemma for end-to-end voice-driven clinical workflows.

Screenshot of the WER comparison chart — showing MedASR's 5.2% word error rate vs Whisper's 12.5% on CXR dictation, and 5.2% vs 28.2% on diverse medical dictation.

 

Why General ASR Falls Short in Healthcare

Generic speech recognition tools like OpenAI's Whisper are powerful for everyday language, but medical dictation is a different domain. Medication names, anatomical terms, procedural terminology, and specialty-specific vocabulary create a high error surface for models not trained on clinical speech.

How Much Better is MedASR?

The numbers from head-to-head benchmarking are striking:

  • Chest X-ray dictation word error rate: 5.2% (MedASR) vs 12.5% (Whisper) — 58% fewer errors

  • Diverse multi-specialty dictation: 5.2% (MedASR) vs 28.2% (Whisper) — 82% fewer errors

 

That 82% reduction in errors on multi-specialty dictation is the figure that matters most for clinical use. Fewer errors in transcribed notes means fewer downstream corrections, less administrative burden, and more reliable data for decision-making.

Two Ways to Use MedASR

  • As a transcription tool: Convert medical dictation into accurate clinical notes. Especially useful for radiologists, pathologists, and any specialty with high dictation volume.

  • As a voice interface for MedGemma: Speak your clinical query, MedASR transcribes it, MedGemma reasons over the result. This creates a more natural hands-free workflow for busy clinicians.

A simple flow diagram showing: Doctor speaks → MedASR transcribes → MedGemma reasons → Output to clinician.

 

How to Access and Run MedGemma 1.5

There are three primary ways to work with MedGemma 1.5, depending on your technical environment and data privacy requirements:

 

Option 1: Google Colab (Easiest Starting Point)

For doctors, researchers, and healthcare AI enthusiasts who want to experiment without local setup, Google Colab is the fastest route.

  • Go to colab.research.google.com and create a new notebook

  • Change runtime type to T4 GPU (Runtime > Change runtime type)

  • Install dependencies: pip install torch torchvision transformers

  • Authenticate with your Hugging Face token (free account required — MedGemma is a gated model)

  • Load the model using the transformers pipeline function

  • Pass medical images or text as prompts to begin testing

 

The model download requires approximately 8–10 GB and may take a few minutes on first run. Tutorial notebooks for all major use cases (CT, histopathology, anatomical localization, longitudinal CXR, lab extraction) are available on the MedGemma GitHub repository.


Screenshot of the Colab runtime settings panel showing T4 GPU selected. This is a practical visual that helps non-technical readers understand the setup step

 


Option 2: Local Deployment via LM Studio (Best for Data Privacy)

Download medgemma-1.5-4b-it-GGUF by clicking on Search models button in side menu of LM Studio.

For healthcare professionals who need patient data to remain entirely on their own machine — the most important consideration for clinical environments — LM Studio provides a no-code interface for running MedGemma locally.

We covered this setup in detail in our original MedGemma blog. The same process applies for version 1.5 once GGUF-format versions become available in the LM Studio community on Hugging Face. Key hardware requirements are summarized in the table below.

 

Model Variant

CPU Cores

RAM

GPU VRAM

Storage

MedGemma 1.5 4B (Multimodal)

4+

8–16 GB

6 GB VRAM+

~10 GB

MedGemma 1 27B (Text-Only)

8+

32–48 GB

12 GB VRAM+

~28 GB

MedGemma 1 27B (Multimodal)

8+

48–64 GB

24 GB VRAM+

~35 GB

Google Colab (Free)

12 GB (T4)

T4 (16 GB)

Cloud-based

 

Option 3: Google Vertex AI / Model Garden (For Enterprise & Teams)

For hospital IT teams, health-tech startups, and developers building scalable applications, Google Vertex AI now includes full DICOM support for MedGemma — making it significantly easier to integrate into existing radiology and clinical imaging pipelines. MedASR is also available on Vertex AI alongside Hugging Face.

 

MedGemma 1.5 Already in Use: Real-World Healthcare Applications

These models are not just theoretical. Developers and healthcare organizations are already building on them:

 

  • Qmed Asia (Malaysia): Adapted MedGemma to build askCPG, a conversational interface to Malaysia's 150+ clinical practice guidelines. The Ministry of Health Malaysia noted that the tool has made navigating clinical decision support more practical in day-to-day use.

  • Taiwan National Health Insurance Administration: Applied MedGemma to extract structured data from over 30,000 pathology reports to inform surgical policy for lung cancer resection decisions.

  • Academic Research: MedGemma has been cited in published research comparing it favorably to other base models for medical text understanding, multidisciplinary team decision making, and mammography reporting.


Google has also announced the MedGemma Impact Challenge — a Kaggle-hosted hackathon with $100,000 in prizes, open to all developers who want to build the next generation of healthcare AI tools on top of MedGemma and the Health AI Developer Foundations ecosystem.

 

What Doctors Must Know: Clinical Caution Is Non-Negotiable

Disclaimer: MedGemma 1.5 and MedASR are foundational tools for researchers and developers. They are not cleared for direct clinical use and must not replace professional clinical judgment. All outputs require independent verification, clinical correlation, and rigorous validation before any operational deployment.

That framing is important and consistent with every serious discussion of these tools. A compelling benchmark is not the same as validated clinical performance across real-world populations, devices, imaging equipment, and care settings.

The responsible path for healthcare organizations is to treat these models as advanced research and workflow-enablement tools. They can reduce friction, surface patterns, assist documentation, and accelerate research — but they are not a substitute for professional clinical judgment, and they should not be deployed in direct patient-facing diagnostic roles without appropriate governance, local validation, and regulatory review.

 

Quick-Start Checklist for Healthcare Professionals

  • Create a free Hugging Face account at huggingface.co

  • Accept the MedGemma model terms (it is a gated model — one-click approval)

  • Open a Colab notebook, set runtime to T4 GPU, and run the official MedGemma tutorial

  • Test with your own de-identified sample images across different prompting tasks

  • Explore the GitHub notebooks for CT, histopathology, anatomical localization, and longitudinal CXR

  • For local deployment: follow our LM Studio guide (linked in related posts) once MedGemma 1.5 GGUF becomes available

  • For voice workflows: try MedASR on Hugging Face or Vertex AI

  • For building applications: register for the MedGemma Impact Challenge on Kaggle

 

Closing Thoughts

MedGemma 1.5 represents a meaningful step forward in open-source medical AI — not because it solves every problem in clinical practice, but because it extends the frontier of what is possible for developers and healthcare innovators working with real clinical data.


The addition of 3D volumetric imaging, whole-slide pathology support, anatomical localization, and longitudinal comparison capabilities, combined with the companion MedASR speech model, means that the toolset available for building the next generation of healthcare AI applications is now substantially more complete than it was six months ago.


For doctors who want to stay ahead of how AI is reshaping their field, understanding these tools — even at an exploratory, non-clinical level — is increasingly part of professional development. And for healthcare developers and innovation teams, MedGemma 1.5 is one of the strongest open-source starting points available right now.

 

 
 
 

1 Comment


Interesting article

Like

Copyright 2025 Averox Global Solutions

bottom of page