Martin Walsh.
05 Research ★ Flagship

Thoracic Anatomy-AI Overlay

AI-overlaid anatomy on robotic lung-surgery video.

A research pipeline that overlays color-coded, labeled anatomy onto recorded da Vinci robotic lung-surgery video, built as a retrospective teaching tool. It pairs video object tracking with a vision-language model that acts as an automated checker, so that the structures a surgeon divides during a lobectomy can be highlighted at their moment of peak visibility. The work is developed in collaboration with a thoracic surgeon and is deliberately scoped as an educational tool, not as intraoperative guidance.

Python, SAM 2, Vision-language models, PyTorch, Apple Silicon / MPS, Computer Vision

The problem

Robotic lung-surgery video is a rich teaching resource, but the critical anatomy — pulmonary vessels, the lobar bronchus, the fissure — is hard for trainees to read frame by frame, and it is often obscured at the exact moment it matters. Stapler firings, the steps that define a lobectomy, are precisely when the target structure is covered by the instrument. There is no easy way to retrospectively annotate which structure was being divided and when, in a form a learner can study. The project addresses that gap while staying clearly outside live intraoperative guidance to avoid medical-device regulatory scope.

What I built

A model-agnostic video-labeling pipeline that ingests recorded robotic lobectomy footage and produces overlaid, labeled clips. It combines a promptable video segmentation/tracking model with a vision-language model used as an automated quality checker on the labels. A surgeon-corrections layer bakes verified anatomical rules and traced polygons back into the pipeline, so expert ground truth improves successive runs. Supporting tooling finds the highest-visibility window before each stapler firing, builds contact sheets for review, and renders the final annotated segments.
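The window-finding step can be illustrated with a minimal sketch. It assumes each frame already carries a numeric visibility score (for example, a tracked-mask area fraction or a checker confidence); the page does not say how visibility is actually measured, so the score, the function name `best_prefiring_window`, and its parameters are all hypothetical.

```python
from typing import Sequence, Tuple


def best_prefiring_window(
    scores: Sequence[float],
    firing_frame: int,
    window: int,
    lookback: int,
) -> Tuple[int, int]:
    """Return (start, end) of the highest-scoring window before a firing.

    `scores` holds one hypothetical visibility score per frame. The window
    is `window` frames long, ends at or before `firing_frame` (end is
    exclusive), and starts no earlier than `lookback` frames before it.
    """
    lo = max(0, firing_frame - lookback)
    hi = firing_frame  # exclusive: the firing frame itself is never included
    if window <= 0 or hi - lo < window:
        raise ValueError("not enough frames before the firing event")

    # Sliding-window sum: add the frame entering, drop the frame leaving.
    current = sum(scores[lo:lo + window])
    best_sum, best_start = current, lo
    for start in range(lo + 1, hi - window + 1):
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start, best_start + window


def hero_frame(scores: Sequence[float], start: int, end: int) -> int:
    """Pick the single most visible frame inside [start, end)."""
    return max(range(start, end), key=lambda i: scores[i])
```

With scores `[0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.0, 0.0]`, a firing at frame 6, a 3-frame window, and a 6-frame lookback, the sketch selects frames 2 through 4 and frame 2 as the hero frame. Excluding the firing frame from the search reflects the fact that the stapler covers the target at that moment.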

How it works

The segmentation model is seeded from a bounding box or a surgeon's trace at a hero frame and tracks the structure across the surrounding video window, while the vision-language checker verifies the resulting labels and flags doubtful ones. A key design insight drove the current approach: rather than annotating the firing frame itself, where the stapler hides the anatomy, the pipeline targets the moment just before firing, when the vessel is wrapped in a vessel loop and is at its most isolated and visible. Where automated detection is unreliable (for example, white vessel loops that are visually inseparable from surrounding tissue), the system falls back to surgeon-provided traces rather than guessing. The full stack runs locally on Apple Silicon, with model choice kept swappable.
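The seeding and fallback logic described here can be sketched in plain Python. This is an illustration under stated assumptions, not the project's code: `Seed`, `choose_seed`, `review`, the `needs_trace` flag, and the status strings are hypothetical names, and the `checker` callable merely stands in for the vision-language model. The segmentation and tracking call itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max


@dataclass
class Seed:
    structure: str    # anatomical label, e.g. "superior pulmonary vein"
    hero_frame: int   # frame index where the prompt is placed
    kind: str         # "trace" or "box"
    geometry: object  # polygon points for a trace, box tuple for a box


def choose_seed(
    structure: str,
    hero_frame: int,
    box: Optional[Box] = None,
    trace: Optional[List[Point]] = None,
    needs_trace: bool = False,
) -> Optional[Seed]:
    """Prefer a surgeon's trace; use a box only when detection is trusted.

    `needs_trace` is a hypothetical flag for structures where automated
    detection is known to be unreliable. Returning None means "wait for a
    surgeon-provided trace" rather than guessing.
    """
    if trace is not None:
        return Seed(structure, hero_frame, "trace", trace)
    if needs_trace or box is None:
        return None
    return Seed(structure, hero_frame, "box", box)


def review(seed: Optional[Seed], checker: Callable[[str, int], bool]) -> str:
    """Map a seed and a stand-in checker verdict to a review status."""
    if seed is None:
        return "needs_surgeon_trace"
    return "accepted" if checker(seed.structure, seed.hero_frame) else "flagged"
```

Passing the checker in as a callable keeps the model choice swappable, and returning `None` instead of a best-effort box makes the "fall back rather than guess" rule explicit in the control flow.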

Where it stands

A proof of concept on a left-upper-lobe case is working: stapler-firing events are snapshotted, the firing sequence is mapped against the operative record, and the superior pulmonary vein yielded a clean result, namely a tight, stable, expert-reviewed overlay at the pre-firing hero frame. The work also produced concrete, documented findings about which structures yield to automated tracking and which require traced ground truth, and those findings are shaping the next architecture. Surgeon ground-truth labeling is the next step toward broader validation. The effort is an active research collaboration framed strictly as a retrospective, educational tool.