In this project, I built a high-performance pipeline for chest X-ray analysis. The goal was to speed up image preprocessing and model inference with GPU acceleration and ONNX, so that large datasets can be processed quickly.
I wanted to see how medical images can be prepared efficiently and run through a model quickly, while keeping the whole workflow reproducible and clear.
- GPU-Powered Preprocessing
  - Converted chest X-ray images to grayscale using CUDA/Numba, running directly on the GPU.
  - Normalized pixel values so the images are ready for the model.
  - Saved these preprocessed images as .npy files for easy reuse.
- ONNX Model Inference
  - Took the preprocessed images, replicated the single grayscale channel three times to match DenseNet-121’s expected input, and resized them to 224×224.
  - Ran predictions using ONNXRuntime, which ran faster than standard PyTorch inference in my timing comparison.
  - Recorded the predicted class, confidence, and inference time for each image.
- Performance Comparison
  - Measured inference times for both PyTorch and ONNX.
  - Created bar charts to clearly show the speed improvement with ONNX.
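
The grayscale conversion and normalization from the preprocessing step can be checked against a plain NumPy reference. The sketch below is not the project's Numba kernel. It assumes BT.601 luma weights and a division by 255, neither of which is stated here, and the function name `grayscale_normalize_reference` is hypothetical.

```python
import numpy as np

# Assumed BT.601 luma weights for (R, G, B); the project's actual weights are not stated.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def grayscale_normalize_reference(rgb: np.ndarray) -> np.ndarray:
    """CPU reference: (H, W, 3) uint8 RGB -> (H, W) float32 grayscale in [0, 1]."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) RGB array")
    # Weighted sum over the channel axis, one value per pixel
    gray = rgb.astype(np.float32) @ LUMA_WEIGHTS
    # Assumed normalization: scale 0..255 to 0..1 and clip float rounding noise
    return np.clip(gray / 255.0, 0.0, 1.0)


# Tiny synthetic image: one white, one black, and one pure red pixel
img = np.array([[[255, 255, 255], [0, 0, 0], [255, 0, 0]]], dtype=np.uint8)
gray = grayscale_normalize_reference(img)
print(gray.shape, gray.dtype)
```

A reference like this is mainly useful for validating GPU output, for example by comparing the two results with `np.allclose` on a handful of images.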
project-root/
│
├─ dataset/
│ └─ chest_xray/ # Original X-ray images
│
├─ preprocessed/
│ └─ *.npy # GPU-preprocessed images
│
├─ notebooks/
│ └─ preprocessing_and_onnx_inference.ipynb
│
├─ models/
│ └─ densenet121.onnx
│
└─ README.md
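
Given this layout, the preprocessed arrays can be collected with a short helper. This is a minimal sketch: `load_preprocessed` is a hypothetical name, and using the parent folder as the label assumes class subfolders such as `preprocessed/NORMAL/`, as in the save path used in the preprocessing snippet. The demo writes a throwaway file to a temporary directory so it runs without the dataset.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_preprocessed(root):
    """Load every .npy file under `root`; the parent folder name serves as the label."""
    arrays, labels = [], []
    for path in sorted(Path(root).rglob("*.npy")):
        arrays.append(np.load(path))
        labels.append(path.parent.name)  # e.g. "NORMAL" for .../NORMAL/IM-0001.npy
    return arrays, labels


# Demo on a temporary copy of the assumed folder layout
with tempfile.TemporaryDirectory() as tmp:
    sample_dir = Path(tmp) / "preprocessed" / "NORMAL"
    sample_dir.mkdir(parents=True)
    np.save(sample_dir / "IM-0001.npy", np.zeros((4, 4), dtype=np.float32))
    arrays, labels = load_preprocessed(Path(tmp) / "preprocessed")

print(len(arrays), labels)
```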
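- Preprocessing Images

    import numpy as np
    gray_img = preprocess_image_gpu(img)  # grayscale + normalization on the GPU
    batch_img = gray_img[np.newaxis, ...]  # assumed step: add a leading batch axis
    np.save("preprocessed/NORMAL/IM-0001.npy", batch_img)

- Running ONNX Inference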
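
    import onnxruntime as ort
    # List CUDA first so ONNXRuntime uses the GPU when available, otherwise the CPU
    providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    ort_session = ort.InferenceSession("models/densenet121.onnx", providers=providers)
    outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: batch_resized})

- Measuring Inference Time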

    import time
    start = time.perf_counter()  # monotonic, high-resolution timer
    outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: batch_resized})
    end = time.perf_counter()
    inference_time_ms = (end - start) * 1000  # seconds to milliseconds

- NumPy – For array handling and batching.
- OpenCV – To resize and manipulate images.
- Numba – For CUDA-powered preprocessing.
- ONNXRuntime – For fast GPU inference.
- PyTorch – DenseNet-121 baseline comparison.
- Matplotlib & Pandas – For visualizations and tables.
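
To illustrate the array handling around the model call, the sketch below prepares a grayscale image for DenseNet-121 and turns raw outputs into a predicted class and confidence. Several points are assumptions: the input is a single (H, W) array, the model returns raw logits of shape (1, num_classes), and both function names are hypothetical. Nearest-neighbour indexing stands in for OpenCV's `cv2.resize` only to keep the example dependency-free.

```python
import numpy as np


def prepare_input(gray: np.ndarray, size: int = 224) -> np.ndarray:
    """(H, W) grayscale -> (1, 3, size, size) float32 batch in NCHW layout."""
    h, w = gray.shape
    # Nearest-neighbour resize via index lookup (stand-in for cv2.resize)
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = gray[rows[:, None], cols[None, :]]
    # Replicate the single channel three times, then add the batch axis
    chw = np.repeat(resized[None, :, :], 3, axis=0)
    return chw[None, ...].astype(np.float32)


def top_prediction(logits: np.ndarray):
    """Return (class_index, confidence) from a (1, num_classes) logit array."""
    z = logits[0] - np.max(logits[0])      # shift for numerical stability
    probs = np.exp(z) / np.sum(np.exp(z))  # softmax over the classes
    idx = int(np.argmax(probs))
    return idx, float(probs[idx])


batch = prepare_input(np.zeros((512, 400), dtype=np.float32))
pred_class, confidence = top_prediction(np.array([[0.0, 2.0]]))
print(batch.shape, pred_class, round(confidence, 3))
```

If the exported model already ends in a softmax layer, the softmax step in `top_prediction` should be skipped and the outputs read as probabilities directly.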
- Preprocessing on GPU was fast and efficient.
- ONNX inference ran significantly faster than the PyTorch baseline in the timing comparison.
- The pipeline is reproducible and works well with larger datasets.
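
A single timed call can be noisy, so a speed comparison like the one above is usually more stable with warm-up runs and a median over several repeats. The harness below is a sketch of that idea, not the measurement code used in this project. `time_inference_ms` and `dummy_inference` are hypothetical names, and the dummy workload only exists so the example runs without a model.

```python
import statistics
import time


def time_inference_ms(run_once, warmup=3, repeats=20):
    """Median latency in milliseconds of `run_once()` after a few warm-up calls."""
    for _ in range(warmup):
        run_once()  # warm-up keeps one-time setup costs out of the samples
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


def dummy_inference():
    """Stand-in workload so the sketch runs without a model."""
    sum(i * i for i in range(10_000))


latency_ms = time_inference_ms(dummy_inference)
print(f"median latency: {latency_ms:.3f} ms")
```

In the notebook, `run_once` could be a lambda wrapping `ort_session.run(...)` or the PyTorch forward pass, so both backends go through the same harness. For PyTorch on a GPU, CUDA calls are asynchronous, so `torch.cuda.synchronize()` should run inside `run_once` before it returns.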
- Implement TensorRT acceleration for even faster inference.
- Add batch processing to handle multiple images at once on HPC clusters.
- Explore fine-tuning DenseNet on pediatric X-ray datasets for better medical accuracy.
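
As a starting point for the batch-processing item, images can be grouped into fixed-size batches before each inference call. This is a minimal sketch with a hypothetical helper name, `iter_batches`. It assumes every image is already a (3, 224, 224) float32 array and that the ONNX model was exported with a dynamic batch axis, which has not been verified for `densenet121.onnx`.

```python
import numpy as np


def iter_batches(images, batch_size):
    """Yield (N, C, H, W) arrays built from consecutive groups of (C, H, W) images."""
    for start in range(0, len(images), batch_size):
        yield np.stack(images[start:start + batch_size], axis=0)


# Five synthetic images split into batches of two; the last batch is smaller
images = [np.full((3, 224, 224), i, dtype=np.float32) for i in range(5)]
batch_shapes = [b.shape for b in iter_batches(images, batch_size=2)]
print(batch_shapes)
```

Each yielded array can then be passed to `ort_session.run` in place of a single-image batch.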