CUDA-Accelerated-Chest-X-Ray-Classification-with-ONNX!

About the Project

In this project, I built a high-performance pipeline for chest X-ray analysis. The goal was to speed up image preprocessing and model inference using GPU acceleration and ONNX, making it efficient enough to handle large datasets quickly.

Basically, I wanted to see how we can take medical images, prepare them efficiently, and run predictions fast — all while keeping things reproducible and clear.

What I Did

GPU-Powered Preprocessing
- Converted chest X-ray images to grayscale using CUDA/Numba, running directly on the GPU.
- Normalized pixel values so the images are ready for the model.
- Saved these preprocessed images as .npy files for easy reuse.
ONNX Model Inference
- Took the preprocessed images, replicated channels to match DenseNet-121’s expected input, and resized them to 224×224.
- Ran predictions using ONNXRuntime, which is faster than standard PyTorch inference.
- Recorded the predicted class, confidence, and inference time for each image.
Performance Comparison
- Measured inference times for both PyTorch and ONNX.
- Created bar charts to clearly show the speed improvement with ONNX.

Project Structure

project-root/
│
├─ dataset/
│   └─ chest_xray/      # Original X-ray images
│
├─ preprocessed/
│   └─ *.npy            # GPU-preprocessed images
│
├─ notebooks/
│   └─ preprocessing_and_onnx_inference.ipynb
│
├─ models/
│   └─ densenet121.onnx
│
└─ README.md

How to Use

Preprocessing Images

gray_img = preprocess_image_gpu(img)
np.save("preprocessed/NORMAL/IM-0001.npy", batch_img)

Running ONNX Inference

import onnxruntime as ort
ort_session = ort.InferenceSession("models/densenet121.onnx")
outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: batch_resized})

Measuring Inference Time

start = time.time()
outputs = ort_session.run(None, {ort_session.get_inputs()[0].name: batch_resized})
end = time.time()
inference_time_ms = (end - start) * 1000

Tools and Libraries I Used

NumPy – For array handling and batching.
OpenCV – To resize and manipulate images.
Numba – For CUDA-powered preprocessing.
ONNXRuntime – For fast GPU inference.
PyTorch – DenseNet-121 baseline comparison.
Matplotlib & Pandas – For visualizations and tables.

Results

Preprocessing on GPU was fast and efficient.
ONNX inference ran significantly faster than PyTorch, helping speed up predictions.
The pipeline is reproducible and works well with larger datasets.

Next Steps

Implement TensorRT acceleration for even faster inference.
Add batch processing to handle multiple images at once on HPC clusters.
Explore fine-tuning DenseNet on pediatric X-ray datasets for better medical accuracy.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
Project.ipynb		Project.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CUDA-Accelerated-Chest-X-Ray-Classification-with-ONNX!

About the Project

What I Did

Project Structure

How to Use

Tools and Libraries I Used

Results

Next Steps

About

Uh oh!

Releases

Packages

Languages

jayshah1819/CUDA-Accelerated-Chest-X-Ray-Classification-with-ONNX

Folders and files

Latest commit

History

Repository files navigation

CUDA-Accelerated-Chest-X-Ray-Classification-with-ONNX!

About the Project

What I Did

Project Structure

How to Use

Tools and Libraries I Used

Results

Next Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages