- Project Overview
- Project Description
- Key Features
- Tools & Technologies
- Project Folder Structure
- Installation & Setup (One Block for Python + Power BI)
- How to Run (For both Python EDA + Power BI)
- Detailed Overview of HealthCare_EDA in Python
8.1 Description of the Dataset
8.2 Data Cleaning & Preparation
8.2.1 Merging All Datasets
8.2.2 Standardizing Data
8.2.3 Data Integrity Validation
8.2.4 Handling Missing Values
8.2.5 Handling Duplicates Records
8.2.6 Converting Datatypes
8.2.7 Creating Derived Columns
8.2.8 Mapping Categorical Values - Exploratory Data Analysis (EDA)
9.1 Univariate Analysis
9.2 Bivariate Analysis
9.3 Multivariate Analysis
9.4 Distribution Analysis
9.5 Correlation Analysis - Detailed Overview of HealthCare Power BI Dashboard
10.1 Overview Dashboard
10.2 Medical Condition & Outcome Analysis
10.3 Billing & Insurance Analysis
10.4 Doctor & Hospital Performance
10.5 Time-Based Analysis - Author
- License
This project focuses on analyzing healthcare data to uncover key insights into patient admissions, medical conditions, treatment outcomes, and hospital performance. By combining Python for data preparation and cleaning with Power BI for interactive dashboards, the project aims to support healthcare administrators in making data-driven operational and clinical decisions.
The Healthcare Data Analysis and Visualization Project involves working with a multi-sheet Excel dataset containing patient details, hospital information, doctor records, and patient visit data. The project workflow starts with merging and cleaning the data using Python libraries such as Pandas and NumPy in a Jupyter Notebook environment. Key data cleaning steps included handling missing values, standardizing text data, mapping admission type codes, calculating patient length of stay, and identifying high billing cases.
After preparing a clean and integrated dataset, exploratory data analysis (EDA) was performed in Python to validate data distributions and detect anomalies. The prepared dataset was then visualized in Power BI, where a series of interactive dashboards were built to deliver actionable insights.
The dashboards created include:
🔍 Overview Dashboard: Patient Admissions Summary: Visualizing patient admission counts, age distribution, gender splits, and admission trends.
🏥 Medical Condition & Outcome Analysis: Analyzing the frequency of medical conditions, treatment outcomes, and recovery rates.
💵 Billing & Insurance Analysis: Tracking billing amounts, insurance coverage patterns, and flagging high-cost cases.
🧑⚕ Doctor & Hospital Performance: Evaluating doctor-wise and hospital-wise patient outcomes, admissions, and billing performance.
📅 Time-Based Analysis: Examining trends over time including admissions, discharges, and length of stay patterns.
This project demonstrates how Python-based data engineering can seamlessly integrate with BI tools like Power BI to deliver healthcare insights that improve operational efficiency and patient care decisions.
-
📑 Merges multiple Excel sheets into a single clean dataset.
-
🧹 Cleans and standardizes patient, doctor, and hospital details.
-
⚙ Handles missing values (numeric → median, categorical → mode).
-
📏 Calculates Length of Stay for each patient.
-
💸 Flags patients with High Billing Amounts.
-
🔢 Maps Admission Types to numeric codes for analysis.
-
📊 Performs EDA using Python (Pandas, Matplotlib, Seaborn).
-
📈 Builds Power BI dashboards for dynamic visual insights.
-
Python
-
Pandas
-
NumPy
-
Matplotlib
-
Seaborn
-
-
Power BI
-
Microsoft Excel
-
Jupyter Notebook
-
CSV & Excel Files (for data storage)
├── 📁 data/ # Healthcare Excel dataset files
│ ├── healthcare_dataset.xlsx
│
├── 📁 Images/ # Project images for README or dashboards
│
├── 📁 python/ # Python notebook, requirements, and scripts
│ ├── HEALTHCARE_EDA.ipynb
│ ├── requirements.txt
│
├── 📁 PowerBI/ # Power BI dashboard files
│ ├── HEALTHCARE_DASHBOARD.pbix
│
├── 📄 .gitignore # Git ignore rules
├── 📄 LICENSE # Project open source license
├── 📄 README.md # Project overview and documentation
git clone https://github.com/jasminshaik15/Visual-Healthcare-Insights-Python-EDA-Power-BI-Dashboards.git
cd Visual-Healthcare-Insights-Python-EDA-Power-BI-Dashboards
pip install -r Python/requirements.txt
jupyter notebook Python/HEALTHCARE_EDA.ipynb
Make sure you have all the necessary dependencies by running the following command:
pip install -r Python/requirements.txt
After installing the dependencies, open the Jupyter notebook with the following command: jupyter notebook Python/HEALTHCARE_EDA.ipynb
Once the notebook is open in your browser, execute all the cells to run the EDA analysis.
If you haven't already, install Power BI Desktop. You can download it from here.
To view the dashboards, open the Power BI file located in the PowerBI directory:
PowerBI/HealthCare_Dashboard.pbix
Once the Power BI file is open, you can explore the following interactive dashboards:
-
📊 Overview Dashboard
-
🩺 Medical Condition & Outcome Analysis
-
💸 Billing & Insurance Analysis
-
🧑⚕ Doctor & Hospital Performance
If you need to refresh the data, connect to the Excel file located under the /data/ directory.
This notebook begins with a descriptive exploration of the patient and hospital datasets using summary statistics and visual analysis. It then examines patterns in patient demographics, admission types, and medical conditions to understand what factors may influence hospital stay duration. Finally, relationships between variables such as department, billing, and severity of illness are analyzed further.
The data in the healthcare dataset includes information about patients admitted to hospitals across different medical conditions. It contains 55500 rows and 17 columns, with data spanning several years, starting from 2019. The dataset includes details such as patient ID (Pid), doctor ID (Did), hospital ID (Hid), medical condition, date of admission, insurance provider, billing amount, room number, admission type, discharge date, medication prescribed, test results, patient name, age, gender, blood type, doctor name, and hospital name.
Key variables in the dataset include medical condition (Cancer, Diabetes, Asthma, Hypertension), billing amount (non-negative real numbers), room number (integer), admission type (Elective, Emergency, Urgent), and medication (Lipitor, Aspirin, Paracetamol). The age and blood type variables are numerical, while gender and insurance provider are categorical variables. The test results vary, with categories like Inconclusive, Abnormal, Normal and NaN values.
Data Cleaning & Preparation is the process of identifying and fixing errors, inconsistencies, and missing values in raw data, transforming it into a structured, reliable, and analysis-ready format for further processing.
To perform a complete analysis, we merge all four datasets using their respective key columns (Pid, Hid, Did). This helps consolidate patient_details, hospital_details, doctor_details, and patients_data into a single unified DataFrame for further exploration and visualization.
After merging all datasets, we ensure the Name, Doctor, and Hospital columns are clean and consistently formatted. This helps eliminate redundancy, avoids mismatched values, and improves overall data quality for analysis and visualization.
Identifying Mismatches and Foreign Key Issues Between Pid, Did, and Hid in Merged Data and Master Tables.
Identify and appropriately handle missing values in the dataset to prevent incomplete analysis or errors during visualization.
Identify and appropriately handle missing values in the dataset to prevent incomplete analysis or errors during visualization.
Ensure columns like dates, amounts, ids & age have correct data types for analysis.
Creating useful new columns like Length of Stay & Billing Category
mapping or encode categorical values for better readability.
Creating charts and graphs to make sense of data patterns, trends, relationships, and anomalies visually.
Univariate Analysis is the simplest form of data analysis where only one variable is analyzed at a time to understand its distribution, central tendency, spread, and underlying patterns.
Bivariate Analysis is the analysis of two variables simultaneously to explore the relationship, association, or correlation between them and understand how one variable affects or relates to the other.
Multivariate Analysis is the analysis of more than two variables simultaneously to understand complex relationships, interactions, and combined effects among multiple variables within a dataset.
Understand data distribution patterns and proportions.
Correlation Heatmap: Show correlation strength between multiple numeric variables
This comprehensive Power BI Healthcare Admissions & Billing Dashboard offers end-to-end insights into patient admissions, medical conditions, doctor performance, billing trends, and time-based activity. It includes interactive KPI cards, dynamic charts, matrix visuals, and drill-through pages for detailed patient-level analysis. The dashboard empowers stakeholders to monitor hospital operations, financial performance, and clinical outcomes effectively with slicers, bookmarks, and customized timelines for rich, interactive exploration.
What it does:
This dashboard provides a quick summary of hospital admissions, patient volumes, and financial performance.
📊 Visual Insights:
-
KPI cards display total admissions, average stay, total billing, and avg. billing per patient.
-
Bar chart shows admissions trend by year/month.
-
Donut chart compares Elective vs Emergency admissions.
-
Slicers allow filtering by Year, Gender, and Insurance Provider.
🎯 Result:
Quickly monitor hospital activity, identify admission trends, and understand patient distribution by type and demographics at a glance.
What it does:
This dashboard highlights patient counts by medical condition and their corresponding test outcomes.
📊 Visual Insights:
-
Stacked bar chart shows the Top 10 medical conditions by number of patients.
-
Matrix displays the outcome distribution (Normal, Abnormal, Inconclusive) for each condition.
-
Table lists patient details, filterable by condition and doctor using slicers.
🎯 Result: Quickly identify which conditions are most common, how patients are performing in tests, and filter detailed patient lists for deeper analysis.
What it does:
This dashboard tracks hospital billing patterns, insurance provider contributions, and cost relationships.
📊 Visual Insights:
Bar chart compares total billing amounts by insurance provider.
- Line chart shows billing trends over time.
= Scatter chart visualizes how billing amounts relate to patient length of stay, color-coded by medical condition.
🎯 Result: Easily monitor financial performance, identify top-paying insurers, and spot patterns between costs, patient stays, and conditions.
What it does:
This dashboard evaluates doctor workload, patient outcomes, and hospital-wise admissions.
📊 Visual Insights:
-
Table/Matrix shows each doctor’s patient count, average billing, and average length of stay.
-
Bar chart displays number of admissions per hospital.
-
Heat map cross-tabulates doctors with admission types and test results.
🎯 Result: Identify high-performing doctors, hospital patient loads, and how test results vary by doctor and admission type.
What it does:
This dashboard tracks patient admissions over time, helping spot trends and seasonal patterns.
📊 Visual Insights:
-
Line chart shows admission trends over time.
-
Calendar heatmap highlights daily admissions activity.
-
Custom timeline (via bookmarks) lets users switch views by Year → Quarter → Month → Date.
-
Drill-through pages provide patient-level details from any time point.
🎯 Result: Understand how admissions fluctuate over time, identify peak periods, and drill down to patient records on specific dates for deeper analysis.
Jasmin Shaik
Data Scientist | Data Science Enthusiast
- 📧 Email: [email protected]
- 🌐 LinkedIn: https://linkedin.com/in/yourprofile
- 🌐 Portfolio: https://yourportfolio.com
This project is licensed under the MIT License.





















