
Healthcare Data Scientist / Data Analyst
Project

A machine learning pipeline using TCGA RNA-Seq and clinical data to predict 5-year survival and classify breast cancer subtypes. Powered by XGBoost and VAEs, it enables personalized oncology insights and synthetic gene expression generation for precision medicine.
Project

A web app that predicts cardiovascular disease risk using machine learning models such as Random Forest, KNN, and Logistic Regression, with personalized health insights generated by Generative AI (Gemini 1.5 Flash).
Project

Analyzed over 500,000 Medicare Part B billing records across RI and NH using R; identified a 35% higher average, revealing geographic disparities in physician billing practices. The analysis focuses on consistent billing behavior, state-level differences, and specialty-based patterns.
Research Project

Structural and functional feature analysis of transmembrane serine proteases and their implications in viral pathogenesis
Team Project

A hospital telemedicine app for seamless doctor appointment booking, with separate portals for patients, doctors, and admins
Project

An R-based machine learning project that predicts whether mushrooms are edible or poisonous using models such as Random Forest, GBM, and RPART. It emphasizes classification accuracy and key feature insights.
Project

A MATLAB-based interactive GUI application for segmenting lesions in medical images using CIELuv color space analysis and Euclidean distance-based thresholding
Project

A MATLAB GUI for fast, interactive cell counting in microscopy images. Load, enhance, segment, and analyze with ease using a clean, uifigure-based interface.
Project

Comprehensive analysis of Divvy bike-sharing data for 2023, exploring ride patterns, user behavior, and system usage
Project

A dashboard revealing how age, gender, and user type shape Chicago’s bike-share trips, delivering clear insights for targeted marketing and strategic station expansion.
Project

An intuitive HR dashboard for real-time insights into employee data, performance, and workforce analytics, all in one place.
Project

Speak, Generate, Code
Instantly transform your voice commands into code snippets using Google's Gemini API
Team Project

A powerful toolkit to effortlessly flash custom ROMs, recoveries, ZIPs, and Magisk on your Android device.
Research
This research explores a novel computational method for identifying potential antimicrobial drug candidates. The study combines topological data analysis and machine learning to screen for compounds that can inhibit a key enzyme, methylcitrate dehydratase (AcnD), in bacterial propionate metabolism. The approach transforms molecular structures into topological vectors, allowing for efficient virtual screening. Promising compounds were then validated through molecular docking simulations to confirm their interaction with AcnD's active site. Fifteen compounds demonstrated favorable binding interactions, suggesting their potential to disrupt bacterial growth and virulence. This integrated strategy offers a pathway for developing new drugs against resistant bacterial infections.
Research
This research article, published in Computational Biology and Chemistry, explores the impact of non-synonymous single nucleotide polymorphisms (nsSNPs) in the human β-defensin type 1 gene (DEFB1) on protein-ligand binding sites. Using computational methods such as molecular docking and molecular dynamics simulations, the study identifies four important nsSNPs (C67S, T58S, G62W, and Y35C) and demonstrates how they may alter the DEFB1 protein's binding affinity for phosphatidylinositol 4,5-bisphosphate (PIP2), an interaction crucial for antimicrobial activity. The study aims to show how these genetic variations might affect the protein's function in innate immunity and contribute to disease susceptibility by changing the PIP2 binding site.
Research
This systematic review examines the role of the clusterin transporter, also known as apolipoprotein J, in the development and progression of Alzheimer's disease (AD), specifically at the blood-brain barrier (BBB) interface. It explores how clusterin interacts with amyloid beta (Aβ), a key component of AD pathology, and how this interaction affects Aβ clearance and toxicity in the brain. It also examines the influence of clusterin on signaling pathways such as Wnt signaling, as well as on neuroinflammation and lipid metabolism as they pertain to AD, and discusses cellular risk factors and the potential of clusterin as a biomarker for the disease. Ultimately, the review consolidates current knowledge of clusterin's multifaceted involvement in AD pathogenesis to identify potential therapeutic targets.

Tableau

Harvard University

LinkedIn Learning

University of Michigan

IBM

Foundations: Data, Data, Everywhere
Ask Questions to Make Data-Driven Decisions
Prepare Data for Exploration
Process Data from Dirty to Clean
Analyze Data to Answer Questions
Share Data Through the Art of Visualization
Data Analysis with R Programming
Google Data Analytics Capstone: Complete a Case Study

University of Texas at Dallas
Richardson, TX

Government College University
Faisalabad, PK

CVS Health
Virginia, VA

University of Texas at Dallas
Richardson, TX

Shifa International Hospitals
Faisalabad, PK

Al-Rehmat Laboratories
Faisalabad, PK
© 2025 Shan Aziz. All rights reserved.