Project

Breast Cancer - Survival & Subtype Prediction via RNA-Seq and Deep Learning

A machine learning pipeline using TCGA RNA-Seq and clinical data to predict 5-year survival and classify breast cancer subtypes. Powered by XGBoost and VAEs, it enables personalized oncology insights and synthetic gene expression generation for precision medicine.

Project

Cardiovascular Disease Risk Assessment AI App

A web app that predicts cardiovascular disease risk using machine learning models such as Random Forest, KNN, and Logistic Regression. It uses generative AI (Gemini 1.5 Flash) to provide personalized health insights.

Project

Medicare Claims Analysis: Physician Billing Patterns in RI vs. NH

Analyzed over 500,000 Medicare Part B billing records across RI and NH using R; identified a 35% higher average, revealing geographic disparities in physician billing practices. The analysis focuses on consistent billing behavior, state-level differences, and specialty-based patterns.

Research Project

TMPRSS in COVID-19

Structural and functional feature analysis of transmembrane serine proteases and their implications in viral pathogenesis

Team Project

Telemedicine App

A hospital telemedicine app for seamless doctor appointment booking, with separate portals for patients, doctors, and admins

Project

Mushroom Classification using Machine Learning

R-based machine learning project that predicts whether mushrooms are edible or poisonous using models like Random Forest, GBM, and RPART. It emphasizes classification accuracy and key feature insights

Project

Skin Lesion Segmentation App

A MATLAB-based interactive GUI application for segmenting lesions in medical images using CIELUV color space analysis and Euclidean distance-based thresholding

Project

Automated Cell Counter App

A MATLAB GUI for fast, interactive cell counting in microscopy images. Load, enhance, segment, and analyze with ease using a clean, uifigure-based interface



Project

Customer Conversion Pipeline: Bike-Share Analytics

Comprehensive analysis of Divvy bike-sharing data for 2023, exploring ride patterns, user behavior, and system usage

Project

Divvy Bike-Share: Segmenting User Behavior for Revenue Growth

A dashboard revealing how age, gender, and user type shape Chicago’s bike‑share trips, delivering clear insights for targeted marketing and strategic station expansion.

Project

HR Dashboard

An intuitive HR dashboard for real-time insights into employee data, performance, and workforce analytics, all in one place.

Project

Speech-to-Code
Gemini AI 2.0 Flash Lite

Speak, Generate, Code
Instantly transform your voice commands into code snippets using Google's Gemini API

Team Project

Android Flashing Toolkit

A powerful toolkit to effortlessly flash custom ROMs, recoveries, ZIPs, and Magisk on your Android device.



Research

Identification of Molecular Compounds Targeting Bacterial Propionate Metabolism with Topological Machine Learning

This research explores a novel computational method for identifying potential antimicrobial drug candidates. The study combines topological data analysis and machine learning to screen for compounds that can inhibit a key enzyme, methylcitrate dehydratase (AcnD), in bacterial propionate metabolism. The approach transforms molecular structures into topological vectors, allowing for efficient virtual screening. Promising compounds were then validated through molecular docking simulations to confirm their interaction with AcnD's active site. Fifteen compounds demonstrated favorable binding interactions, suggesting their potential to disrupt bacterial growth and virulence. This integrated strategy offers a pathway for developing new drugs against resistant bacterial infections.

Research

In-silico analysis of non-synonymous single nucleotide polymorphisms in human β-defensin type 1 gene reveals their impact on protein-ligand binding sites

This research article, published in Computational Biology and Chemistry, explores the impact of non-synonymous single nucleotide polymorphisms (nsSNPs) in the human β-defensin type 1 gene (DEFB1) on protein-ligand binding sites. Using computational methods such as molecular docking and dynamics simulations, the study identifies four important nsSNPs (C67S, T58S, G62W, and Y35C) and demonstrates how they can potentially alter the DEFB1 protein's binding affinity to phosphatidylinositol 4,5-bisphosphate (PIP2), a crucial interaction for antimicrobial activity. The study aims to show how these genetic variations might affect the protein's function and contribute to disease susceptibility by changing the PIP2 binding site, a process related to innate immunity.

Research

The Role of Clusterin Transporter in the Pathogenesis of Alzheimer’s Disease at the Blood–Brain Barrier Interface

A systematic review of the role of the clusterin transporter, also known as apolipoprotein J, in the development and progression of Alzheimer's disease (AD), specifically at the blood-brain barrier (BBB) interface. The review explores how clusterin interacts with amyloid beta (Aβ), a key component of AD pathology, and how this interaction affects Aβ clearance and toxicity in the brain. It also examines the influence of clusterin on Wnt signaling, neuroinflammation, and lipid metabolism as they pertain to AD, and discusses cellular risk factors and the potential of clusterin as a biomarker for the disease. Ultimately, the review consolidates current knowledge on clusterin's multifaceted involvement in AD pathogenesis to identify potential therapeutic targets.



Data Visualization with Tableau


Python for Research


Creating API Documentation


Programming for Everybody (Python)

Generative AI

Google Data Analytics Professional Specialization
(8 Courses)

  • Foundations: Data, Data, Everywhere

  • Ask Questions to Make Data-Driven Decisions

  • Prepare Data for Exploration

  • Process Data from Dirty to Clean

  • Analyze Data to Answer Questions

  • Share Data Through the Art of Visualization

  • Data Analysis with R Programming

  • Google Data Analytics Capstone: Complete a Case Study



Master of Science

Bioinformatics & Computational Biology

(CS Oriented)

University of Texas at Dallas

Aug 2022 - May 2024

Richardson, TX

Bachelor of Science

Bioinformatics

Government College University

Sep 2017 - Oct 2021

Faisalabad, PK



Shift Supervisor

CVS Health

Aug 2024 - Present

Virginia, VA

Graduate Student Researcher

University of Texas at Dallas

Dec 2022 - May 2023

Richardson, TX

Data Analyst (Healthcare)

Shifa International Hospitals

Apr 2020 - Jun 2022

Faisalabad, PK

Junior Healthcare Analyst

Al-Rehmat Laboratories

Sep 2018 - Mar 2019

Faisalabad, PK


Let's Connect!



© 2025 Shan Aziz. All rights reserved.