Loan Default Prediction
End-to-end credit-risk pipeline - EDA, feature engineering, modelling, and head-to-head evaluation - on peer-to-peer lending data.
Why it matters
Shows the full discipline of an applied ML project, not just a model fit - the parts (EDA, leakage checks, calibration, comparison) that actually decide whether a model ships.
What it does
A single notebook predicts loan default on peer-to-peer lending data and walks through the full applied-ML cycle: exploratory analysis, data cleaning, feature engineering, model implementation, and head-to-head evaluation across multiple model families.
Where it applies
- Credit-risk and underwriting work looking for a reference template with the unglamorous parts (leakage checks, calibration, model comparison) actually present.
- A teaching artifact - the notebook is structured around the questions that decide whether a model ships: are the features stable, are predictions calibrated, do downstream metrics improve under each model.
- Any classification problem with imbalanced labels and tabular features - the scaffolding moves over with light edits.
How it works (high level)
The notebook starts with EDA on a peer-to-peer loans dataset to expose target leakage, missingness patterns, and categorical-cardinality issues. Cleaning and feature engineering then produce a model-ready frame, on which multiple classical ML models are fit and compared head-to-head on a comprehensive evaluation, calibration included, rather than on a single accuracy number.
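One common way to screen for target leakage during EDA is to score each numeric feature on its own against the label and flag anything that separates the classes almost perfectly. The sketch below is illustrative only and is not taken from the notebook: the data is synthetic, the column names (`defaulted`, `annual_income`, `debt_to_income`, `recoveries`), the helper `single_feature_auc`, and the 0.95 threshold are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score


def single_feature_auc(frame: pd.DataFrame, target: str) -> pd.Series:
    """Return the AUC of each numeric column used alone as a score for `target`."""
    y = frame[target]
    scores = {}
    for col in frame.select_dtypes("number").columns.drop(target):
        # Median-impute so a column with gaps can still be ranked.
        x = frame[col].fillna(frame[col].median())
        auc = roc_auc_score(y, x)
        # Fold around 0.5 so the direction of the relationship does not matter.
        scores[col] = max(auc, 1.0 - auc)
    return pd.Series(scores).sort_values(ascending=False)


# Synthetic stand-in for a loans table (hypothetical columns, ~15% positives).
rng = np.random.default_rng(0)
n = 2000
defaulted = (rng.random(n) < 0.15).astype(int)
loans = pd.DataFrame({
    "defaulted": defaulted,
    "annual_income": rng.normal(60_000, 15_000, n) - 5_000 * defaulted,
    "debt_to_income": rng.normal(18, 6, n) + 3 * defaulted,
    # Only known after the outcome: nonzero for defaulted loans only.
    "recoveries": defaulted * rng.uniform(100, 2_000, n),
})

auc_by_feature = single_feature_auc(loans, "defaulted")
# Assumed threshold: a lone feature this predictive deserves a manual look.
suspects = auc_by_feature[auc_by_feature > 0.95].index.tolist()

print(auc_by_feature.round(3))
print("Possible leakage:", suspects)
```

A flagged column is a prompt for investigation, not an automatic drop: the question to ask is whether its value would be known at the time the lending decision is made.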
Outcome
A reproducible pipeline with every step exposed for inspection, including the parts that usually get skipped: leakage diagnostics, comparison plots, and a defensible reason for picking one model over another.
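A head-to-head comparison of this kind can be sketched as a small loop that fits each candidate on the same split and records several metrics side by side. This is a minimal illustration under stated assumptions, not the notebook's code: the data comes from scikit-learn's `make_classification`, and the two model families (logistic regression and random forest) and the three metrics (ROC AUC, average precision, Brier score) are chosen here only as examples.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    average_precision_score,
    brier_score_loss,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced tabular data (roughly 15% positives) as a stand-in.
X, y = make_classification(
    n_samples=3000,
    n_features=12,
    n_informative=5,
    weights=[0.85, 0.15],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed candidates: one linear model and one tree ensemble.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    rows.append({
        "model": name,
        "roc_auc": roc_auc_score(y_test, proba),           # ranking quality
        "pr_auc": average_precision_score(y_test, proba),  # sensitive to imbalance
        "brier": brier_score_loss(y_test, proba),          # lower is better
    })

comparison = pd.DataFrame(rows).set_index("model")
print(comparison.round(3))
```

Reporting a ranking metric, an imbalance-aware metric, and a probability-quality metric together makes it harder for one flattering number to decide the model choice.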
Stack
Python · pandas · scikit-learn · matplotlib · Jupyter.