Loan Default Prediction

End-to-end credit-risk pipeline - EDA, feature engineering, modelling, and head-to-head evaluation - on peer-to-peer lending data.

Why it matters: Shows the full discipline of an applied ML project, not just a model fit - the parts (EDA, leakage checks, calibration, comparison) that actually decide whether a model ships.

Python · pandas · scikit-learn

What it does

End-to-end loan-default prediction on peer-to-peer lending data. A single notebook walks through the full applied-ML cycle: exploratory analysis, data cleaning, feature engineering, model implementation, and head-to-head evaluation across multiple model families.

Where it applies

Credit-risk and underwriting teams that need a reference template in which the unglamorous parts (leakage checks, calibration, model comparison) are actually present.
A teaching artifact - the notebook is structured around the questions that decide whether a model ships: are the features stable, are predictions calibrated, do downstream metrics improve under each model.
Any classification problem with imbalanced labels and tabular features - the scaffolding moves over with light edits.
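As a rough illustration of the "are predictions calibrated" question on imbalanced labels, the sketch below fits a logistic regression to synthetic data with about 10% positives and compares predicted probabilities against observed rates. The data, the model choice, and the bin count are assumptions made for this example; they are not taken from the notebook.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a loan table (assumption): roughly 10% positives ("defaults").
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
# Stratify so train and test keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]

# Ranking quality under imbalance: average precision, with prevalence as the baseline.
prevalence = y_test.mean()
avg_precision = average_precision_score(y_test, proba)

# Calibration: Brier score plus a binned reliability table.
brier = brier_score_loss(y_test, proba)
frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=5, strategy="quantile")

print(f"prevalence={prevalence:.3f}  AP={avg_precision:.3f}  Brier={brier:.4f}")
for p, f in zip(mean_pred, frac_pos):
    print(f"mean predicted={p:.3f}  observed default rate={f:.3f}")
```

In each bin, a well-calibrated model shows a mean predicted probability close to the observed default rate. Quantile bins are used here so that the sparse high-risk tail still gets a populated bin.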

How it works (high level)

The notebook starts with EDA on a peer-to-peer loans dataset to expose target leakage, missingness patterns, and high-cardinality categorical columns. Cleaning and feature engineering then produce a model-ready frame, on which several classical ML models are fit and compared across a set of evaluation metrics rather than a single accuracy number.
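The three EDA checks named above (missingness, categorical cardinality, target leakage) can be sketched in a few lines of pandas. Everything in this example is hypothetical: the column names, the synthetic data, the `eda_report` helper, and the 0.95 threshold are illustrative choices, and the single-feature AUC screen is one common leakage heuristic, not necessarily the diagnostic the notebook uses.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

# Tiny synthetic frame; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 1000
default = rng.integers(0, 2, size=n)
loans = pd.DataFrame({
    "loan_amnt": rng.normal(10_000, 3_000, size=n),
    # About 20% of incomes are missing.
    "annual_inc": np.where(rng.random(n) < 0.2, np.nan,
                           rng.normal(60_000, 15_000, size=n)),
    # Free-text-like categorical with many distinct values.
    "emp_title": [f"title_{i}" for i in rng.integers(0, 400, size=n)],
    # Post-outcome field: non-zero only after a default, so it leaks the target.
    "recoveries": default * rng.uniform(100, 1_000, size=n),
    "default": default,
})

def eda_report(df, target):
    """Per-feature missingness, cardinality, and a single-feature AUC leakage screen."""
    rows = []
    for col in df.columns.drop(target):
        s = df[col]
        row = {"feature": col, "missing_frac": s.isna().mean(), "n_unique": s.nunique()}
        if pd.api.types.is_numeric_dtype(s):
            mask = s.notna()
            auc = roc_auc_score(df.loc[mask, target], s[mask])
            row["single_feature_auc"] = max(auc, 1 - auc)  # direction-agnostic
        rows.append(row)
    return pd.DataFrame(rows).set_index("feature")

report = eda_report(loans, "default")
print(report)

# A feature that separates the classes almost perfectly on its own deserves a manual look.
suspects = report.index[report["single_feature_auc"] > 0.95].tolist()
print("possible leakage:", suspects)
```

A feature flagged this way is not automatically leakage; the screen only narrows down which columns to trace back to the point in the loan lifecycle where they are recorded.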

Outcome

A reproducible pipeline with every step exposed for inspection, including the parts that usually get skipped: leakage diagnostics, comparison plots, and a defensible reason for picking one model over another.
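One way to arrive at a defensible model choice is to score every candidate on the same cross-validation folds and on several metrics at once. The sketch below does this with scikit-learn's `cross_validate`. The three model families, the synthetic data, and the metric list are assumptions for illustration; the page does not say which models or metrics the notebook compares.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the model-ready frame (assumption).
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6,
    weights=[0.85, 0.15], random_state=0,
)

# Candidate model families (assumed; chosen only as typical classical baselines).
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Identical stratified folds for every model keep the comparison like-for-like.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["roc_auc", "average_precision", "neg_brier_score"]

rows = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    rows[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

# One row per model, one column per metric, best average precision first.
comparison = pd.DataFrame(rows).T.sort_values("average_precision", ascending=False)
print(comparison.round(3))
```

`neg_brier_score` is the negated Brier score, so values closer to zero indicate better-calibrated probabilities. Reporting it next to the ranking metrics keeps calibration in the comparison, so the choice does not rest on discrimination alone.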

Stack

Python · pandas · scikit-learn · matplotlib · Jupyter.
