The lending business sits at the intersection of two risks: reject a creditworthy applicant and lose revenue; approve a defaulter and absorb the loss. Good credit risk analysis is about finding the signals that separate the two.
This case study works through 2007–2011 lending data to identify the factors most predictive of loan default. Across the portfolio, 14.16% of loans defaulted, but that base rate varies dramatically depending on how you slice it.
The Two-Sided Risk Problem
Every rejected creditworthy applicant is a lost customer. Every approved default is a direct financial loss. The goal isn't to minimise either error on its own but to find the cutoff that minimises the combined expected cost of both.
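One way to make "the optimal cutoff" concrete is to score a range of candidate cutoffs by total expected cost. This is a minimal sketch, assuming each applicant already has an estimated default probability and that both error costs are known; `LOSS_IF_DEFAULT`, `LOST_MARGIN`, and the function names are hypothetical, not taken from the original analysis.

```python
from typing import List, Tuple

# Hypothetical costs, expressed per unit of loan principal.
LOSS_IF_DEFAULT = 1.0  # cost of approving an applicant who then defaults
LOST_MARGIN = 0.25     # profit forgone by rejecting an applicant who would have repaid


def expected_cost(probs: List[float], cutoff: float) -> float:
    """Approve applicants whose default probability is below `cutoff`,
    then sum the expected cost of both error types across all applicants."""
    cost = 0.0
    for p in probs:
        if p < cutoff:
            cost += p * LOSS_IF_DEFAULT    # approved: may default
        else:
            cost += (1 - p) * LOST_MARGIN  # rejected: may have repaid
    return cost


def best_cutoff(probs: List[float], candidates: List[float]) -> Tuple[float, float]:
    """Return the (cutoff, expected_cost) pair with the lowest expected cost."""
    return min(((c, expected_cost(probs, c)) for c in candidates), key=lambda t: t[1])


# Illustrative probabilities only; not drawn from the original data.
probs = [0.05, 0.1, 0.3, 0.6]
cutoff, cost = best_cutoff(probs, [0.0, 0.1, 0.2, 0.5, 1.01])  # cutoff == 0.2
```

Under these illustrative costs, approving beats rejecting whenever p × 1.0 < (1 − p) × 0.25, which puts the break-even probability at 0.25 / 1.25 = 0.2; the search recovers that value.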
Exploratory data analysis is the first step: before building a model, understand which variables actually discriminate between defaulters and payers.
Key Risk Signals
1. Amount-to-Income Ratio
Loans where the amount-to-income ratio (loan amount divided by annual income) exceeded 25% showed markedly elevated default rates. This is your first hard filter: borrowers who are already over-extended are the highest-risk cohort regardless of credit score.
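The bivariate comparison behind the 25% threshold can be sketched in plain Python. This is a minimal sketch, assuming each loan is a dict with hypothetical keys `loan_amount`, `annual_income`, and a 0/1 `defaulted` flag; the original analysis's column names and tooling may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping

# Hypothetical field names: loan_amount, annual_income, defaulted (0/1).


def amount_to_income(loan: Mapping[str, float]) -> float:
    """Loan amount as a fraction of annual income (0.25 means 25%)."""
    return loan["loan_amount"] / loan["annual_income"]


def default_rate_by_ratio(loans: Iterable[Mapping[str, float]],
                          threshold: float = 0.25) -> Dict[str, float]:
    """Default rate for loans above the ratio threshold vs. at or below it."""
    totals: Dict[str, int] = defaultdict(int)
    defaults: Dict[str, int] = defaultdict(int)
    for loan in loans:
        bucket = "above" if amount_to_income(loan) > threshold else "at_or_below"
        totals[bucket] += 1
        defaults[bucket] += int(loan["defaulted"])
    return {bucket: defaults[bucket] / totals[bucket] for bucket in totals}
```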
2. Revolving Line Utilisation
Borrowers with revolving utilisation above 75% who also take out high-value loans represent a compounding risk. High utilisation signals that existing credit is already stretched, and adding more debt rarely improves the situation.
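Testing for a compounding effect means crossing the two signals rather than examining each alone. The sketch below splits loans into four cells by utilisation (the 75% threshold comes from the text) and loan size; `LARGE_LOAN` and the dict keys are hypothetical, since the text does not define what counts as "high-value".

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

UTIL_THRESHOLD = 75.0  # revolving utilisation in percent (threshold from the text)
LARGE_LOAN = 20_000    # hypothetical: the text does not define a "high-value" loan


def segment_default_rates(
    loans: Iterable[Mapping[str, float]],
) -> Dict[Tuple[bool, bool], float]:
    """Default rate per (high_utilisation, large_loan) cell."""
    counts: Counter = Counter()
    defaults: Counter = Counter()
    for loan in loans:
        cell = (loan["revolving_util"] > UTIL_THRESHOLD,
                loan["loan_amount"] >= LARGE_LOAN)
        counts[cell] += 1
        defaults[cell] += int(loan["defaulted"])
    return {cell: defaults[cell] / counts[cell] for cell in counts}
```

A compounding risk would show up as the `(True, True)` cell carrying a higher default rate than either single-signal cell.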
3. Derogatory Records
Previous derogatory marks on a borrower’s record were the strongest single predictor of future default. This is consistent with the broader credit literature: past behaviour is the best predictor of future behaviour.
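One simple way to compare single predictors is relative risk: the default rate among borrowers with a signal divided by the rate among those without it. This sketch assumes binary signals written as predicates over hypothetical dict keys (`derog_records`, `revolving_util`); the original analysis may have ranked predictors by a different measure.

```python
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Loan = Mapping[str, float]


def default_rate(group: Sequence[Loan]) -> float:
    """Share of loans in the group with defaulted == 1."""
    return sum(loan["defaulted"] for loan in group) / len(group)


def relative_risk(loans: Sequence[Loan], has_signal: Callable[[Loan], bool]) -> float:
    """Default rate with the signal divided by the default rate without it."""
    with_signal = [loan for loan in loans if has_signal(loan)]
    without_signal = [loan for loan in loans if not has_signal(loan)]
    if not with_signal or not without_signal:
        raise ValueError("both groups need at least one loan")
    baseline = default_rate(without_signal)
    if baseline == 0:
        raise ValueError("comparison group has no defaults; relative risk is undefined")
    return default_rate(with_signal) / baseline


def rank_signals(loans: Sequence[Loan],
                 signals: Dict[str, Callable[[Loan], bool]]) -> List[Tuple[str, float]]:
    """Sort binary signals by relative risk, strongest first."""
    scored = ((name, relative_risk(loans, test)) for name, test in signals.items())
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical signal definitions.
signals = {
    "prior_derog": lambda loan: loan["derog_records"] > 0,
    "high_util": lambda loan: loan["revolving_util"] > 75,
}
```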
4. Loan Purpose
Small business loans emerged as the top-defaulting category and should be approved with caution. Volatile small business revenue makes repayment schedules difficult to maintain, especially during 2007–2011, a period that overlapped with the financial crisis.
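Finding the top-defaulting purpose amounts to ranking categories by default rate. The sketch below does that with a minimum group size, so that a rare category with a handful of loans cannot top the list by chance; `min_count` and the `purpose`/`defaulted` keys are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def rank_purposes(loans: Iterable[Mapping], min_count: int = 30) -> List[Tuple[str, float, int]]:
    """Return (purpose, default_rate, n_loans) sorted from highest to lowest
    default rate, skipping purposes with fewer than `min_count` loans."""
    counts: Dict[str, int] = defaultdict(int)
    defaults: Dict[str, int] = defaultdict(int)
    for loan in loans:
        counts[loan["purpose"]] += 1
        defaults[loan["purpose"]] += int(loan["defaulted"])
    rows = [(p, defaults[p] / counts[p], counts[p])
            for p in counts if counts[p] >= min_count]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```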
Methodology
The analysis used four levels of examination:
- Univariate analysis — distribution of each variable independently
- Bivariate analysis — relationship between each variable and default status
- Trivariate analysis — interaction effects between variables
- Correlation mapping — identifying multicollinearity in the feature set
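The correlation-mapping step can be sketched as pairwise Pearson correlation over numeric columns, flagging pairs whose absolute correlation meets a threshold. The 0.8 threshold and the column names used in testing are assumptions for illustration, not values from the original analysis.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(ss_x * ss_y)


def collinear_pairs(columns: Dict[str, Sequence[float]],
                    threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Column pairs whose absolute Pearson correlation is at least `threshold`."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs
```

Flagged pairs are candidates for dropping or combining one feature before modelling, since near-duplicate predictors make individual coefficients hard to interpret.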
Charged-off loans (defaults) showed consistently higher average loan amounts, interest rates, and debt-to-income ratios than fully paid loans. The higher interest rates suggest that pricing tracks risk in aggregate, but individual loan-level risk assessment can still improve.
Geographic Flag
Wyoming (WY) showed unusually high default amounts that warrant further investigation. Geographic concentration risk is often overlooked in retail lending: local economic shocks can cluster defaults in ways that aggregate models miss.
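If a state-level flag rests on a small number of loans, a confidence interval helps separate signal from noise before acting on it. This minimal sketch applies the Wilson score interval to a state's default rate (a count-based view, whereas the text refers to default amounts) and compares the lower bound with the 14.16% portfolio base rate; the function names are hypothetical.

```python
import math
from typing import Tuple

BASE_RATE = 0.1416  # portfolio default rate from the text


def wilson_interval(defaults: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the rate defaults / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = defaults / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def clearly_above(defaults: int, n: int, baseline: float = BASE_RATE) -> bool:
    """True only if the whole interval sits above the baseline rate."""
    low, _ = wilson_interval(defaults, n)
    return low > baseline
```

With illustrative counts, a 25% default rate on 20 loans has a lower bound of roughly 11%, below the base rate, while the same rate on 1,000 loans clears it comfortably.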
What This Means for Credit Risk
The findings suggest a practical risk tiering framework:
| Signal | Action |
|---|---|
| Amount/income > 25% | Decline or require collateral |
| Revolving utilisation > 75% + large loan | Decline or reduce loan amount |
| Prior derogatory record | Escalate for manual review |
| Small business purpose | Apply higher interest rate + tighter terms |
| WY geography | Flag for geographic concentration monitoring |
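The tiering table translates directly into rule checks. This sketch returns every action an application triggers; the dict keys, the `small_business` label, and the `LARGE_LOAN` threshold are hypothetical, because the text does not specify field names or what counts as a large loan.

```python
from typing import List, Mapping

LARGE_LOAN = 20_000  # hypothetical: the text does not define "large loan"


def risk_actions(app: Mapping) -> List[str]:
    """Apply each row of the tiering table and collect the triggered actions."""
    actions = []
    if app["loan_amount"] / app["annual_income"] > 0.25:
        actions.append("Decline or require collateral")
    if app["revolving_util"] > 75 and app["loan_amount"] >= LARGE_LOAN:
        actions.append("Decline or reduce loan amount")
    if app["derog_records"] > 0:
        actions.append("Escalate for manual review")
    if app["purpose"] == "small_business":
        actions.append("Apply higher interest rate + tighter terms")
    if app["state"] == "WY":
        actions.append("Flag for geographic concentration monitoring")
    return actions
```

Returning all triggered actions, rather than stopping at the first match, keeps overlapping signals visible to a reviewer.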
The 14.16% Default Rate in Context
A 14.16% default rate sounds high — but it’s an average. The high-risk cohorts described above had materially higher rates, while low-risk borrowers were well below the mean. The value of good credit risk analysis isn’t changing the average — it’s separating the distribution.
Original analysis published on Medium.