OCR Evaluation Suite

Multi-engine OCR benchmark with audit trail reporting

"In OCR, 'it reads' is not a valid test result. Accuracy is the only metric." A production-grade OCR framework built around PaddleOCR, with F1 score evaluation, Character Error Rate (CER) computed from Levenshtein distance, and timestamped audit trail reports. It replaces black-box OCR implementations with measurable, auditable evidence, which is critical for document intelligence in regulated industries.
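As a rough illustration of the CER metric mentioned above, here is a minimal pure-Python sketch. It assumes the common definition of CER as the Levenshtein distance between the OCR output and the ground truth, divided by the ground-truth length. The function names `levenshtein` and `cer` are hypothetical and not taken from the project, which may normalise text or handle empty references differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    # Keep the shorter string in the inner loop to limit row size.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance over reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One substituted character out of seven gives a CER of 1/7.
print(round(cer("invoice", "inv0ice"), 4))
```

Under this definition CER can exceed 1.0 when the OCR output is much longer than the reference, so reports usually show it alongside the raw edit count.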

Python, PaddleOCR, Tesseract, OpenCV, Docker

Regression Analysis

Linear regression deep-dive — Ridge, Lasso, MRM diagnostics

An end-to-end regression study on the Ames, Iowa housing dataset (2,919 sales, 80 features). Covers EDA, feature engineering, a Ridge vs Lasso comparison, SHAP attributions, and Model Risk Management (MRM) diagnostics aligned with SR 11-7: VIF for multicollinearity, Durbin-Watson (2.03) for residual autocorrelation, Breusch-Pagan for heteroscedasticity, and Jarque-Bera for residual normality. Companion notebook for a Medium article.
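To make the Durbin-Watson figure easier to read, the statistic can be sketched in a few lines of plain Python. It is the sum of squared differences between consecutive residuals divided by the sum of squared residuals, so values near 2 suggest little first-order autocorrelation. The function below is an illustrative assumption, not the notebook's code, which more likely calls `statsmodels.stats.stattools.durbin_watson`.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Residuals that flip sign at every step push the statistic above 2
# (negative autocorrelation); constant residuals push it to 0.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0
```

The statistic ranges from 0 to 4, so the reported 2.03 sits close to the midpoint.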

Python, scikit-learn, SHAP, statsmodels, Jupyter

Lending Business Case Study

EDA on 2007–2011 lending data to identify default drivers

Exploratory data analysis of a real lending portfolio to identify the key factors behind loan defaults: loan-amount-to-income ratio, revolving utilisation, derogatory records, and loan purpose. Uses univariate, bivariate, and trivariate analysis. The overall default rate is 14.16%, with Spain flagged as the highest-risk geography at 18.31%.
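The segment-level default rates quoted above come down to a grouped proportion. A minimal sketch follows, using made-up records and hypothetical field names (`purpose`, `defaulted`) rather than the actual dataset columns. The notebook itself presumably does this with a Pandas group-by.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def default_rate_by(records: Iterable[Mapping], key: str) -> Dict[str, float]:
    """Share of defaulted loans within each value of `key`."""
    totals: Dict[str, int] = defaultdict(int)
    defaults: Dict[str, int] = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        if row["defaulted"]:
            defaults[row[key]] += 1
    return {segment: defaults[segment] / totals[segment] for segment in totals}


# Illustrative rows only; these are not figures from the case study.
loans = [
    {"purpose": "small_business", "defaulted": True},
    {"purpose": "small_business", "defaulted": False},
    {"purpose": "car", "defaulted": False},
    {"purpose": "car", "defaulted": False},
]
print(default_rate_by(loans, "purpose"))
```

Passing a different `key` such as a geography field gives the bivariate view, and grouping on a tuple of two fields would extend it to the trivariate case.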

Python, Pandas, EDA, Risk Analysis, Jupyter