Problem
An agri-tech lender needed to extend credit to SMB farmers and food producers without traditional credit-bureau coverage. The available signal was raw invoice data — purchase orders, payment histories, supplier relationships, seasonality patterns — sitting in unstructured PDFs and accounting exports across the borrower's existing book.
Approach
Built an end-to-end pipeline from invoice ingestion through credit scoring: invoice parsing and entity extraction to normalise counterparties, payment-terms inference, cash-flow seasonality modelling, and a credit-scoring layer that combines invoice-derived features with available structured data. Special attention went to the "thin file" problem: borrowers with months rather than years of history.
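To make the feature steps concrete, here is a minimal sketch of three of them: counterparty normalisation, payment-terms inference, and a thin-file flag. It is illustrative only and not the production pipeline. The `Invoice` fields, the suffix list, the `STANDARD_TERMS` buckets, and the 12-month threshold are all assumptions. The payment-terms heuristic (median days-to-pay snapped to the nearest common term) is one simple option among several.

```python
import re
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import List, Optional


@dataclass
class Invoice:
    # Hypothetical record shape; real parsed invoices carry more fields.
    counterparty: str
    issued: date
    paid: date
    amount: float


# Assumed list of legal suffixes to strip; extend per market.
_SUFFIXES = re.compile(r"\b(ltd|limited|llc|inc|gmbh|plc)\b\.?", re.IGNORECASE)

# Assumed set of common payment terms, in days.
STANDARD_TERMS = (0, 7, 14, 30, 45, 60, 90)


def normalise_counterparty(name: str) -> str:
    """Lower-case the name, drop legal suffixes and punctuation."""
    name = _SUFFIXES.sub(" ", name)
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())


def infer_payment_terms(invoices: List[Invoice]) -> Optional[int]:
    """Snap the median observed days-to-pay to the nearest standard term."""
    days = [(inv.paid - inv.issued).days for inv in invoices]
    if not days:
        return None
    typical = median(days)
    return min(STANDARD_TERMS, key=lambda term: abs(term - typical))


def history_months(invoices: List[Invoice]) -> float:
    """Approximate span of invoice history in months."""
    if not invoices:
        return 0.0
    first = min(inv.issued for inv in invoices)
    last = max(inv.issued for inv in invoices)
    return (last - first).days / 30.44


def is_thin_file(invoices: List[Invoice], min_months: float = 12.0) -> bool:
    """Flag borrowers with less history than the assumed threshold."""
    return history_months(invoices) < min_months


invoices = [
    Invoice("Green Valley Farms Ltd.", date(2024, 1, 5), date(2024, 2, 6), 1200.0),
    Invoice("GREEN VALLEY FARMS LIMITED", date(2024, 2, 10), date(2024, 3, 9), 800.0),
    Invoice("Green-Valley Farms, LTD", date(2024, 3, 15), date(2024, 4, 16), 950.0),
]

names = {normalise_counterparty(inv.counterparty) for inv in invoices}
terms = infer_payment_terms(invoices)
thin = is_thin_file(invoices)
```

In this example the three spellings collapse to a single counterparty, the inferred terms are 30 days, and the borrower is flagged as thin-file because the history spans under three months. A thin-file flag like this could route the case to a model or policy that leans less on long-run seasonality features.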
Stack
Python ML pipeline · invoice parsing / OCR layer · scikit-learn / XGBoost · feature store · scoring API
Outcome
A credit-decisioning layer the lender can run on borrowers traditional bureaus do not cover. The model produces both a score and a per-feature attribution (which invoice-derived signals drove the decision), so credit officers can audit and override individual cases — required for regulated lending review.
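The score-plus-attribution output can be illustrated with a small sketch. The source does not say which attribution method was used, so this example assumes a simple linear log-odds model, where each feature's contribution is its weight times its deviation from a baseline borrower. The feature names, weights, and baseline values are hypothetical. A tree-based model such as XGBoost would need a different attribution technique.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical weights on the log-odds of repayment.
WEIGHTS = {
    "on_time_payment_rate": 2.0,
    "counterparty_concentration": -1.5,
    "history_months": 0.05,
}

# Hypothetical "typical borrower" used as the reference point.
BASELINE = {
    "on_time_payment_rate": 0.8,
    "counterparty_concentration": 0.4,
    "history_months": 24.0,
}

# Assumed log-odds of repayment for the baseline borrower.
BASE_LOG_ODDS = 1.0


def score_with_attribution(
    features: Dict[str, float],
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return repayment probability and per-feature log-odds contributions."""
    contributions = {
        name: WEIGHTS[name] * (features[name] - BASELINE[name])
        for name in WEIGHTS
    }
    log_odds = BASE_LOG_ODDS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    # Largest absolute contribution first, so reviewers see the main drivers.
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


applicant = {
    "on_time_payment_rate": 0.6,
    "counterparty_concentration": 0.7,
    "history_months": 6.0,
}

probability, ranked = score_with_attribution(applicant)
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
print(f"repayment probability: {probability:.3f}")
```

Because the contributions sum exactly to the gap between the applicant's log-odds and the baseline, a credit officer can see which signal moved the score most (here, the short history) and decide whether an override is justified.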