Loan default prediction is a critical task for financial institutions, directly affecting their ability to manage risk and allocate credit. This paper introduces a unique, comprehensive dataset of Hungarian domestic loans collected by the Hungarian National Bank, covering all retail loans issued between 2015 and 2017. We propose a time-aware data splitting strategy that reflects real-world banking conditions by dividing the dataset into training and test periods of varying lengths. Because loans that are still ongoing at the end of the training period have no known outcome, this split leaves a substantial amount of data unlabeled. To make use of that data, we develop a semi-supervised learning (SSL) framework that iteratively pseudo-labels high-confidence samples, improving both the quality and quantity of training data. Across multiple machine learning classifiers, our method demonstrates significant improvements in predictive performance, particularly in identifying defaults. These findings highlight the practical value of realistic temporal splitting and SSL techniques, which support more robust loan default prediction and better risk management strategies.
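The two ideas in the abstract, labeling loans as of a cutoff date and iteratively pseudo-labeling confident predictions, can be illustrated with a minimal sketch. This is not the paper's implementation: the `Loan` fields, the toy `CentroidClassifier`, the 0.9 confidence threshold, and the stopping rule are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from datetime import date
from math import exp
from typing import Optional


@dataclass
class Loan:
    """Hypothetical loan record; the real dataset's schema is not given here."""
    issued: date
    matures: date
    feature: float                       # assumed single numeric risk feature
    defaulted_on: Optional[date] = None  # None if no default has been observed


def label_at_cutoff(loan: Loan, cutoff: date) -> Optional[int]:
    """Label as seen at the cutoff: 1 = default, 0 = repaid, None = still ongoing."""
    if loan.defaulted_on is not None and loan.defaulted_on <= cutoff:
        return 1
    if loan.matures <= cutoff:
        return 0
    return None  # outcome unknown at the cutoff, so the loan is unlabeled


class CentroidClassifier:
    """Toy 1-D classifier (an assumption): scores by distance to class means."""

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        self.mu1 = sum(pos) / len(pos)
        self.mu0 = sum(neg) / len(neg)
        return self

    def predict_proba(self, x):
        # Positive score means x is closer to the default centroid.
        score = abs(x - self.mu0) - abs(x - self.mu1)
        score = max(-50.0, min(50.0, score))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + exp(-score))


def self_train(xs_lab, ys_lab, xs_unlab, threshold=0.9, max_rounds=10):
    """Iteratively move high-confidence unlabeled samples into the training set."""
    xs, ys, pool = list(xs_lab), list(ys_lab), list(xs_unlab)
    model = CentroidClassifier().fit(xs, ys)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            p = model.predict_proba(x)
            if p >= threshold:            # confident default
                xs.append(x); ys.append(1); added += 1
            elif p <= 1.0 - threshold:    # confident non-default
                xs.append(x); ys.append(0); added += 1
            else:
                remaining.append(x)       # stays unlabeled for the next round
        pool = remaining
        if added == 0:                    # nothing new was labeled, so stop
            break
        model = CentroidClassifier().fit(xs, ys)
    return model, xs, ys, pool


cutoff = date(2016, 6, 30)
repaid = Loan(date(2015, 1, 1), date(2016, 1, 1), 0.2)
ongoing = Loan(date(2016, 1, 1), date(2018, 1, 1), 0.5)
defaulted = Loan(date(2015, 6, 1), date(2017, 6, 1), 0.9, date(2016, 3, 1))
labels = [label_at_cutoff(l, cutoff) for l in (repaid, ongoing, defaulted)]

model, xs, ys, pool = self_train([0.0, 1.0, 9.0, 10.0], [0, 0, 1, 1],
                                 [0.5, 9.5, 5.0])
```

In this toy run the ambiguous sample stays in the unlabeled pool because its predicted probability never crosses the threshold. A real pipeline would replace `CentroidClassifier` with an actual model and would need to tune the threshold against the class imbalance typical of default data.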
- Semi-Supervised Learning for Loan Default Prediction Leveraging Unlabeled Data from Time-Aware Splits