Introduction

Lending Club (LC) is an American company listed on the New York stock exchange that provides a platform for peer-to-peer lending. Unlike banks, it does not take deposits and invest them. It is purely a matching system. Each loan is split into $25 that multiple investors can invest in. LC is remunerated by fees received from both sides. LC state they have intermediated more than $50bln since they started operations. Further description of the company is easily available from numerous online sources.

The business model of LC is to match borrowers and investors. Naturally, people prefer receiving money to parting with it. An important limiting factor to LC’s growth is the ability to attract investors, build a trusting relationship where, as a minimum first step, investors trust LC to provide accurate, transparent and reliable information of the borrowers. For this purpose, LC decided not only to provide extensive information about potential borrowers’ profile, but also historical information about past borrowers’ performance publicly available to all registered investors. This dataset is the subject of this report. It was downloaded from the Kaggle data science website1.

The size of the dataset is rich enough that it could be used to answer many different questions. The questions could be: given a borrower profile, is his/her rating appropriate in terms of risk of default? And if a default occurs, what is the expected recovery? The summary question is: given a borrower profile, is the risk/reward balance appropriate to commit funds? In the course of preparing this report, we considered those and eventually focused on researching the first question lenders seek to answer: What is the probability of a borrower no repaying principal with interest in full? Following Chapter 5 of (Peng 2012), formulating this question will guide our analysis.

We understand that LC allows investment of very granular amounts. Therefore, even an individual investor can diversify his/her loan and risk portfolio. It is not necessary to ‘gamble’ funds on a single borrower. This is exactly what institutional investors achieve through syndication (although on a very different scale, typically $10-25mln for a medium-size bank). An investor can diversify his/her portfolio of loans across many borrowers so that probabilities of default can be considered on a statistical basis.

This report is organised as follows:

  • We first introduce some financial terms and concepts that will be used in the rest of the report. This can be skipped if you studied finance. Those are just basic concepts.

  • The second section introduces the dataset and uses a number of visualisations to illustrate some important aspects. We also provide the calculation of some financial amounts introduced in the first section.

  • The third section described the model we used to assess the probability of a loan defaulting based on the information provided by an applicant. This is just one exapmle of the sort of questions one could try to answer based on the dataset.

  • Finally, we assess our results in the conclusion. We provided numerous avenues to improve on this project.

  • A post-mortem section is provided for interest purposes only.

IMPORTANT WARNING:

The calculations presented here are simplistic, although they bear some resemblance to what financial institutions (FIs) do. The literature on credit assessment and pricing is very rich and very complex. Finding the optimal capital allocation to particular risks while at the same time satisfying internal risk policies and regulatory requirements is a problem that financial institutions have yet to solve in full. Investing in a loan is not only a matter of assessing the risk of a particular borrower, but also assessing systemic risks (which exist across all borrowers), risks associated with funding the loan (interest, currency and liquidity markets), each requiring a risk assessment and pricing.

In other words, nobody would, let alone should, make any investment decision based on the calculations below.

References

Peng, Roger. 2012. Exploratory Data Analysis with R. Lulu. com.