Alternative lending

Blog | On the use (and misuse) of Gini Coefficients in Credit Scoring: Comparing Ginis

By: Carlos Del Carpio, Director of Risk and Analytics, LenddoEFL

This is part 2 of a series of blog posts about Ginis in Credit Scoring. To see part 1, follow this link.


What is an AUC?

AUC stands for “Area Under the (ROC) Curve”. From a statistical perspective, it measures the probability that a good client chosen randomly has a score higher than a bad client chosen randomly. In that sense, AUC is a statistical measure widely used in many industries and fields across academia to compare the predictive power of two or more different statistical classification models over the exact same data sample [1].
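To make the probabilistic reading concrete, here is a minimal sketch that computes AUC directly as the share of (good, bad) client pairs in which the good client has the higher score. The function name `pairwise_auc` and the scores are made up for illustration, and counting ties as half a win is an assumed (though common) convention.

```python
from itertools import product


def pairwise_auc(good_scores, bad_scores):
    """Share of (good, bad) pairs in which the good client scores higher.

    Ties count as half a win (an assumed convention).
    """
    if not good_scores or not bad_scores:
        raise ValueError("need at least one good and one bad client")
    wins = 0.0
    for g, b in product(good_scores, bad_scores):
        if g > b:
            wins += 1.0
        elif g == b:
            wins += 0.5
    return wins / (len(good_scores) * len(bad_scores))


# Illustrative, made-up scores (higher score = lower expected risk).
good = [720, 680, 650, 610]
bad = [640, 600, 580]

auc = pairwise_auc(good, bad)
print(auc)  # 11 of the 12 pairs are ordered correctly, so AUC = 11/12
```

The only misordered pair is the good client at 610 against the bad client at 640. This brute-force version is quadratic in sample size, so it is meant for intuition rather than production use.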

How is AUC used in Credit Scoring?

In the particular case of Credit Scoring, AUCs are useful, for example, in the model development process, when several candidate models built over the same training data need to be compared. Another typical use is when introducing a new credit score, to compare a challenger against an incumbent score over the same sample of data under a champion-challenger framework.

How does AUC relate to Gini Coefficient?

The Gini Coefficient is a direct conversion from AUC through a simple formula: Gini = (AUC x 2) - 1. They measure exactly the same thing, and it is possible to go directly from one measure to the other, back and forth. The only reason to use Gini over AUC is the improvement in the scale’s interpretability: while the AUC of a model with predictive power ranges from 0.5 to 1, the corresponding Gini ranges from 0 to 1. However, all the properties and restrictions of AUC still carry over to the Gini Coefficient, and this includes the need to compare two different AUC values over the exact same data sample before drawing any conclusion about their relative predictive power.
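The conversion can be written out in a couple of lines. This is a small sketch; the function names `gini_from_auc` and `auc_from_gini` are illustrative, not taken from any particular library.

```python
def gini_from_auc(auc):
    """Gini = (AUC x 2) - 1."""
    return 2.0 * auc - 1.0


def auc_from_gini(gini):
    """Inverse of the conversion above: AUC = (Gini + 1) / 2."""
    return (gini + 1.0) / 2.0


# The endpoints of the two scales line up as described in the text.
print(gini_from_auc(0.5))  # a random score: Gini of 0
print(gini_from_auc(1.0))  # a perfect score: Gini of 1
```

Because the mapping is linear and invertible, ranking models by Gini always gives the same order as ranking them by AUC on that sample.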


 

What does this mean in practical terms?

Any direct comparison of the Gini Coefficients (or AUCs) of two different models over two different data samples will be misleading. For example: if Bank A has a Credit Score with a Gini Coefficient of 30%, and Bank B has a Credit Score with a Gini Coefficient of 28%, it is not possible to draw any conclusion about which is better or more predictive, because they have been calculated over different data samples without accounting for the difference in the absolute number of observations and the difference in the proportion of good cases to bad cases. The only direct comparison possible is the one made between two scores side by side, over the exact same data sample.
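A minimal sketch of the one comparison described above as valid: two scores evaluated side by side on the exact same sample. The function name `gini`, the variable names, and all scores and outcomes are made up for illustration.

```python
def gini(scores, is_good):
    """Gini = 2 * AUC - 1, with AUC computed over all (good, bad) pairs."""
    good = [s for s, g in zip(scores, is_good) if g]
    bad = [s for s, g in zip(scores, is_good) if not g]
    if not good or not bad:
        raise ValueError("sample must contain both good and bad clients")
    # Ties count as half a win (an assumed convention).
    wins = sum(
        1.0 if g > b else 0.5 if g == b else 0.0
        for g in good
        for b in bad
    )
    return 2.0 * wins / (len(good) * len(bad)) - 1.0


# One shared sample: the same eight clients, the same repayment outcomes.
is_good = [1, 1, 1, 1, 0, 0, 1, 0]
champion = [700, 650, 620, 600, 630, 590, 580, 570]
challenger = [710, 640, 660, 615, 625, 560, 610, 590]

champion_gini = gini(champion, is_good)      # 7/15, about 0.47
challenger_gini = gini(challenger, is_good)  # 11/15, about 0.73
print(champion_gini, challenger_gini)
```

Both numbers come from the same clients and the same outcomes, so the gap between them reflects the scores alone. The same two numbers computed on two different portfolios would not support that reading.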

Bottom line: To affirm that a certain absolute level of AUC or Gini Coefficient is “good” or “bad” is meaningless. Such an affirmation is only possible in relative terms, when comparing two or more different scores over the exact same data sample. Unfortunately this is often not well understood, which leads to the most frequent misuses of AUC and Gini Coefficients, such as direct, un-weighted comparisons of Gini values across different samples, different time periods, different products, different segments and even different financial institutions.

 

[1] Hanley JA, McNeil BJ. The meaning and use of the area under a Receiver Operating Characteristic (ROC) curve. Radiology, 1982, 143, 29-36.

Blog | Raising the Stakes on Psychometric Credit Scoring

An updated and expanded 2nd edition (first edition)

Why read this post?

Learn why high-stakes data is essential for building accurate credit-scoring models.

 

Introduction

Billions of people lack traditional credit histories, but every single person on the planet has attitudes, beliefs, and behaviors that can be used to predict creditworthiness. Quantifying these human traits is the focus of psychometrics, and the alternative data provided by this technique allows LenddoEFL to greatly expand financial inclusion in its mission to #include1billion.

But there is a catch: in order to build models that accurately predict default, applicants need to complete psychometric assessments in pursuit of actual financial products, a so-called “high-stakes” environment. This is because people answer psychometric questions differently when they have a chance to receive a loan (the high stakes) than they would in a hypothetical situation with no incentive (the low stakes).

Despite this fact, psychometric tools are frequently built using low-stakes data. For example, many companies develop psychometric credit scoring tools using volunteers. And many lenders want to validate psychometric credit scoring tools on their clients through back-testing: giving the application to existing clients and comparing scores to their repayment history, again a low-stakes setting.


These approaches are only valid if low-stakes data can be applied to the real world of high-stakes implementation, where access to finance is on the line for applicants. But it turns out that this is not the case. A recent study published by our co-founder Bailey Klinger and academic researchers found that low-stakes testing has no predictive validity for building and validating psychometric credit scoring models in a real-world, high-stakes situation. The data below show exactly how applicant responses shift as they move from one environment to another.

 

The Experiment

To test for differences between low- and high-stakes situations, LenddoEFL gathered psychometric data from two sets of micro-enterprise owners in the same East African country. One group already had their loans (low-stakes) and the other group completed a psychometric assessment as part of the loan application process (high-stakes).

First, the low-stakes data. The figure below shows the frequency distribution for two of the most important ‘Big 5’ personality dimensions for entrepreneurs, Extraversion and Conscientiousness, as well as a leading integrity assessment[i].
 

image1.png


You can see that when the stakes are high, people are answering the same questions very differently. The distribution of scores on these three personality measures shifts significantly to the right. When something important is at stake, like being accepted or rejected for a loan, people answer differently.

How do these differences in low- vs. high-stakes data matter for credit scoring?

To see how these differences impact the predictive value of psychometric credit scoring, we can make two models[ii] to predict default: one uses responses from applicants who took the application in low-stakes settings, and the other uses responses from applicants who were in high-stakes settings. Then we can use a Gini Coefficient, which measures a model’s ability to rank-order applicants by riskiness (the higher the coefficient, the better the rank-ordering), to compare each model’s ability to predict default for the opposing population as well as its own.[iii]

image2.png


These results clearly show that there is a significant change in the rank ordering when models built on low-stakes data are applied in high-stakes settings and vice versa.[iv] Importantly, we can see that a psychometric credit-scoring model can indeed achieve reasonable predictive power in a real-world, high-stakes setting. But that is only the case when the model was built with high-stakes data.

Think about it like this: when the stakes are high, both less and more risky applicants change their answers. But less risky applicants change their answers in a different way than riskier applicants. This difference is what is used to predict risk in psychometric credit scoring models: the difference between how low- and high-risk people answer in a high-stakes setting.

This also illustrates why we see that a model built on low-stakes data is ineffective in a real-world high-stakes implementation. In the low-stakes setting, the low- and high-risk people aren’t trying to change their answers, because they aren’t concerned with the outcome of the test. Once the stakes are high, however, this pattern changes.

 

Conclusions

Testing existing loan clients or volunteers has an obvious attraction: speed. That way you don’t have to bother new loan applicants with additional questions, and then wait for them to either repay or default on their loans before you have the data to make or validate a score, an approach that takes years.

Unfortunately, these results clearly show that this shortcut does not work. People change their answers when the stakes are high, so a model built on low-stakes data falls apart when used in the real world. People answer optional surveys with less attention and less strategy than they do a high-stakes application, and therefore the only strong foundation for a predictive credit-scoring model is real high-stakes application data and subsequent loan repayment.

Consider an analogy: you can’t predict who is a good driver based on how they play a driving video game, where the outcome is not important. Conversely, someone who does well on a real-world driving test may not perform that well on a video game.  Whether it is driving skills or creditworthiness, you must predict the high-stakes context with high-stakes data.

 

TAKEAWAYS:

- Psychometric model accuracy is only guaranteed when you collect data in a high-stakes situation (i.e., a real loan application).

- Despite its speed, back-testing a model on existing clients in a low-stakes setting is risky because it might not tell you anything about how the model will work in a real implementation.

- If you want to buy a model from a provider, the first thing you should verify is what kind of data they used to make their model. Was it from a real-world high-stakes implementation similar to your own?

 


[i] These are indices from widely available commercial psychometrics providers. It is important to note that LenddoEFL no longer uses any of these assessments or dimensions in our assessment, nor any index measures of personality.

[ii] Stepwise logistic regression built on a random 80% of data, and tested on the remaining 20% hold-out sample. An equivalently-sized random sample was used from the other set (high-stakes data for the low-stakes model, and low-stakes data for the high-stakes model) to remove any effects of sample size on Gini.
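The sampling scheme in note [ii] can be sketched as follows. This is only an illustration of the 80/20 split and the equally sized cross-population sample; the stepwise logistic regression itself is omitted, and the function names, seeds, and population sizes are assumptions rather than details from the study.

```python
import random


def split_80_20(records, seed=0):
    """Shuffle a copy of the records and split it into 80% build / 20% hold-out."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def matched_sample(other_records, size, seed=0):
    """Random sample from the other population, the same size as the hold-out."""
    rng = random.Random(seed)
    return rng.sample(list(other_records), size)


# Stand-in IDs for applicants; the sizes are made up for illustration.
high_stakes = list(range(1000))
low_stakes = list(range(1000, 1600))

build, holdout = split_80_20(high_stakes)
cross_sample = matched_sample(low_stakes, len(holdout))

# A high-stakes model would be fitted on `build`, then evaluated on both
# `holdout` (own population) and `cross_sample` (opposing population).
print(len(build), len(holdout), len(cross_sample))
```

Matching the size of the cross-population sample to the hold-out keeps the two Gini estimates comparable in terms of sample size, which is the stated purpose of the note.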

[iii] Note that this exercise was restricted to those questions that were present in both the low- and high-stakes testing. It does not represent LenddoEFL’s full set of content and level of predictive power; it is only for purposes of comparing relative predictive power.

[iv] The results also show that using standard personality items, the absolute predictive power is lower in a high-stakes setting compared to a low-stakes setting. This is likely because the ability to manipulate some items in a high-stakes setting makes them less useful there. This lesson has led LenddoEFL to develop a large set of application content that is more resistant to manipulation and which has much higher predictive power in high-stakes models. This content forms the backbone of the current LenddoEFL psychometric assessment, all of which is built and tested exclusively with high-stakes data and subsequent loan repayment-default rather than back-testing.

 

CardRates.com | How LenddoEFL Uses Data and Personality Analyses to Increase Access to Financial Services in Emerging Economies

Credit is hugely important to people around the globe. You need it to obtain housing and higher education. You need it to start a business. You need it in case of emergencies and other unexpected expenses.

But in emerging economies, credit may not be accessible to many people. According to the World Bank’s 2017 Global Findex, 31% of the world’s population doesn’t have an account with a financial institution or a mobile money provider.

“We still have 1.7 billion people on the planet who don’t even have a basic bank account,” said Amie Vaccaro, Director of Marketing at LenddoEFL. “Only 11% of people around the world borrowed from a formal financial institution in the last year.”

Read full article


Blog | Score Confidence: Boosting Predictive Power


Note: This is a new and improved version of a popular post from last year.

Our unique platform has one big reason to exist: providing fast, affordable and convenient financial products to more than 1 billion people worldwide. And there is only one way to accomplish that: by giving our clients more actionable, predictive, robust and transparent information so that they can make the best possible lending decisions. However, data quality poses the most challenging problem we have faced along this journey, as it threatens the predictive power we deliver to our clients. Therefore, through the years we have developed and perfected a one-of-a-kind way to assess the quality of the data applicants are supplying: Score Confidence.

What exactly is Score Confidence?

Score Confidence is a tailored algorithm that scans and analyzes psychometric information gathered through LenddoEFL's Credit Assessment to generate a Green or Red flag which reflects how confident we are in our score’s ability to represent an applicant’s risk profile:

  • The result will be Green if LenddoEFL is confident in the data quality such that we will generate and share a score based on it.
  • Conversely, the outcome will be Red when LenddoEFL’s confidence in the gathered information has been undermined.

What does Score Confidence measure?

Once the applicant has taken our psychometric assessment, we put the data through our Score Confidence algorithm to find out whether we can be confident in a score generated using this data or not. We will return a Green Score Confidence flag if we believe the score accurately predicts risk, and also be transparent about the reasons behind a Red Score Confidence flag to empower our partners with increased visibility and actionable information.

LenddoEFL's Score Confidence system is made up of several Confidence Indicators of key behaviours, each generated from a combination of different data sources. If we identify evidence of any of the following behaviours, the assessment will be rated as Red and no risk score will be returned, in order to protect our partners:

  • Independence – the assessment has not been completed independently, and LenddoEFL detects attempts to improve one’s responses with either the help of a third party or other supporting resources.
  • Effort – the applicant has not put forth adequate effort and attention in completing the assessment.
  • Completion – the applicant has not responded to a sufficient portion of the timed elements of the assessment.
  • Scoring error – a connection issue or system error occurred and LenddoEFL is unable to generate a score.
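The flag logic described above (any triggered indicator means Red, and a Red assessment returns no score) can be sketched as follows. This is a hypothetical illustration only: the `Assessment` class, the `score_confidence` function, and the result layout are invented here and do not describe LenddoEFL's actual algorithm, which derives each indicator from several data sources.

```python
from dataclasses import dataclass, field
from typing import List

# Indicator names as listed in the text above.
INDICATORS = ("Independence", "Effort", "Completion", "Scoring error")


@dataclass
class Assessment:
    raw_score: float
    # Names of the indicators for which problem behaviour was detected.
    triggered: List[str] = field(default_factory=list)


def score_confidence(assessment):
    """Return Red with reasons and no score if any indicator was triggered."""
    reasons = [name for name in INDICATORS if name in assessment.triggered]
    if reasons:
        return {"flag": "Red", "score": None, "reasons": reasons}
    return {"flag": "Green", "score": assessment.raw_score, "reasons": []}


clean = score_confidence(Assessment(raw_score=612.0))
rushed = score_confidence(Assessment(raw_score=655.0, triggered=["Effort"]))
print(clean["flag"], clean["score"])
print(rushed["flag"], rushed["score"], rushed["reasons"])
```

Returning the reasons alongside a Red flag mirrors the stated goal of being transparent with partners about why a score was withheld.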

What information feeds Score Confidence?

Our data quality indicators are constantly reviewed and updated and, over the years, we have added new and different data sources to our Score Confidence algorithms:

  • Browser and device metadata surrounding the completion of the application
  • User interaction information with LenddoEFL’s behavioural modules
  • Self-reported demographic data

Our Score Confidence system flexibly combines all the available data in order to return a Red or Green status for each application.

How does Score Confidence help our partners make the best possible lending decisions?

To boost the predictive power we deliver to our clients, LenddoEFL does not share a score for applicants with a Red Score Confidence flag. We have learned that Red applications tend to have very limited predictive power, whereas data from Green-flagged assessments can effectively sort risk among applicants. Withholding the score for Red-flagged applications therefore boosts the predictive benefit for our clients.

Yahoo Japan | Can Japanese banks use big data with "AI loan"? (日本の銀行は「AI融資」でビッグデータを活用できるか)

Attempts to use AI (artificial intelligence) to calculate the creditworthiness of individuals, and to lend on the basis of that score, are expanding. This is called "AI score lending".

Having AI carry out loan screening, one of a bank's most important tasks, is highly significant.

However, the question is whether Japanese financial institutions can handle big data. If they cannot, they will repeat the failures of past score lending.

Singapore's Lenddo offers its service in emerging countries such as India, Vietnam and Indonesia, to people who have never had a credit history.

Read full article

Benzinga | Here Are The Benzinga Global Fintech Award Finalists For The Best Under-Banked Or Emerging Market Solution

The finalists for the Best Under-banked or Emerging Market Solution category are:

LenddoEFL
CEO: Richard Eldridge
Description: LenddoEFL's mission is to provide 1 billion people access to powerful financial products at a lower cost, faster and more conveniently.

See full list of finalists

Welcoming our New Behavioral Science Manager

In this photo, Jonathan demonstrates cultural differences in height during a field visit with loan applicants in Veracruz, Mexico.

Since our merger, we have welcomed a number of incredible new colleagues onto the LenddoEFL team. Jonathan Winkle joins us in our Boston office as our new Behavioral Science Manager. We cornered him to learn more.

Tell us about your background?

In undergrad I majored in psychology, where I developed a passion for researching the brain and behavior. To gain more experience after college, I worked in a systems neuroscience lab at MIT studying visual attention. Eventually I found my way to Duke where I earned my PhD in cognitive neuroscience. My dissertation focused on the behavioral economics of dietary choice, investigating how the mind is affected by “nudges” that can bias people towards healthy (or unhealthy) eating habits.

What brought you to LenddoEFL?

Studying behavior has always excited me because it is the ultimate endgame of our brains’ hard work, yet academic research on the topic can often be too disconnected from real-world problems. I found myself wanting to make more of an impact on society, and in this role I can leverage my experience to quickly and directly improve people’s lives around the world. As the Behavioral Science Manager for LenddoEFL, I can test a new hypothesis and apply that knowledge globally in a matter of weeks. And the better I do my job, the more people I can help get access to life-changing financial services.

What are your plans as Behavioral Science Manager?

My primary goal is to drive feature engineering. Features are the observations we collect about individuals to predict credit risk, and feature engineering is the process of discovering and creating new features to make our algorithms work better. For example, how honest a person is might be predictive of loan default, but we first need to quantify honesty as a feature to use it in a predictive model. As new features make our models more predictive and more powerful, our financial institution clients all over the world will gain a better understanding of their under-banked loan applicants.

If I am successful, we will be better at predicting if someone will repay their loans, thereby allowing our clients to make the best, most informed decisions possible. No pressure.

Across data sources, we look for ways to profile a person’s character, trying to understand how traits like honesty or conscientiousness relate to credit risk. This is a hard, but extremely important challenge.

LenddoEFL deals with both psychometric/behavioral and digital data sources. How do those differ and how do you think about each?

On the psychometric side, we engineer the form our data will take from the outset, then extract it by inserting new content (e.g., survey questions or psychometric games) into our simple, interactive assessment. We can be more hypothesis-driven when it comes to designing features in this realm.

On the digital side, we work with large, unstructured data sources where we necessarily have to be more exploratory and let the data do the talking.

Will you be working with our research advisors?

Absolutely! I am looking forward to working with leading researchers like Peter Belmi to push the envelope of our own research while also sharing the insights gained from our unique dataset with those in the field of behavioral economics. We will also be inviting more researchers to collaborate on our work.

Enough about work, what do you do for fun?

I like to rock climb, play Go, hang out with my dog Clementine (pic below), and try out new recipes in the kitchen.

image2.jpg

What’s a fun fact about you?

I have a tattoo of Phineas Gage, a famous figure in the history of psychology and neuroscience. Gage was a railroad worker who, in 1848, lost the left pre-frontal cortex of his brain when an accidental explosion sent a 3-foot iron rod rocketing through his head. Miraculously, he survived and was even able to walk himself to a doctor despite the 1¼-inch hole running behind his left cheek and out the top of his skull. He lived for 11 years after this event, but experienced marked changes in his personality that have been studied ever since. The story in itself is fascinating, and of particular interest to me is how Gage’s misfortune shaped theories of the mind for more than a century after the accident.


 

Look out for a future post from Jonathan about his field work in Mexico and learnings about group dynamics.

Medici | What Happens at the Convergence of Machine Intelligence and Online Lending

Credit scoring and approval rates changed substantially with the arrival of alternative lenders, mainly due to the adoption of new practices in collecting and analyzing potential borrower data. Alternative data has played its role in expanding horizons for financial institutions and in creating an opportunity for technology startups and data-rich international companies to enter the financial sector.

While social media, for example, is still at a nascent stage as a source of data for creditworthiness assessment, certain startups already claim to have incorporated information from social networks into their frameworks. In the quest to reinvent the way consumer-related risk is assessed (as well as to extend credit to unscored and questionable borrowers), startups have proved more imaginative than traditional institutions.

Alternative data requires an alternative approach to data analytics, which the wide adoption of machine learning and artificial intelligence has brought.

Read full article

Malaysian Business Online | CTOS and LenddoEFL partner up to boost Financial Inclusion in Malaysia

CTOS Data Systems Sdn Bhd, Malaysia’s largest credit reporting agency, has entered into a partnership with LenddoEFL.

Karangkraf | Giving the people a chance to access financial services

Malaysia's largest credit reporting agency, CTOS Data Systems Sdn Bhd (CTOS), has partnered with LenddoEFL to expand financial inclusion for Malaysian consumers who have little or no credit history through the ‘CTOS Non-Traditional Data Score’.

CTOS Holdings Sdn Bhd Group Chief Executive Officer Dennis Martin said that although credit scores predicting payment behaviour have improved year after year, a large group of potentially good borrowers is currently denied access to credit because of a lack of credit history.

“Because consumer lending typically relies on credit scores, these individuals find themselves excluded from the credit ecosystem and also find it hard to improve their credit scores.

“By making full use of behavioural data and digital data that consumers have consented to share, CTOS and LenddoEFL will launch a universal credit decisioning platform able to assess the creditworthiness of any Malaysian, whether they have a credit history or lack one,” he said in a media statement.

According to Dennis, many individuals who were previously underserved by credit institutions on account of their traditional credit risk will now enjoy the opportunity to access credit.

Read full article.

Finance Digital Africa | Can big data shape financial services in East Africa?

Psychometric big data—including online quizzes to judge character or personality traits and analysis of Facebook “likes”—is garnering increased attention. Suppliers of psychometric data or psychometric tools, such as EFL, believe not only that their data and analytics are predictive but also that they have a key advantage in their applicability to everyone, even clients with limited credit history (“thin-file” clients), as a starting point. When layered with other big and traditional data sources (e.g., social media, mobile phone, bureau data, bank historical data), proponents expect psychometrics to become even more powerful. Indeed, Equity Bank conducted an experiment with EFL’s psychometric scoring model and found it both predictive and useful; they plan to integrate it into applicable models across their regional subsidiaries.

 

 Moreover, Juhudi Kilimo decided to partner with EFL in order to evaluate character as part of their risk assessment. This was previously carried out by loan officers, but they believed the EFL approach would be more objective.

Read full article.

World Bank | Using a PhD in development economics outside of academia: interviews with Alan de Brauw and Bailey Klinger

Today's interviews are with Alan de Brauw, a Senior Research Fellow in the Markets, Trade, and Institutions Division at the International Food Policy Research Institute; and Bailey Klinger, the founder and (until recently) CEO of the Entrepreneurial Finance Lab.

Read full interview with Bailey Klinger.