A Few Changes in One Algorithm Can Make Lending a Lot Less Racist

Kasey Matthews
Published in Momentum
5 min read · Aug 5, 2020
Photo: 10'000 Hours/Getty Images

There’s a bug in the software of the U.S. credit economy, one that is possibly holding back people of color from building real wealth. The bug is an algorithm called BISG, or Bayesian Improved Surname Geocoding, which many banks and credit unions use in their fair lending analysis to estimate a borrower’s race based on their last name and location.

By law, lenders have to analyze their portfolios regularly to ensure they’re not discriminating based on race, gender, and a range of other protected classes. But only mortgage lenders are allowed to gather borrower race data. Everyone else has to rely on techniques like BISG that estimate race based on non-race data.
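To make the mechanics concrete, here is a minimal Python sketch of the kind of Bayesian update BISG performs: a surname-based race probability is reweighted by how concentrated each group is in the borrower’s area, then normalized. The function name and every probability below are hypothetical, invented for illustration; real implementations draw these tables from Census surname and geography data.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    Assumes surname and geography are independent given race.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no race has nonzero probability")
    return {race: value / total for race, value in unnormalized.items()}


# Hypothetical inputs, made up for this example.
# P(race | surname): how often people with this surname report each race.
surname_prior = {"white": 0.60, "black": 0.35, "other": 0.05}
# P(area | race): share of each group's population living in this area.
geo_likelihood = {"white": 0.0008, "black": 0.00002, "other": 0.0006}

posterior = bisg_posterior(surname_prior, geo_likelihood)
for race, prob in posterior.items():
    print(f"{race}: {prob:.3f}")
```

In this toy case the surname alone leaves a substantial chance that the borrower is Black, but the geography term pushes the estimate heavily toward white, which is the pattern described in the next few paragraphs.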

The problem with BISG is that it’s often wrong, with untold consequences for millions of Americans. Using a crooked yardstick to measure racial disparity in loan approvals can give lenders false confidence that their credit models are much fairer than they are. Without knowing where real disparity occurs, lenders can’t identify problematic lending policies and regulators can’t assess harm. As a person of color and a data scientist, I felt doubly compelled to do something about it and build a better algorithm.

I’m a perfect example of how BISG gets it wrong. I live in Glendale, California, one of the whitest cities in America. Only 0.3% of Glendale residents are Black. With a name like Kasey Matthews and a home in Glendale, guess who thinks I’m white? BISG. When I ran my name and ZIP code through BISG, it estimated a 90% chance that I’m white. That means all my good credit behavior, and that of anyone like me living somewhere like Glendale, gets attributed to a white person in a lot of fair lending analysis. This might not seem so problematic on its face, but because I’m counted as white, if my loan application gets declined, that decline is counted as a white decline, not a Black decline.
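A small worked example shows how this kind of mislabeling can distort a disparity measurement. The sketch below compares approval rates grouped by true race against approval rates grouped by the race an estimator assigned. The applicant records are entirely hypothetical, chosen only to illustrate the effect, and the function name is an assumption of this example.

```python
from collections import defaultdict


def approval_rates(applicants, label_index):
    """Approval rate per group, grouping by the label at label_index."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for row in applicants:
        label = row[label_index]
        total[label] += 1
        approved[label] += int(row[2])
    return {label: approved[label] / total[label] for label in total}


# Hypothetical applicants: (true_race, assigned_race, approved).
applicants = (
    [("white", "white", True)] * 8
    + [("white", "white", False)] * 2
    + [("black", "black", True)] * 3
    + [("black", "black", False)] * 3
    # Black applicants whom the estimator labeled white.
    + [("black", "white", True)] * 1
    + [("black", "white", False)] * 3
)

TRUE_RACE, ASSIGNED_RACE = 0, 1
true_rates = approval_rates(applicants, TRUE_RACE)
measured_rates = approval_rates(applicants, ASSIGNED_RACE)

true_gap = true_rates["white"] - true_rates["black"]
measured_gap = measured_rates["white"] - measured_rates["black"]
print(f"true gap: {true_gap:.2f}, measured gap: {measured_gap:.2f}")
```

With these made-up numbers, the true approval gap is 40 percentage points, but the gap measured with the assigned labels is only about 14, because declines of mislabeled Black applicants are booked as white declines.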

A 2014 Charles River Associates auto lending study, sponsored in part by some lending institutions, found that BISG correctly identified African American borrowers a mere 24% of the time at an 80% confidence threshold. Hispanic and Asian borrowers were correctly identified 77% and 60% of the time, respectively. At a 50% confidence threshold, BISG was no better than a coin flip for Black borrowers. (See page 55 in the report.) The Consumer Financial Protection Bureau (CFPB), using a different set of loans, found that BISG correctly identified only 39% of African Americans. “These differences highlight just how wide-ranging the error rates can be based on the populations,” said the CRA report’s authors. The report goes on to argue that because of the inaccuracy of the BISG analysis method, harm identified in fair lending analysis may be overstated. It is no surprise that one U.S. House lawmaker called BISG “junk science.”
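For readers unfamiliar with confidence thresholds, the sketch below shows one plausible way to compute a “correctly identified” rate: among people who truly belong to a group, count the share whose estimated probability for that group meets the threshold. This is an assumed reading of the metric, not the studies’ actual methodology, and the records are hypothetical.

```python
def identification_rate(records, group, threshold):
    """Share of true members of `group` whose estimated probability
    for that group is at least `threshold`."""
    members = [probs for true_race, probs in records if true_race == group]
    if not members:
        raise ValueError(f"no records with true race {group!r}")
    hits = sum(1 for probs in members if probs.get(group, 0.0) >= threshold)
    return hits / len(members)


# Hypothetical records: (true_race, estimated probabilities).
records = [
    ("black", {"black": 0.9, "white": 0.1}),
    ("black", {"black": 0.6, "white": 0.4}),
    ("black", {"black": 0.3, "white": 0.7}),
    ("black", {"black": 0.1, "white": 0.9}),
    ("white", {"black": 0.05, "white": 0.95}),
]

rate_80 = identification_rate(records, "black", 0.8)
rate_50 = identification_rate(records, "black", 0.5)
print(f"at 80% threshold: {rate_80:.0%}, at 50% threshold: {rate_50:.0%}")
```

Raising the threshold demands more certainty before a label is assigned, so fewer true members of the group clear the bar.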

To be fair, BISG wasn’t intended for use in fair lending analysis. It was developed by the RAND Corporation in 2000 to help determine whether minorities were receiving health care at the same rate as whites. In small geographic segments, especially in racially or ethnically homogeneous areas, RAND believed BISG was right nine out of 10 times in identifying people as African American. Eventually, the CFPB adopted it for judging lender outcomes, and it has levied millions of dollars in fines for racial bias, many of them based on BISG.

Statisticians have tried to improve BISG. A variant called BIFSG added first names to the mix. Another method predicts ethnicity based on a name’s character sequence. Neither moves the needle much on accuracy. Consumers deserve better, as do the lenders and regulators who make the decisions that affect the lives of millions of borrowers.

Earlier this year, the Zest data science team built a new neural network called Race Predictor that, in a test on Florida voter data, outperforms BISG by nearly 60% in relative terms: it correctly identified African Americans 74% of the time, compared with 47% for BISG. Race Predictor also correctly identified Hispanics 87% of the time, compared with 77% for BISG. While there is plenty of work to be done to make it better, and we welcome help from partners, Race Predictor is showing promising results.

Race Predictor is also better than BISG at delivering true positives with high confidence that holds across more diverse groups (see chart below). By contrast, BISG is almost never certain about a person’s race unless they’re white.

Race Predictor is a natural extension of BISG; it uses name and address information and adds other race-correlated data such as the U.S. Department of Agriculture’s atlas of community food access and Environmental Protection Agency stats on neighborhood walkability. To make best use of the additional data, we’ve replaced the simple Bayesian statistical method used to create BISG with modern machine learning methods that Zest and sophisticated U.S. lenders employ for credit underwriting. These techniques are proven in credit underwriting where they help our customers become more profitable. With Race Predictor, we are applying these advanced methods to provide benefits to those who are underserved.

Race Predictor was trained and validated on roughly a million people from several Florida counties via the Florida voter database, one of the largest publicly available sources of demographic data that includes name, address, and ethnicity. A model trained only on this subset may not generalize to a national population, but it’s a solid place to start. We’d love to use national race data, but the U.S. Census doesn’t make that data publicly available.

We plan to update Race Predictor later this year to improve its accuracy with more data sources, geographies, and new machine learning techniques.

If you would like to contribute to the project with data or engineering help, by all means drop us a line at abetterway@zest.ai. With better math and more data we can do better than we have in the past and address important issues of equity in access to financial services.

Additional references:

Sood, Gaurav, 2017, “Florida Voter Registration Data,” https://doi.org/10.7910/DVN/UBIG3F, Harvard Dataverse, V1
