Entity Resolution: A Key to Success for AI in Anti-Financial Crime
Michael Shearer, Chief Solution Officer at Hawk, spent several years at HSBC, equipping the bank with advanced systems, tools, and data to manage financial crime risk. His experience of using entity resolution led him to write a book, Hands-On Entity Resolution: A Practical Guide to Data Matching with Python, which was published earlier this year by O’Reilly.
We asked Michael to tell us about why he’d written a book on entity resolution and its importance to anti-financial crime (AFC) programs.
Michael, you’re well known in the banking world for your work on implementing AI for anti-financial crime at HSBC. How did you end up writing a book about entity resolution?
In short, because entity resolution is a critical enabler in the effective detection and investigation of financial crime, and in particular the use of AI.
The truth is, a lot of AFC work still relies on people looking at spreadsheets and getting their highlighter out, looking for patterns. AI has huge potential to reduce that manual effort and help detect financial crime more effectively.
But both AI and human beings will struggle if they’re trying to piece together different data sources that don't join up. For example, if a name is spelt one way in one data source and slightly differently for the same person in another, then it’s much harder to get a full picture and accurately detect risk.
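To make the spelling problem concrete, here is a minimal sketch of approximate name matching using only Python's standard library. It is an illustration rather than the method used in the book or at any bank: the function names and the 0.85 threshold are assumptions chosen for the example, and a real system would compare more attributes than the name alone.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    # Lower-case and collapse repeated whitespace so formatting
    # differences do not count as spelling differences.
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    # Ratio between 0.0 (nothing in common) and 1.0 (identical).
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_name(a: str, b: str, threshold: float = 0.85) -> bool:
    # The threshold is an illustrative assumption, not a recommended value.
    return name_similarity(a, b) >= threshold


if __name__ == "__main__":
    print(likely_same_name("Michael Shearer", "Micheal Shearer"))
    print(likely_same_name("Michael Shearer", "Sarah Jones"))
```

A similar name is only a hint that two records might refer to the same person; it does not prove it.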
Entity resolution is part of the solution to that, and I got really interested in it. I decided to write a book because no one else had produced anything on how to tackle entity resolution in practice. I wanted to make it accessible and give people a head-start with it.
Why is entity resolution so important for AI implementation?
AI can only really reach its full potential to detect financial crime more effectively if it has a full picture of the entities that it's considering.
What stops AI getting that full picture is that when you try to join your data up, the same individuals and entities aren’t tagged with the same key. So you need entity resolution to unlock the power of AI and to allow it to work.
Which financial institutions need to consider entity resolution?
It can be a particularly big issue for any institution with a long history. A bank might have data coming from different systems that have evolved over time, or maybe they've grown by acquisition. And they want to join those systems up, but they can't. That means they might have Michael Shearer in their system twice, three times, even more, but they don't know it's the same person.
The second issue is when banks want to use other data sources — company reference information, watch lists, grey lists, sanctions lists, and so on. The only way to make truly effective use of that information and make sure you're not overlooking something is to join the data sources together and run your detection process on the joined-up data.
In short, it’s an issue for every institution in some form or other. But the bigger you are, the worse it gets because you are more likely to have the same customer multiple times in your data. You really don't want a situation where one part of your business is saying “this customer is fine” and another part of your business is saying “this customer is really not fine.”
What is the risk of a financial institution not factoring entity resolution into their thinking?
The risk is ultimately that you fail to spot a financial crime connection that you’d have flagged if you’d realized that several entities were the same person. The reverse can also happen: you flag something as suspicious when, if you knew the full context of that entity and could join up the data, you'd see a good reason for their behavior and know it’s not suspicious. That makes you less efficient, and you waste money investigating false positives.
What are the red flags that you might only see with entity resolution?
As an example, you might see a number of different entities using the same phone number. Criminals manufacture synthetic profiles of people to hold bank accounts so they can move money around or get money out of the system. The problem for them is that it’s expensive and difficult and time-consuming to create and maintain these profiles. So they make it easier by reusing numbers or addresses several times.
Entity resolution helps you spot that reuse. How likely is it that 10 people with different names have the same phone number? There are 10 different entities but they're sharing an attribute, and entity resolution helps to flag this.
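As a rough sketch of how that kind of attribute reuse could be surfaced, the snippet below groups customer records by phone number and reports any number used under several distinct names. The record layout (dictionaries with "name" and "phone" keys), the function names, and the default cut-off of three names are all assumptions made for this illustration.

```python
from collections import defaultdict


def normalize_phone(phone: str) -> str:
    # Keep digits only, so "+44 20 5550 0100" and "442055500100" compare equal.
    return "".join(ch for ch in phone if ch.isdigit())


def shared_phone_numbers(records, min_names=3):
    # Map each normalized phone number to the set of distinct names using it.
    names_by_phone = defaultdict(set)
    for rec in records:
        phone = normalize_phone(rec["phone"])
        if phone:  # skip records with no usable number
            names_by_phone[phone].add(rec["name"].strip().lower())
    # Report only numbers shared by at least `min_names` different names.
    return {
        phone: sorted(names)
        for phone, names in names_by_phone.items()
        if len(names) >= min_names
    }


if __name__ == "__main__":
    sample = [
        {"name": "Alice Example", "phone": "+44 20 5550 0100"},
        {"name": "Bob Example", "phone": "442055500100"},
        {"name": "Carol Example", "phone": "(44) 20-5550-0100"},
        {"name": "Dan Example", "phone": "+44 20 5550 0199"},
    ]
    print(shared_phone_numbers(sample))
```

The sample records are invented. A shared number is a prompt for investigation, not evidence of wrongdoing, since families and small businesses legitimately share phones.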
Or you might have 100 different companies all registered at the same address.
You might get a few companies based in the same building, but 100? It’s more likely that the address is being used as a shell address. Again, entity resolution helps you find that.
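A simple way to look for crowded addresses is to normalize each registered address and count how many companies share it. The sketch below assumes company records with an "address" key. The light-touch normalization and the threshold of 100 are illustrative assumptions; real address matching usually needs more care, for example with abbreviations and postcodes.

```python
import re
from collections import Counter


def normalize_address(address: str) -> str:
    # Lower-case, replace punctuation with spaces, collapse whitespace.
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def crowded_addresses(companies, threshold=100):
    # Count companies per normalized address and keep the crowded ones.
    counts = Counter(normalize_address(c["address"]) for c in companies)
    return {addr: n for addr, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    # Invented data: 100 companies at one address, written two different ways.
    companies = [
        {"name": f"Company {i}", "address": "1 Example Street, London"}
        for i in range(60)
    ] + [
        {"name": f"Company {i}", "address": "1 example street  London."}
        for i in range(60, 100)
    ] + [
        {"name": "Other Ltd", "address": "22 Sample Road, Leeds"},
    ]
    print(crowded_addresses(companies))
```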
How hard is entity resolution?
It's a lot harder than it looks. You need to be confident that you're actually talking about the same entity. Two names might be similar, but are they the same individual? You could have family members who have the same first name and last name, or it could be that a name like Sarah is spelt differently. It’s hard to be absolutely sure that two individuals are the same person. As the number of individuals and entities grows, the challenge of comparing all the names with each other grows rapidly, because the number of possible pairs rises roughly with the square of the number of records. That makes entity resolution difficult to scale.
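The scaling problem can be made concrete with a little arithmetic: comparing every record with every other record means n(n-1)/2 comparisons, so 1,000 records need 499,500 comparisons. One common way to reduce the work is blocking, where only records that share a cheap key are compared. The sketch below illustrates the idea; the choice of key (the first letter of the last word in the name) and the function names are assumptions for the example, not a recommendation.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def all_pairs_count(n: int) -> int:
    # Number of distinct pairs among n records: n * (n - 1) / 2.
    return comb(n, 2)


def blocked_pairs(records, key):
    # Group records by a blocking key, then pair records only within a group.
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def surname_initial(name: str) -> str:
    # Hypothetical blocking key: first letter of the last word in the name.
    return name.split()[-1][0].lower()


if __name__ == "__main__":
    names = ["Sarah Jones", "Sara Jones", "Michael Shearer", "Micheal Shearer"]
    print(all_pairs_count(len(names)))                      # every possible pair
    print(len(list(blocked_pairs(names, surname_initial)))) # pairs after blocking
```

The trade-off is that blocking can miss true matches whose keys differ, for example when the first letter of a surname is mistyped, so the key needs to be chosen with the data in mind.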
Can you do AI without entity resolution?
You can. But there's a general rule that AI performs better with richer data. Without the entity resolution piece, you may have a more limited view, so you'll get more false positives and miss more genuine risk. But yes, you can do it.
How can financial institutions get started with entity resolution?
My advice is to start small. There's likely a lot of low-hanging fruit that you'll find very easily just by being able to identify shared addresses, shared phone numbers, and so on. There are vendors that can help you, Hawk being one of them, with our Entity Risk Detection solution.
Finally, how did the actual book come about?
I hadn’t been able to find any good, accessible books on the subject of entity resolution. I'd read a few academic books, but they didn’t provide a guide to how it can be actually done in practice.
I’d previously done a bit of reviewing for O'Reilly, the publisher, so I thought what the heck, nobody's done this, there's a gap in the market for it, so why don't I do it? Why don't I help people understand how to do this in practice? That’s why it’s a “hands-on” guide—it's a set of exercises that take you through how it works.
What was the best moment of the book writing process for you? And what was the hardest part?
The hardest part was actually putting pen to paper and getting started. And the editing process wasn’t something I was used to – the back and forth of working with an editor was new to me. But it was fun and stimulating and I learned a lot.
The best part was when I initially pitched the idea to O'Reilly and they said yes. I was entirely expecting them to say “Who are you? Nobody's going to buy this.” I was thrilled that they said yes; thrilled and terrified because at that point I hadn't written anything. And then of course, there’s the moment when you actually open the box to find the physical books with your name on. It’s a good feeling.
Win a free copy!
We have five copies of Hands-On Entity Resolution: A Practical Guide to Data Matching with Python to give away. Enter the draw here.