BNP Paribas Personal Finance relies on a graph database to track fraud
BNP Paribas Personal Finance has deployed the Neo4j graph DBMS to improve fraud detection in consumer credit.
BNP Paribas Personal Finance is the BNP Paribas group subsidiary specializing in consumer credit. It also offers split payment services, especially on e-commerce sites. To improve the detection of fraudulent applications in these services, the company tested the Neo4j graph database, then moved it into production after it delivered conclusive results. An initial assessment of the project was presented at the Big Data & IA 2022 show.
Split payment services, which allow a purchase to be paid in three or four installments, are a frequent target of fraud networks. These networks do not simply reuse the same information (names, phone or credit card numbers, etc.) from one file to the next. Mehdi Barchouchi, innovation manager for data and tools at BNP Paribas Personal Finance's French risk department, explains that they alter it, so traditional blacklisting approaches no longer work. To improve the detection of fraudulent files, it is sometimes necessary to link files that share no information at all. On top of this came a basic requirement: the calculation had to be done in real time so the company could respond immediately to the customer who submitted the file.
Perfect use case
In this kind of processing, conventional relational databases do not perform well enough. To detect relationships between files, queries have to multiply joins, a particularly expensive operation. According to Mehdi Barchouchi, the problem lies in the depth of the networks. BNP Paribas Personal Finance therefore decided to test a graph database, as this technology is well suited to structures in which a lot of data is interconnected. For Édouard Tabary, head of innovation and data science at the BNP Paribas Personal Finance scoring center, this is the perfect use case.
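To make the "depth" problem concrete, here is a minimal Python sketch. It is not BNP Paribas's or Neo4j's implementation, and the `FILES` data and identifier names are hypothetical. Files are linked through the identifiers they declare; in a relational schema each extra hop means more self-joins on a file/identifier table, whereas in a graph each hop is simply another step of a breadth-first traversal over adjacency sets.

```python
from collections import deque

# Hypothetical toy data: each application file lists the identifiers it declares.
FILES = {
    "F1": {"phone:0601", "email:a@x.fr"},
    "F2": {"phone:0601", "card:4970"},
    "F3": {"card:4970", "email:b@y.fr"},
    "F4": {"email:b@y.fr"},
    "F5": {"phone:0799"},
}


def build_index(files):
    """Invert file -> identifiers into identifier -> files (the attribute nodes)."""
    index = {}
    for file_id, attrs in files.items():
        for attr in attrs:
            index.setdefault(attr, set()).add(file_id)
    return index


def files_within(files, start, max_hops):
    """Breadth-first search: files reachable from `start` through at most
    `max_hops` shared identifiers, mapped to their hop distance."""
    index = build_index(files)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if dist[current] == max_hops:
            continue  # depth limit reached: do not expand further
        for attr in files[current]:
            for neighbour in index[attr]:
                if neighbour not in dist:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
    return dist
```

With this toy data, `files_within(FILES, "F1", 3)` reaches `F4` at three hops even though `F1` and `F4` declare no common identifier, while `F5` stays unreachable. The traversal cost grows with the neighborhood actually visited, rather than with one more join over the whole table for every level of depth.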
The Neo4j solution was then chosen and piloted on a local server in 2020 with a reduced data set. The team first builds a graph data model from tabular data, then refines it iteratively to reach the target model, notably with machine learning algorithms. Finally, it builds indicators based on the predicted values. "We obtained a very efficient model: by applying it to a small part of the population, we covered almost all fraud networks," says Édouard Tabary.
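The first modeling step, turning tabular data into a graph, can be sketched as follows. This is a hypothetical illustration: the column names, relationship types (`HAS_PHONE`, etc.) and sample rows are assumptions, not the team's actual schema. The key idea is that each distinct identifier value becomes a single node, so files sharing a value end up connected through it.

```python
import csv
import io

# Hypothetical tabular extract: one row per application file.
RAW = """file_id,phone,email,card
F1,0601,a@x.fr,
F2,0601,,4970
F3,,b@y.fr,4970
"""

# Hypothetical mapping from table columns to graph relationship types.
COLUMN_TO_REL = {"phone": "HAS_PHONE", "email": "HAS_EMAIL", "card": "HAS_CARD"}


def table_to_graph(raw_csv):
    """Turn each row into a File node plus one node per non-empty identifier.
    Nodes are kept in a set, so a value shared by several files is one node."""
    nodes = set()
    edges = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        file_node = ("File", row["file_id"])
        nodes.add(file_node)
        for column, rel_type in COLUMN_TO_REL.items():
            value = row[column].strip()
            if not value:
                continue  # missing value: no node, no edge
            attr_node = (column.capitalize(), value)
            nodes.add(attr_node)
            edges.append((file_node, rel_type, attr_node))
    return nodes, edges
```

Here `F1` and `F2` both point at the same `("Phone", "0601")` node. In Neo4j itself, this deduplication is typically obtained with Cypher's `MERGE` clause when loading data; the iterative refinement and machine learning steps described above are not covered by this sketch.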
Édouard Tabary, head of innovation and data science at the BNP Paribas Personal Finance scoring center: "We can track files that share no common information but are connected in some way."
A complex project due to its real-time dimension
The next step was industrializing the model. Work began in early 2021 and was completed in early 2022 with the go-live. At this stage, the team continued optimizing the algorithm, especially for real time. But most of the time went into designing the right architecture to host it. "Along the way, we built a system to call the Neo4j infrastructure in real time, but this system still includes a transactional part to preserve our ability to learn from the data and improve the algorithm," Édouard Tabary pointed out.
Data now flows directly into the graph database and can be compared instantly with all past applications, with responses within milliseconds. "We can track two files that have no information in common but are connected by a path," explains Édouard Tabary. Once the clusters are identified, the team can look for signs of potential fraud, notably by using Neo4j's similarity relationships.

The goal is to have as few false positives as possible, but to respond to a customer whose file is rejected, the team needs to understand the path that led the file to a high fraud risk score. "We have an obligation to explain the model and understand the risk markers," emphasizes Mehdi Barchouchi. With the graph, the team can trace and capture the context around a file: its data footprint provides a specific context, yielding increasingly precise patterns. Within the legal data retention periods, a fraud prediction can be explained by looking at the file's neighborhood and learning what drives the prediction. The team also monitors the fairness of the model to avoid discrimination.
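The three ideas in this passage (a path linking two files with nothing in common, clusters, and similarity) can be illustrated with a small, self-contained Python sketch. Everything here is hypothetical: the toy `FILES` data, the function names, and the choice of Jaccard similarity as the metric are assumptions for illustration, not the bank's model or Neo4j's API (Neo4j's Graph Data Science library provides its own node-similarity and connected-components algorithms).

```python
from collections import deque

# Hypothetical toy data. Identifiers carry a "type:" prefix so they never
# collide with file ids in the shared adjacency map.
FILES = {
    "F1": {"phone:0601", "email:a@x.fr"},
    "F2": {"phone:0601", "card:4970"},
    "F3": {"card:4970", "email:b@y.fr"},
    "F4": {"email:b@y.fr"},
    "F5": {"phone:0799"},
}


def neighbours(files):
    """Bipartite adjacency: file -> identifiers and identifier -> files."""
    adj = {}
    for file_id, attrs in files.items():
        adj.setdefault(file_id, set())
        for attr in attrs:
            adj[file_id].add(attr)
            adj.setdefault(attr, set()).add(file_id)
    return adj


def explain_link(files, source, target):
    """Shortest alternating path file -> identifier -> file ... between two
    files, or None if unconnected. The path itself serves as the explanation."""
    adj = neighbours(files)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted for a deterministic result
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def clusters(files):
    """Connected components, keeping only file nodes: candidate networks."""
    adj = neighbours(files)
    seen, result = set(), []
    for start in sorted(files):
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            if node in files:
                component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(component)
    return result


def jaccard(files, a, b):
    """Jaccard similarity of two files' identifier sets (illustrative metric)."""
    union = files[a] | files[b]
    return len(files[a] & files[b]) / len(union) if union else 0.0
```

On this data, `explain_link(FILES, "F1", "F4")` returns the chain `F1 → phone:0601 → F2 → card:4970 → F3 → email:b@y.fr → F4`, even though `jaccard(FILES, "F1", "F4")` is 0: the two files share nothing directly, yet the path both links them and shows which identifiers did so. `clusters` groups `F1` to `F4` into one candidate network and leaves `F5` on its own.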
A model to develop
Other BNP Paribas group entities already use Neo4j to resolve incidents, particularly IT incidents. For the BNP Paribas Personal Finance team, however, it is a first. The team consists of two business experts, two data scientists and a small group of developers and administrators. On this project, this cross-functional team, which notably involved the IT and risk departments, received support from the vendor to design the model and load data into it. "We are used to tabular formats, and we had to learn," says Mehdi Barchouchi. The challenge now is to explore other use cases and expand the knowledge base to cover other populations.
The team has already drawn some lessons from this experience. According to Mehdi Barchouchi, to take this approach you need to know your data well and have good examples of network fraud cases that can be identified in the graph. He recommends spending time on this both during the exploration phase and after going into production. And this is only the beginning of the project's life. "We need to be able to improve the model to react to fraudsters' activity," insists the innovation manager for data and tools at the French risk department. Without forgetting, he adds, to measure performance with indicators.
Article written by Aurélie Chandez, Deputy Editor-in-Chief, CIO