Company
11 Dec 2023
Transaction enrichment is a solved problem, but how much does it cost?
Author
Naré Vardanyan
Co-Founder and CEO
Last week was a particularly good one for transaction enrichment companies.
When we started Ntropy, this was not viewed as a category of its own. It was considered a feature: part of open banking, or of the internal stacks of lenders, PFMs, payment businesses and fraud vendors. This has changed.
Two enrichment companies announced fundraising rounds, and many more are getting started.
Venture capitalists, too, are recognizing a significant opportunity in the market for high-fidelity transaction data. This data can be used to automate underwriting for both consumers and businesses, to improve fraud detection for banks and card issuers, or to create more personalized and engaging banking products. Enhanced customer engagement can, in turn, lead to better deposit retention.
Back-office automation and products for the CFO suite also require high-quality transaction enrichment.
This data is core to delivering high-quality end financial products. It is the horizontal layer across use cases that has traditionally been fragmented and unresolved, resulting in a higher cost of risk.
Ntropy’s perspective: the original player
Legacy approaches to this problem are still popular. They include either internal rule-based workarounds, or separate fraud, underwriting and other vendors who wrap end-to-end solutions around their own transaction enrichment.
We started Ntropy because transactions are the true system of record for finance, and the unified transaction enrichment layer was missing.
It does not make sense to run your own data centre or to write C++ code to query a database. Nor does it make sense to spend time and effort building your own enrichment engine: it is costly, with diminishing returns at scale.
Given the expanding range of upstream applications being built on top of this data thanks to recent advances in AI, getting enrichment right is more important than it has ever been.
If you have decided to pick a provider rather than build this internally, the question becomes: how do you evaluate a prospective vendor?
Evaluating Transaction Enrichment Vendors: What Matters?
If you talk to any company looking for a solution and ask them what they care about, accuracy will come up as the number one requirement. Accuracy is often touted by vendors, with impressive percentages like 97% or even 99%. Yet these numbers are misleading. The accuracy of the output, and the way it is measured, depend on the dataset used and on how representative it is of your own data distribution.
Often you have to conduct your own benchmarks and look at the numbers.
When a vendor publicly announces it enriches merchants with 99 percent accuracy, is that measured only on the merchants already in its database? How often does it return nothing at all, and how many of the hard cases has it never seen before?
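A simple way to keep those headline numbers honest is to score a vendor on a labelled sample drawn from your own traffic, and to track coverage (how often nothing comes back) separately from accuracy on the answers you do get. A minimal sketch of such a benchmark, with a hypothetical `enrich` call standing in for whichever client API the vendor exposes:

```python
# Minimal vendor benchmark on your own labelled sample.
# `labelled` is a list of (raw_description, true_merchant) pairs drawn from your
# own transaction stream; `enrich` is a hypothetical client call for the vendor
# under test that returns a merchant name or None.

def benchmark(labelled, enrich):
    answered = 0
    correct = 0
    for raw_description, true_merchant in labelled:
        predicted = enrich(raw_description)
        if predicted is None:            # vendor returned nothing for this case
            continue
        answered += 1
        if predicted.strip().lower() == true_merchant.strip().lower():
            correct += 1

    coverage = answered / len(labelled) if labelled else 0.0
    accuracy_on_answered = correct / answered if answered else 0.0
    return {
        "coverage": coverage,                                   # how often anything came back
        "accuracy_on_answered": accuracy_on_answered,           # the figure vendors usually quote
        "end_to_end_accuracy": coverage * accuracy_on_answered, # correct answers over all cases
    }
```

A 99% figure measured only on answered, already-known merchants can coexist with a much lower end-to-end number once the misses are counted.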
The Long Tail Challenge
For both fraud and underwriting, handling out-of-distribution data and capturing the long tail are crucial.
The problem with the long tail is that things are constantly changing, and you cannot capture everything with a static system or rules. Even with the most comprehensive merchant database in the world, your coverage will lag.
The alternative is using models and stochastic systems.
They have trade-offs too, however: they make mistakes, and their answers are not always consistent.
Relying solely on such a system is hard.
The ideal solution: a dual approach
To solve transaction enrichment, you need two equally important components: a reasoning and assessment engine (we use language models for this) and a knowledge base that is constantly updated with deterministic entity information.
The combination of the two is a long-term solution that you can rely on regardless of the use case and, with the current advancements in language models, regardless of the data distribution as well.
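As a rough illustration of how the two components fit together (a minimal sketch with made-up names, not Ntropy's actual API): the deterministic knowledge base answers first, and the model only handles misses, writing its answers back so they become deterministic lookups next time.

```python
# Sketch of the dual approach: a deterministic knowledge base backed by a
# stochastic model. Every name here is illustrative, not a real API.

def normalize(description: str) -> str:
    # Illustrative normalization only: strip casing and extra whitespace.
    # A real system would also drop amounts, dates, card suffixes, etc.
    return " ".join(description.lower().split())

def enrich_transaction(description: str, knowledge_base: dict, model):
    key = normalize(description)

    entity = knowledge_base.get(key)
    if entity is not None:
        return entity                    # deterministic, consistent, cheap

    entity = model.resolve(description)  # stochastic fallback for the long tail
    if entity is not None:
        knowledge_base[key] = entity     # cache it so the next lookup is deterministic
    return entity
```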
At Ntropy we have been working on this for a while now: we have best-in-class language models and one of the most comprehensive merchant databases, constantly updated with new entities that our models capture and cache.
Latency: the need for speed
The second most important factor in evaluating a vendor, after accuracy, is latency. Many high-value use cases require quasi-real-time performance. If you are doing fraud detection and sitting in card authorization flows, you have under 100 ms to return a result before a card transaction is approved or declined.
Unless you have a very comprehensive internal data network and a cache of all possible merchants and transactions, delivering a result so quickly can prove to be tricky.
You certainly cannot do this by running inference with a billion-plus-parameter model, let alone web crawlers. This is a classic example where the value of an answer lies in how fast it is delivered, and deploying a very large reasoning engine at request time does not make sense.
You need to rely on a knowledge base and lookups, and guarantee that the knowledge base is as up-to-date as possible and keeps expanding to capture the harder, out-of-distribution cases.
Models in this scenario can act as oracles that keep your knowledge base in shape and continue growing it.
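Concretely, that means the model never sits on the hot path of an authorization decision: the synchronous call is a pure lookup, and misses are queued for the model to resolve offline. A hedged sketch of that split, with illustrative names only:

```python
# Hot path: knowledge-base lookup only, so the response fits a card-auth budget.
# Cold path: a background worker uses the model as an oracle to grow the knowledge base.
# All names here (backfill_queue, model.resolve, ...) are illustrative, not a real API.
import queue

backfill_queue = queue.Queue()

def key_for(description: str) -> str:
    # Illustrative normalization: lower-case and collapse whitespace.
    return " ".join(description.lower().split())

def enrich_in_auth_flow(description: str, knowledge_base: dict):
    entity = knowledge_base.get(key_for(description))
    if entity is None:
        # Do not wait for the model inside the ~100 ms budget:
        # record the miss and let the caller degrade gracefully.
        backfill_queue.put(description)
    return entity

def backfill_worker(knowledge_base: dict, model) -> None:
    while True:
        description = backfill_queue.get()
        entity = model.resolve(description)  # slow and expensive, but off the hot path
        if entity is not None:
            knowledge_base[key_for(description)] = entity
```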
Cost considerations
Finally, a key factor in picking a vendor is price. The value of understanding a single transaction and the price you pay for it need to match. The larger the models, the slower and more costly they are.
If you are processing large volumes, the price per transaction can significantly affect your bottom line. For a vendor, making a margin on it while staying accurate is very hard too.
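To make that concrete with purely illustrative numbers: at $0.005 per enriched transaction, a customer processing 50 million transactions a month would pay $250,000 a month, or $3 million a year, for enrichment alone. At that scale, even a small change in the unit price, or in the fraction of transactions that need an expensive model call, moves the bottom line materially.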
The Ntropy Journey
Given the above factors, in 2021 we set out to build bespoke language models and merchant databases to solve transaction enrichment. This was not a core competence of fintech teams and banks, but everyone needed a solution. Moreover, they could all benefit from a horizontal solution: a financial data network that is constantly improving by feeding off the data of all its participants.
We were very opinionated from the start. Despite the widespread intuition to build for one vertical or one use case, our vision was to create a financial brain that is precise yet guarantees coverage, meaning that even the hardest and oddest transactions and merchants get resolved.
To do this, we have been working across many different data sources, geographies and even languages, so that our models learn to generalize.
As we were doing this, bigger and better reasoning engines came to market and we went back to the drawing board. This was a massive transition.
The Future of transaction enrichment
GPT-4, with the right prompts, is a reasoning engine that can solve transaction enrichment for nearly all cases. However, it is very slow and can get very expensive. Processing Visa’s or Stripe’s volumes with GPT-4 is not feasible in the near future.
In the last 6-7 months, we have been working hard to find a solution. We did.
Today, with Ntropy you can have a reasoning engine as sizable as GPT-4 or even bigger, yet with the speed and cost of a look-up table. This means turning inference costs into storage costs. The biggest caveat is data: the volume, the coverage, and the diversity of the data you have access to.
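One way to see the "inference costs into storage costs" trade is as amortization: the model is paid for once per distinct transaction pattern, and every repeat is served from storage. A back-of-the-envelope sketch, with placeholder prices that are purely illustrative:

```python
# Back-of-the-envelope comparison: calling a large model on every transaction
# vs. paying for inference once per distinct key and serving repeats from storage.
# All prices are illustrative placeholders, not real Ntropy or GPT-4 pricing.

def monthly_cost(transactions, distinct_keys,
                 inference_cost=0.01,      # $ per model call (made up)
                 lookup_cost=0.00001):     # $ per cache/storage lookup (made up)
    pure_inference = transactions * inference_cost
    cache_backed = distinct_keys * inference_cost + transactions * lookup_cost
    return pure_inference, cache_backed

# e.g. 100M transactions a month collapsing onto 2M distinct merchant keys
pure, cached = monthly_cost(100_000_000, 2_000_000)
print(f"model on every transaction:  ${pure:,.0f}")    # $1,000,000
print(f"model once per key + lookups: ${cached:,.0f}")  # $21,000
```

The exact numbers are invented; the point is that once repeats dominate, cost scales with the number of distinct keys rather than with transaction volume.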
This optimization completely changes the game for transaction enrichment, as well as for other tasks and use cases that LLMs are a fit for. We are super excited to see our customers build products and services around it.
I will be sharing more soon, but for now I am excited to say that transaction enrichment is a solved problem, and one that is going to deliver real value across many upstream applications in finance.